ISA A64 XML V88a-2021-12 OPT
Copyright © 2010-2021 Arm Limited (or its affiliates). All rights reserved.
DDI 0596 (ID121321)
Arm A64 Instruction Set Architecture
Armv8, for Armv8-A architecture profile
Release Information
For information on the change history and known issues for this release, see the Release Notes in the A64 ISA XML for
Armv8.8 (2021-12).
Proprietary Notice
This document is protected by copyright and other related rights and the practice or implementation of the information contained
in this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by
estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR
PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to,
and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure
of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof
is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s customers
is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document
at any time and without notice.
This document may be translated into other languages for convenience, and you agree that if there is any conflict between the
English version of this document and any translation, the terms of the English version of the Agreement shall prevail.
The Arm corporate logo and words marked with ™ or © are registered trademarks or trademarks of Arm Limited (or its affiliates)
in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of
their respective owners. You must follow Arm’s trademark usage guidelines at
http://www.arm.com/company/policies/trademarks.
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.
Product Status
This release covers multiple versions of the architecture. The content relating to different versions is given different quality ratings.
The information related to the 2021 Architecture Extensions is at Alpha quality. Alpha quality means that most major features of
the specification are described in the manual, but some features and details might be missing.
The information related to the remaining Armv8-A features, which was also published in previous releases, is at Beta quality. Beta
quality means that all major features of the specification are described, but some details might be missing.
Web Address
http://www.arm.com
Arm values inclusive communities. Arm recognizes that we and our industry have used terms that can be offensive. Arm strives
to lead the industry and create change.
Previous issues of this document included terms that can be offensive. We have replaced these terms. If you find offensive terms
in this document, please contact [email protected].
A64 -- Base Instructions (alphabetic order)
AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA: Authenticate Instruction address, using key A.
AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB: Authenticate Instruction address, using key B.
B: Branch.
BICS (shifted register): Bitwise Bit Clear (shifted register), setting flags.
BLRAA, BLRAAZ, BLRAB, BLRABZ: Branch with Link to Register, with pointer authentication.
CAS, CASA, CASAL, CASL: Compare and Swap word or doubleword in memory.
CASP, CASPA, CASPAL, CASPL: Compare and Swap Pair of words or doublewords in memory.
CMN (extended register): Compare Negative (extended register): an alias of ADDS (extended register).
CMN (shifted register): Compare Negative (shifted register): an alias of ADDS (shifted register).
CMP (extended register): Compare (extended register): an alias of SUBS (extended register).
CMP (shifted register): Compare (shifted register): an alias of SUBS (shifted register).
CPYFPN, CPYFMN, CPYFEN: Memory Copy Forward-only, reads and writes non-temporal.
CPYFPRTN, CPYFMRTN, CPYFERTN: Memory Copy Forward-only, reads unprivileged, reads and writes non-temporal.
CPYFPRTRN, CPYFMRTRN, CPYFERTRN: Memory Copy Forward-only, reads unprivileged and non-temporal.
CPYFPRTWN, CPYFMRTWN, CPYFERTWN: Memory Copy Forward-only, reads unprivileged, writes non-temporal.
CPYFPT, CPYFMT, CPYFET: Memory Copy Forward-only, reads and writes unprivileged.
CPYFPTN, CPYFMTN, CPYFETN: Memory Copy Forward-only, reads and writes unprivileged and non-temporal.
CPYFPTRN, CPYFMTRN, CPYFETRN: Memory Copy Forward-only, reads and writes unprivileged, reads non-temporal.
CPYFPTWN, CPYFMTWN, CPYFETWN: Memory Copy Forward-only, reads and writes unprivileged, writes non-temporal.
CPYFPWTN, CPYFMWTN, CPYFEWTN: Memory Copy Forward-only, writes unprivileged, reads and writes non-temporal.
CPYFPWTRN, CPYFMWTRN, CPYFEWTRN: Memory Copy Forward-only, writes unprivileged, reads non-temporal.
CPYFPWTWN, CPYFMWTWN, CPYFEWTWN: Memory Copy Forward-only, writes unprivileged and non-temporal.
CPYPRTN, CPYMRTN, CPYERTN: Memory Copy, reads unprivileged, reads and writes non-temporal.
CPYPTN, CPYMTN, CPYETN: Memory Copy, reads and writes unprivileged and non-temporal.
CPYPTRN, CPYMTRN, CPYETRN: Memory Copy, reads and writes unprivileged, reads non-temporal.
CPYPTWN, CPYMTWN, CPYETWN: Memory Copy, reads and writes unprivileged, writes non-temporal.
CPYPWTN, CPYMWTN, CPYEWTN: Memory Copy, writes unprivileged, reads and writes non-temporal.
LDCLR, LDCLRA, LDCLRAL, LDCLRL: Atomic bit clear on word or doubleword in memory.
LDSET, LDSETA, LDSETAL, LDSETL: Atomic bit set on word or doubleword in memory.
LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL: Atomic signed maximum on word or doubleword in memory.
LDSMIN, LDSMINA, LDSMINAL, LDSMINL: Atomic signed minimum on word or doubleword in memory.
LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL: Atomic unsigned maximum on word or doubleword in memory.
LDUMIN, LDUMINA, LDUMINAL, LDUMINL: Atomic unsigned minimum on word or doubleword in memory.
MADD: Multiply-Add.
MOV (inverted wide immediate): Move (inverted wide immediate): an alias of MOVN.
MOV (to/from SP): Move between register and stack pointer: an alias of ADD (immediate).
MSUB: Multiply-Subtract.
NEG (shifted register): Negate (shifted register): an alias of SUB (shifted register).
NOP: No Operation.
PACDA, PACDZA: Pointer Authentication Code for Data address, using key A.
PACDB, PACDZB: Pointer Authentication Code for Data address, using key B.
PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA: Pointer Authentication Code for Instruction address, using key A.
PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB: Pointer Authentication Code for Instruction address, using key B.
SETGPTN, SETGMTN, SETGETN: Memory Set with tag setting, unprivileged and non-temporal.
STADD, STADDL: Atomic add on word or doubleword in memory, without return: an alias of LDADD, LDADDA,
LDADDAL, LDADDL.
STADDB, STADDLB: Atomic add on byte in memory, without return: an alias of LDADDB, LDADDAB, LDADDALB,
LDADDLB.
STADDH, STADDLH: Atomic add on halfword in memory, without return: an alias of LDADDH, LDADDAH, LDADDALH,
LDADDLH.
STCLR, STCLRL: Atomic bit clear on word or doubleword in memory, without return: an alias of LDCLR, LDCLRA,
LDCLRAL, LDCLRL.
STCLRB, STCLRLB: Atomic bit clear on byte in memory, without return: an alias of LDCLRB, LDCLRAB, LDCLRALB,
LDCLRLB.
STCLRH, STCLRLH: Atomic bit clear on halfword in memory, without return: an alias of LDCLRH, LDCLRAH,
LDCLRALH, LDCLRLH.
STEOR, STEORL: Atomic exclusive OR on word or doubleword in memory, without return: an alias of LDEOR,
LDEORA, LDEORAL, LDEORL.
STEORB, STEORLB: Atomic exclusive OR on byte in memory, without return: an alias of LDEORB, LDEORAB,
LDEORALB, LDEORLB.
STEORH, STEORLH: Atomic exclusive OR on halfword in memory, without return: an alias of LDEORH, LDEORAH,
LDEORALH, LDEORLH.
STSET, STSETL: Atomic bit set on word or doubleword in memory, without return: an alias of LDSET, LDSETA,
LDSETAL, LDSETL.
STSETB, STSETLB: Atomic bit set on byte in memory, without return: an alias of LDSETB, LDSETAB, LDSETALB,
LDSETLB.
STSETH, STSETLH: Atomic bit set on halfword in memory, without return: an alias of LDSETH, LDSETAH, LDSETALH,
LDSETLH.
STSMAX, STSMAXL: Atomic signed maximum on word or doubleword in memory, without return: an alias of LDSMAX,
LDSMAXA, LDSMAXAL, LDSMAXL.
STSMAXB, STSMAXLB: Atomic signed maximum on byte in memory, without return: an alias of LDSMAXB,
LDSMAXAB, LDSMAXALB, LDSMAXLB.
STSMAXH, STSMAXLH: Atomic signed maximum on halfword in memory, without return: an alias of LDSMAXH,
LDSMAXAH, LDSMAXALH, LDSMAXLH.
STSMIN, STSMINL: Atomic signed minimum on word or doubleword in memory, without return: an alias of LDSMIN,
LDSMINA, LDSMINAL, LDSMINL.
STSMINB, STSMINLB: Atomic signed minimum on byte in memory, without return: an alias of LDSMINB, LDSMINAB,
LDSMINALB, LDSMINLB.
STSMINH, STSMINLH: Atomic signed minimum on halfword in memory, without return: an alias of LDSMINH,
LDSMINAH, LDSMINALH, LDSMINLH.
STUMAX, STUMAXL: Atomic unsigned maximum on word or doubleword in memory, without return: an alias of
LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL.
STUMAXB, STUMAXLB: Atomic unsigned maximum on byte in memory, without return: an alias of LDUMAXB,
LDUMAXAB, LDUMAXALB, LDUMAXLB.
STUMAXH, STUMAXLH: Atomic unsigned maximum on halfword in memory, without return: an alias of LDUMAXH,
LDUMAXAH, LDUMAXALH, LDUMAXLH.
STUMIN, STUMINL: Atomic unsigned minimum on word or doubleword in memory, without return: an alias of
LDUMIN, LDUMINA, LDUMINAL, LDUMINL.
STUMINB, STUMINLB: Atomic unsigned minimum on byte in memory, without return: an alias of LDUMINB,
LDUMINAB, LDUMINALB, LDUMINLB.
STUMINH, STUMINLH: Atomic unsigned minimum on halfword in memory, without return: an alias of LDUMINH,
LDUMINAH, LDUMINALH, LDUMINLH.
TST (shifted register): Test (shifted register): an alias of ANDS (shifted register).
XAFLAG: Convert floating-point condition flags from external format to Arm format.
YIELD: YIELD.
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
ADC
Add with Carry adds two register values and the Carry flag value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
(result, -) = AddWithCarry(operand1, operand2, PSTATE.C);
X[d] = result;
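The operation above adds two register values and the Carry flag. The following Python sketch models that data path under stated assumptions: the names add_with_carry and adc are hypothetical, registers are plain Python integers, and the flag derivation uses the usual unsigned-sum and signed-sum comparison rather than quoting the architecture's shared pseudocode.

```python
def add_with_carry(x, y, carry_in, n):
    """Model an n-bit addition with carry-in. Returns (result, nzcv)."""
    mask = (1 << n) - 1
    x &= mask
    y &= mask

    def to_signed(v):
        # Interpret an n-bit pattern as a two's complement integer.
        return v - (1 << n) if v >> (n - 1) else v

    unsigned_sum = x + y + carry_in
    signed_sum = to_signed(x) + to_signed(y) + carry_in
    result = unsigned_sum & mask

    flag_n = result >> (n - 1)                        # sign bit of the result
    flag_z = 1 if result == 0 else 0                  # result is zero
    flag_c = 0 if result == unsigned_sum else 1       # unsigned overflow
    flag_v = 0 if to_signed(result) == signed_sum else 1  # signed overflow
    return result, (flag_n << 3) | (flag_z << 2) | (flag_c << 1) | flag_v


def adc(xn, xm, carry, datasize=64):
    """Hypothetical model of ADC: the flags are computed but discarded."""
    result, _ = add_with_carry(xn, xm, carry, datasize)
    return result
```

For example, adc(1, 2, 1) evaluates to 4, and a 32-bit adc(0xFFFFFFFF, 0, 1, 32) wraps to 0.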
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADCS
Add with Carry, setting flags, adds two register values and the Carry flag value, and writes the result to the destination
register. It updates the condition flags based on the result.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, operand2, PSTATE.C);
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
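Because ADCS both consumes the Carry flag and updates the condition flags, it can pass a carry along a chain of word-sized additions. The Python sketch below illustrates this with a 128-bit addition built from two 64-bit steps. The helper names add_with_carry64 and add128 are hypothetical, and only the Carry flag is modeled.

```python
MASK64 = (1 << 64) - 1


def add_with_carry64(x, y, carry_in):
    """Return (result, carry_out) for a 64-bit addition with carry-in."""
    total = (x & MASK64) + (y & MASK64) + carry_in
    return total & MASK64, total >> 64


def add128(a, b):
    """Add two 128-bit integers one 64-bit half at a time.

    Returns (128-bit sum, final carry out).
    """
    # Low halves: a flag-setting add with no carry-in (models ADDS).
    lo, carry = add_with_carry64(a & MASK64, b & MASK64, 0)
    # High halves: add with the carry from the low halves (models ADCS).
    hi, carry = add_with_carry64(a >> 64, b >> 64, carry)
    return (hi << 64) | lo, carry
```

Calling add128((1 << 64) - 1, 1) carries out of the low half into the high half, giving (1 << 64, 0).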
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADD (extended register)
Add (extended register) adds a register value and a sign or zero-extended register value, followed by an optional left
shift amount, and writes the result to the destination register. The argument that is extended from the <Rm> register
can be a byte, halfword, word, or doubleword.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 0 1 0 1 1 0 0 1 Rm option imm3 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then UNDEFINED;
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
option <R>
00x W
010 W
x11 X
10x W
110 W
<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.
<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rd" or "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
(result, -) = AddWithCarry(operand1, operand2, '0');
if d == 31 then
    SP[] = result;
else
    X[d] = result;
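The second operand above is produced by extending part of the <Rm> register and then shifting it left by 0 to 4 bits. The Python sketch below models that step as "extend, then shift, then truncate to the data size". The names EXTENDS, extend_reg and add_extended are hypothetical, and this is an illustrative model rather than the architecture's ExtendReg definition.

```python
# Extension name -> (is_unsigned, source width in bits)
EXTENDS = {
    "UXTB": (True, 8), "UXTH": (True, 16), "UXTW": (True, 32), "UXTX": (True, 64),
    "SXTB": (False, 8), "SXTH": (False, 16), "SXTW": (False, 32), "SXTX": (False, 64),
}


def extend_reg(value, extend, shift=0, datasize=64):
    """Extend the low bits of value, then shift left by 0 to 4 bits."""
    if not 0 <= shift <= 4:
        raise ValueError("shift must be in the range 0 to 4")
    unsigned, length = EXTENDS[extend]
    length = min(length, datasize)
    field = value & ((1 << length) - 1)
    if not unsigned and field >> (length - 1):
        field -= 1 << length                  # sign-extend a negative field
    return (field << shift) & ((1 << datasize) - 1)


def add_extended(xn, xm, extend, shift=0, datasize=64):
    """Hypothetical model of ADD (extended register) on plain integers."""
    mask = (1 << datasize) - 1
    return ((xn & mask) + extend_reg(xm, extend, shift, datasize)) & mask
```

For example, add_extended(0x1000, 0xFF, "SXTB") treats 0xFF as the byte value -1 and returns 0xFFF.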
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADD (immediate)
Add (immediate) adds a register value and an optionally-shifted immediate value, and writes the result to the
destination register.
This instruction is used by the alias MOV (to/from SP).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
case sh of
    when '0' imm = ZeroExtend(imm12, datasize);
    when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);
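The decode step above can be illustrated by extracting the fields from a 32-bit instruction word laid out as in the encoding diagram (sf at bit 31, sh at bit 22, imm12 at bits 21:10, Rn at bits 9:5, Rd at bits 4:0). The function name decode_add_immediate and the dictionary it returns are hypothetical conveniences for this sketch.

```python
def decode_add_immediate(word):
    """Decode an ADD (immediate) instruction word into its fields."""
    # Bits 30:23 must read 0 0 1 0 0 0 1 0 (op, S, then the fixed pattern).
    if (word >> 23) & 0xFF != 0b00100010:
        raise ValueError("not an ADD (immediate) encoding")
    sf = (word >> 31) & 1
    sh = (word >> 22) & 1
    imm12 = (word >> 10) & 0xFFF
    return {
        "datasize": 64 if sf else 32,
        "d": word & 0x1F,
        "n": (word >> 5) & 0x1F,
        # sh == 1 selects LSL #12, which appends twelve zero bits to imm12.
        "imm": imm12 << 12 if sh else imm12,
    }
```

Under these assumptions, the word 0x91000420 decodes as a 64-bit add of immediate 1 with n = 1 and d = 0, and setting bit 22 (0x91400420) changes the immediate to 4096.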
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #12
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
(result, -) = AddWithCarry(operand1, imm, '0');
if d == 31 then
    SP[] = result;
else
    X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADD (shifted register)
Add (shifted register) adds a register value and an optionally-shifted register value, and writes the result to the
destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 0 1 0 1 1 shift 0 Rm imm6 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
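integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if shift == '11' then UNDEFINED;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
ShiftType shift_type = DecodeShift(shift);
integer shift_amount = UInt(imm6);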
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
(result, -) = AddWithCarry(operand1, operand2, '0');
X[d] = result;
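The second operand here is the <Rm> value after an LSL, LSR or ASR shift. The Python sketch below models the three shift types on a fixed-width value. The names shift_reg and add_shifted are hypothetical, and the shift amount is assumed to be already validated against the data size, as the ranges in the Assembler Symbols require.

```python
def shift_reg(value, shift_type, amount, datasize=64):
    """Apply LSL, LSR or ASR to a datasize-bit value."""
    if not 0 <= amount < datasize:
        raise ValueError("shift amount out of range for this data size")
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        if value >> (datasize - 1):
            value -= 1 << datasize        # reinterpret as a negative integer
        return (value >> amount) & mask   # Python's >> is arithmetic on ints
    raise ValueError("shift type must be LSL, LSR or ASR")


def add_shifted(xn, xm, shift_type="LSL", amount=0, datasize=64):
    """Hypothetical model of ADD (shifted register) on plain integers."""
    mask = (1 << datasize) - 1
    return ((xn & mask) + shift_reg(xm, shift_type, amount, datasize)) & mask
```

For example, add_shifted(10, 3, "LSL", 2) computes 10 + (3 << 2) and returns 22.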
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADDG
Add with Tag adds an immediate value scaled by the Tag granule to the address in the source register, modifies the
Logical Address Tag of the address using an immediate value, and writes the result to the destination register. Tags
specified in GCR_EL1.Exclude are excluded from the possible outputs when modifying the Logical Address Tag.
Integer
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 0 0 1 1 0 uimm6 (0) (0) uimm4 Xn Xd
op3
if !HaveMTEExt() then UNDEFINED;
integer d = UInt(Xd);
integer n = UInt(Xn);
bits(64) offset = LSL(ZeroExtend(uimm6, 64), LOG2_TAG_GRANULE);
Assembler Symbols
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Xd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Xn" field.
<uimm6> Is an unsigned immediate, a multiple of 16 in the range 0 to 1008, encoded in the "uimm6" field.
<uimm4> Is an unsigned immediate, in the range 0 to 15, encoded in the "uimm4" field.
Operation
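bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(4) start_tag = AArch64.AllocationTagFromAddress(operand1);
bits(16) exclude = GCR_EL1.Exclude;
bits(64) result;
bits(4) rtag;
if AArch64.AllocationTagAccessIsEnabled(AccType_NORMAL) then
    rtag = AArch64.ChooseNonExcludedTag(start_tag, uimm4, exclude);
else
    rtag = '0000';
(result, -) = AddWithCarry(operand1, offset, '0');
result = AArch64.AddressWithAllocationTag(result, AccType_NORMAL, rtag);
if d == 31 then
    SP[] = result;
else
    X[d] = result;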
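The tag selection performed by AArch64.ChooseNonExcludedTag can be pictured as stepping a 4-bit tag forward while skipping excluded values. The Python sketch below is an illustrative model only. It assumes the Logical Address Tag occupies address bits 59:56, takes the assembler-level <uimm6> value (already a multiple of 16), and uses hypothetical names choose_non_excluded_tag and addg. The stepping rule is an assumption about the helper's behavior, not a quotation of it.

```python
MASK64 = (1 << 64) - 1
TAG_SHIFT = 56  # assumption: Logical Address Tag held in bits 59:56


def choose_non_excluded_tag(tag, offset, exclude):
    """Step tag forward offset times, skipping tags set in the exclude mask."""
    if exclude & 0xFFFF == 0xFFFF:
        return 0                          # every tag excluded
    if offset == 0:
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    while offset != 0:
        offset -= 1
        tag = (tag + 1) & 0xF
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    return tag


def addg(xn, uimm6, uimm4, exclude=0, tag_access_enabled=True):
    """Hypothetical model of ADDG on a plain 64-bit integer address."""
    if uimm6 % 16 or not 0 <= uimm6 <= 1008:
        raise ValueError("uimm6 must be a multiple of 16 in the range 0 to 1008")
    if not 0 <= uimm4 <= 15:
        raise ValueError("uimm4 must be in the range 0 to 15")
    start_tag = (xn >> TAG_SHIFT) & 0xF
    rtag = choose_non_excluded_tag(start_tag, uimm4, exclude) if tag_access_enabled else 0
    result = (xn + uimm6) & MASK64
    # Replace the tag bits of the sum with the chosen tag.
    return (result & ~(0xF << TAG_SHIFT) & MASK64) | (rtag << TAG_SHIFT)
```

In this model, addg(0x0100000000001000, 32, 1) advances the tag from 1 to 2 and the address by 32 bytes.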
ADDS (extended register)
Add (extended register), setting flags, adds a register value and a sign or zero-extended register value, followed by an
optional left shift amount, and writes the result to the destination register. The argument that is extended from the
<Rm> register can be a byte, halfword, word, or doubleword. It updates the condition flags based on the result.
This instruction is used by the alias CMN (extended register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
option <R>
00x W
010 W
x11 X
10x W
110 W
<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.
<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, operand2, '0');
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADDS (immediate)
Add (immediate), setting flags, adds a register value and an optionally-shifted immediate value, and writes the result
to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMN (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
case sh of
    when '0' imm = ZeroExtend(imm12, datasize);
    when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #12
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, imm, '0');
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
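Since this instruction backs the CMN (immediate) alias, a short model helps show how the flags answer the question "is the register equal to the negated immediate?". The Python sketch below is illustrative. The names add_with_carry, adds_immediate and cmn_immediate are hypothetical, and the flags are packed as a 4-bit N:Z:C:V integer.

```python
def add_with_carry(x, y, carry_in, n):
    """Model an n-bit addition with carry-in. Returns (result, nzcv)."""
    mask = (1 << n) - 1
    x &= mask
    y &= mask

    def to_signed(v):
        return v - (1 << n) if v >> (n - 1) else v

    unsigned_sum = x + y + carry_in
    signed_sum = to_signed(x) + to_signed(y) + carry_in
    result = unsigned_sum & mask
    nzcv = ((result >> (n - 1)) << 3) \
        | ((1 if result == 0 else 0) << 2) \
        | ((0 if result == unsigned_sum else 1) << 1) \
        | (0 if to_signed(result) == signed_sum else 1)
    return result, nzcv


def adds_immediate(xn, imm12, sh=0, datasize=64):
    """Hypothetical model of ADDS (immediate). Returns (result, nzcv)."""
    if not 0 <= imm12 <= 4095:
        raise ValueError("imm12 must be in the range 0 to 4095")
    imm = imm12 << 12 if sh else imm12
    return add_with_carry(xn, imm, 0, datasize)


def cmn_immediate(xn, imm12, sh=0, datasize=64):
    """Model of the CMN alias: keep the flags, discard the result."""
    _, nzcv = adds_immediate(xn, imm12, sh, datasize)
    return nzcv
```

Comparing a register holding -5 against #5 sets Z (and C), so cmn_immediate((-5) & ((1 << 64) - 1), 5) returns 0b0110.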
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADDS (shifted register)
Add (shifted register), setting flags, adds a register value and an optionally-shifted register value, and writes the result
to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMN (shifted register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 shift 0 Rm imm6 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
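integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if shift == '11' then UNDEFINED;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
ShiftType shift_type = DecodeShift(shift);
integer shift_amount = UInt(imm6);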
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(4) nzcv;
(result, nzcv) = AddWithCarry(operand1, operand2, '0');
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
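The addition and flag update described for this instruction can be sketched in Python. This is only an illustrative model, not the architectural pseudocode: the helper names `shift_reg` and `adds_shifted` are hypothetical, and the functions take register values directly instead of register numbers.

```python
def shift_reg(value, shift_type, amount, datasize):
    """Apply LSL, LSR or ASR to a datasize-bit value (illustrative helper)."""
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        # Reinterpret as signed so that Python's >> replicates the sign bit.
        signed = value - (1 << datasize) if value >> (datasize - 1) else value
        return (signed >> amount) & mask
    raise ValueError("reserved shift type")


def adds_shifted(xn, xm, shift_type="LSL", amount=0, datasize=64):
    """Return (result, (N, Z, C, V)) for an ADDS-style addition."""
    mask = (1 << datasize) - 1
    sign = datasize - 1
    op1 = xn & mask
    op2 = shift_reg(xm, shift_type, amount, datasize)
    unsigned_sum = op1 + op2
    result = unsigned_sum & mask
    n = result >> sign
    z = int(result == 0)
    c = int(unsigned_sum > mask)  # carry out of the unsigned addition
    # Signed overflow: both operands share a sign that the result does not.
    v = int((op1 >> sign) == (op2 >> sign) and (result >> sign) != (op1 >> sign))
    return result, (n, z, c, v)
```

For example, `adds_shifted(0x7FFFFFFF, 1, datasize=32)` overflows the signed range, so the model reports N and V set with C and Z clear.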
ADR
Form PC-relative address adds an immediate value to the PC value to form a PC-relative address, and writes the result
to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 immlo 1 0 0 0 0 immhi Rd
op
integer d = UInt(Rd);
bits(64) imm;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<label> Is the program label whose address is to be calculated. Its offset from the address of this instruction, in
the range +/-1MB, is encoded in "immhi:immlo".
Operation
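As a rough illustration of the address calculation described above, the following Python sketch adds the sign-extended "immhi:immlo" offset to the PC. It assumes that immlo is 2 bits wide and immhi is 19 bits wide, which gives the 21-bit signed offset implied by the +/-1MB range. The names `sign_extend` and `adr_target` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def adr_target(pc, immhi, immlo):
    """Address formed by ADR: PC plus the signed offset immhi:immlo."""
    # Assumption: immlo is 2 bits, so immhi is shifted left by 2 to concatenate.
    offset = sign_extend((immhi << 2) | immlo, 21)
    return (pc + offset) & 0xFFFFFFFFFFFFFFFF
```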
ADRP
Form PC-relative address to 4KB page adds an immediate value that is shifted left by 12 bits, to the PC value to form a
PC-relative address, with the bottom 12 bits masked out, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 immlo 1 0 0 0 0 immhi Rd
op
integer d = UInt(Rd);
bits(64) imm;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<label> Is the program label whose 4KB page address is to be calculated. Its offset from the page address of
this instruction, in the range +/-4GB, is encoded as "immhi:immlo" times 4096.
Operation
base<11:0> = Zeros(12);
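The page-address calculation can be sketched in Python as follows. This is a minimal model under the assumption that immlo is 2 bits wide and immhi is 19 bits wide (a 21-bit signed page count, matching the +/-4GB range). The names `sign_extend` and `adrp_target` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def adrp_target(pc, immhi, immlo):
    """Page address formed by ADRP."""
    # The immediate counts 4KB pages, so it is scaled by 4096 (shift by 12).
    page_offset = sign_extend((immhi << 2) | immlo, 21) << 12
    base = pc & ~0xFFF  # bottom 12 bits of the PC are masked out
    return (base + page_offset) & 0xFFFFFFFFFFFFFFFF
```

For instance, with the PC at 0x401234 and an immediate of 1, the result is 0x402000: the page containing the PC, plus one page.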
AND (immediate)
Bitwise AND (immediate) performs a bitwise AND of a register value and an immediate value, and writes the result to
the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 0 0 N immr imms Rn Rd
opc
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
if sf == '0' && N != '0' then UNDEFINED;
(imm, -) = DecodeBitMasks(N, imms, immr, TRUE);
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
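The bitmask immediate is produced by DecodeBitMasks, whose body is not shown in this description. The Python sketch below is an illustration of how N, imms and immr can be expanded into the immediate for the immediate == TRUE case. It is a model written for explanation, not the architectural pseudocode, and the function name `decode_bitmask_immediate` is hypothetical.

```python
def decode_bitmask_immediate(n, imms, immr, datasize):
    """Expand (N, imms, immr) into a datasize-bit logical immediate."""
    if datasize == 32 and n != 0:
        raise ValueError("UNDEFINED: N must be 0 for the 32-bit variant")
    # The highest set bit of N:NOT(imms) selects the element size (2..64 bits).
    combined = (n << 6) | (~imms & 0x3F)
    length = combined.bit_length() - 1
    if length < 1:
        raise ValueError("reserved encoding")
    levels = (1 << length) - 1
    s = imms & levels
    r = immr & levels
    if s == levels:
        raise ValueError("reserved encoding")  # would be an all-ones element
    esize = 1 << length
    emask = (1 << esize) - 1
    elem = (1 << (s + 1)) - 1                  # s + 1 consecutive one bits
    elem = ((elem >> r) | (elem << (esize - r))) & emask  # rotate right by r
    result = 0
    for i in range(datasize // esize):         # replicate across the register
        result |= elem << (i * esize)
    return result
```

The AND itself is then simply `operand1 & imm`, with the result written to the destination register or stack pointer.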
AND (shifted register)
Bitwise AND (shifted register) performs a bitwise AND of a register value and an optionally-shifted register value, and
writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 0 1 0 1 0 shift 0 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
Operational information
If PSTATE.DIT is 1:
ANDS (immediate)
Bitwise AND (immediate), setting flags, performs a bitwise AND of a register value and an immediate value, and writes
the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias TST (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 1 0 0 N immr imms Rn Rd
opc
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
if sf == '0' && N != '0' then UNDEFINED;
(imm, -) = DecodeBitMasks(N, imms, immr, TRUE);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ANDS (shifted register)
Bitwise AND (shifted register), setting flags, performs a bitwise AND of a register value and an optionally-shifted
register value, and writes the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias TST (shifted register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 0 shift 0 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Alias Conditions
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ASR (immediate)
Arithmetic Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in copies of
the sign bit in the upper bits and zeros in the lower bits, and writes the result to the destination register.
• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N immr x 1 1 1 1 1 Rn Rd
opc imms
is equivalent to
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<shift> For the 32-bit variant: is the shift amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, encoded in the "immr" field.
Operation
The description of SBFM gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
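Because this instruction is described in terms of SBFM, it can be viewed as a signed bitfield extract: bits <datasize-1:shift> of the source are moved to the bottom of the destination and sign-extended. The Python sketch below models that view. It is an illustration only, and the names `sign_extend` and `asr_immediate` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def asr_immediate(xn, shift, datasize=64):
    """Arithmetic shift right, modelled as a sign-extended bitfield extract."""
    if not 0 <= shift < datasize:
        raise ValueError("shift out of range")
    mask = (1 << datasize) - 1
    width = datasize - shift                   # number of source bits kept
    field = ((xn & mask) >> shift) & ((1 << width) - 1)
    return sign_extend(field, width) & mask    # copies of the sign bit fill the top
```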
ASR (register)
Arithmetic Shift Right (register) shifts a register value right by a variable number of bits, shifting in copies of its sign
bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by
the data size defines the number of bits by which the first source register is right-shifted.
• The encodings in this description are named to match the encodings of ASRV.
• The description of ASRV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 0 Rn Rd
op2
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
The description of ASRV gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ASRV
Arithmetic Shift Right Variable shifts a register value right by a variable number of bits, shifting in copies of its sign
bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by
the data size defines the number of bits by which the first source register is right-shifted.
This instruction is used by the alias ASR (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 0 Rn Rd
op2
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
bits(datasize) result;
bits(datasize) operand2 = X[m];
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
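The key point of the variable shift is that only the second source register modulo the data size is used as the shift amount. A minimal Python sketch of that behavior follows; the function name `asrv` is hypothetical and the model takes register values rather than register numbers.

```python
def asrv(xn, xm, datasize=64):
    """Arithmetic shift right of xn by (xm MOD datasize) bits."""
    mask = (1 << datasize) - 1
    amount = (xm & mask) % datasize            # bottom 5 or 6 bits of Rm
    value = xn & mask
    # Reinterpret as signed so that Python's >> replicates the sign bit.
    signed = value - (1 << datasize) if value >> (datasize - 1) else value
    return (signed >> amount) & mask
```

With a 32-bit data size, a shift register value of 36 therefore behaves exactly like a shift by 4.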
AT
Address Translate. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
translation instructions.
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 0 1 1 1 1 0 0 x op2 Rt
L CRn CRm
AT <at_op>, <Xt>
is equivalent to
Assembler Symbols
<at_op> Is an AT instruction name, as listed for the AT system instruction group, encoded in
“op1:CRm<0>:op2”:
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
AUTDA, AUTDZA
Authenticate Data address, using key A. This instruction authenticates a data address, using a modifier and key A.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTDA.
• The value zero, for AUTDZA.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 1 0 Rn Rd
AUTDA (Z == 0)
AUTDZA <Xd>
if !HavePACExt() then
UNDEFINED;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if HavePACExt() then
if source_is_sp then
X[d] = AuthDA(X[d], SP[], FALSE);
else
X[d] = AuthDA(X[d], X[n], FALSE);
AUTDB, AUTDZB
Authenticate Data address, using key B. This instruction authenticates a data address, using a modifier and key B.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTDB.
• The value zero, for AUTDZB.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 1 1 Rn Rd
AUTDB (Z == 0)
AUTDZB <Xd>
if !HavePACExt() then
UNDEFINED;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if HavePACExt() then
if source_is_sp then
X[d] = AuthDB(X[d], SP[], FALSE);
else
X[d] = AuthDB(X[d], X[n], FALSE);
AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA
Authenticate Instruction address, using key A. This instruction authenticates an instruction address, using a modifier
and key A.
The address is:
• In the general-purpose register that is specified by <Xd> for AUTIA and AUTIZA.
• In X17, for AUTIA1716.
• In X30, for AUTIASP and AUTIAZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTIA.
• The value zero, for AUTIZA and AUTIAZ.
• In X16, for AUTIA1716.
• In SP, for AUTIASP.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.
It has encodings from 2 classes: Integer and System
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 0 0 Rn Rd
AUTIA (Z == 0)
AUTIZA <Xd>
if !HavePACExt() then
UNDEFINED;
System
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 1 0 x 1 1 1 1 1
CRm op2
AUTIA1716
AUTIASP
AUTIAZ
integer d;
integer n;
boolean source_is_sp = FALSE;
case CRm:op2 of
when '0011 100' // AUTIAZ
d = 30;
n = 31;
when '0011 101' // AUTIASP
d = 30;
source_is_sp = TRUE;
when '0001 100' // AUTIA1716
d = 17;
n = 16;
when '0001 000' SEE "PACIA";
when '0001 010' SEE "PACIB";
when '0001 110' SEE "AUTIB";
when '0011 00x' SEE "PACIA";
when '0011 01x' SEE "PACIB";
when '0011 11x' SEE "AUTIB";
when '0000 111' SEE "XPACLRI";
otherwise SEE "HINT";
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if HavePACExt() then
if source_is_sp then
X[d] = AuthIA(X[d], SP[], FALSE);
else
X[d] = AuthIA(X[d], X[n], FALSE);
AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB
Authenticate Instruction address, using key B. This instruction authenticates an instruction address, using a modifier
and key B.
The address is:
• In the general-purpose register that is specified by <Xd> for AUTIB and AUTIZB.
• In X17, for AUTIB1716.
• In X30, for AUTIBSP and AUTIBZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTIB.
• The value zero, for AUTIZB and AUTIBZ.
• In X16, for AUTIB1716.
• In SP, for AUTIBSP.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.
It has encodings from 2 classes: Integer and System
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 0 1 Rn Rd
AUTIB (Z == 0)
AUTIZB <Xd>
if !HavePACExt() then
UNDEFINED;
System
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 1 1 x 1 1 1 1 1
CRm op2
AUTIB1716
AUTIBSP
AUTIBZ
integer d;
integer n;
boolean source_is_sp = FALSE;
case CRm:op2 of
when '0011 110' // AUTIBZ
d = 30;
n = 31;
when '0011 111' // AUTIBSP
d = 30;
source_is_sp = TRUE;
when '0001 110' // AUTIB1716
d = 17;
n = 16;
when '0001 000' SEE "PACIA";
when '0001 010' SEE "PACIB";
when '0001 100' SEE "AUTIA";
when '0011 00x' SEE "PACIA";
when '0011 01x' SEE "PACIB";
when '0011 10x' SEE "AUTIA";
when '0000 111' SEE "XPACLRI";
otherwise SEE "HINT";
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if HavePACExt() then
if source_is_sp then
X[d] = AuthIB(X[d], SP[], FALSE);
else
X[d] = AuthIB(X[d], X[n], FALSE);
AXFLAG
Convert floating-point condition flags from Arm to external format. This instruction converts the state of the
PSTATE.{N,Z,C,V} flags from a form representing the result of an Arm floating-point scalar compare instruction to an
alternative representation required by some software.
System
(FEAT_FlagM2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 1 0 1 1 1 1 1
CRm
AXFLAG
Operation
PSTATE.N = '0';
PSTATE.Z = Z;
PSTATE.C = C;
PSTATE.V = '0';
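The listing above assigns Z and C without showing how those values are computed. The Python sketch below assumes that Z becomes (Z OR V) and C becomes (C AND NOT V). That assumption is not taken from the text shown here and should be checked against the complete pseudocode. The function name `axflag` is hypothetical.

```python
def axflag(n, z, c, v):
    """Return the new (N, Z, C, V) after the conversion, each flag 0 or 1."""
    # Assumption: these two expressions are not shown in the listing above.
    new_z = z | v
    new_c = c & (1 - v)
    return (0, new_z, new_c, 0)
```

Under this assumption, an unordered floating-point compare result (N=0, Z=0, C=1, V=1) is converted to (N=0, Z=1, C=0, V=0).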
B
Branch causes an unconditional branch to a label at a PC-relative offset, with a hint that this is not a subroutine call or
return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 0 1 imm26
op
B <label>
Assembler Symbols
<label> Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in
the range +/-128MB, is encoded as "imm26" times 4.
Operation
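To make the "imm26 times 4" encoding concrete, the following Python sketch encodes and decodes the branch offset. It is an illustration based on the encoding diagram above (a fixed 000101 pattern in bits 31 to 26, followed by imm26). The function names `encode_b` and `decode_b_target` are hypothetical.

```python
def encode_b(pc, target):
    """Return the 32-bit encoding of B <label> for a branch from pc to target."""
    offset = target - pc
    if offset % 4:
        raise ValueError("target must be 4-byte aligned relative to pc")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("offset outside the +/-128MB range")
    imm26 = (offset >> 2) & 0x3FFFFFF          # offset in units of 4 bytes
    return (0b000101 << 26) | imm26


def decode_b_target(pc, insn):
    """Recover the branch target from an encoded B instruction."""
    imm26 = insn & 0x3FFFFFF
    if imm26 >> 25:                            # sign bit of the 26-bit field
        imm26 -= 1 << 26
    return (pc + (imm26 << 2)) & 0xFFFFFFFFFFFFFFFF
```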
B.cond
Branch conditionally to a label at a PC-relative offset, with a hint that this is not a subroutine call or return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 1 0 0 imm19 0 cond
B.<cond> <label>
Assembler Symbols
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.
Operation
if ConditionHolds(cond) then
BranchTo(PC[] + offset, BranchType_DIR, TRUE);
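ConditionHolds is referenced here but not defined in this description. The Python sketch below shows one way to model the standard A64 condition codes against the NZCV flags. It is an illustrative model rather than the architectural pseudocode, and the function name `condition_holds` is hypothetical.

```python
def condition_holds(cond, n, z, c, v):
    """Evaluate a 4-bit condition code against the N, Z, C and V flags."""
    base = cond >> 1                 # cond<3:1> selects the base test
    if base == 0b000:
        result = z == 1              # EQ / NE
    elif base == 0b001:
        result = c == 1              # CS / CC
    elif base == 0b010:
        result = n == 1              # MI / PL
    elif base == 0b011:
        result = v == 1              # VS / VC
    elif base == 0b100:
        result = c == 1 and z == 0   # HI / LS
    elif base == 0b101:
        result = n == v              # GE / LT
    elif base == 0b110:
        result = n == v and z == 0   # GT / LE
    else:
        result = True                # AL
    # cond<0> inverts the test, except for the encoding 1111.
    if cond & 1 and cond != 0b1111:
        result = not result
    return result
```

The branch is taken only when this test is true; otherwise execution continues with the next instruction.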
BC.cond
Branch Consistent conditionally to a label at a PC-relative offset, with a hint that this branch will behave very
consistently and is very unlikely to change direction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 1 0 0 imm19 1 cond
BC.<cond> <label>
Assembler Symbols
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.
Operation
if ConditionHolds(cond) then
BranchTo(PC[] + offset, BranchType_DIR, TRUE);
BFC
Bitfield Clear sets a bitfield of <width> bits at bit position <lsb> of the destination register to zero, leaving the other
destination bits unchanged.
• The encodings in this description are named to match the encodings of BFM.
• The description of BFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms 1 1 1 1 1 Rd
opc Rn
is equivalent to
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of BFM gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
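The effect of the instruction on the destination register can be sketched in Python as a simple mask operation. This models the behavior stated in the description, not the BFM pseudocode, and the function name `bfc` is hypothetical.

```python
def bfc(dst, lsb, width, datasize=64):
    """Clear `width` bits of dst starting at bit `lsb`; other bits are kept."""
    if not (0 <= lsb < datasize and 1 <= width <= datasize - lsb):
        raise ValueError("lsb/width out of range")
    field_mask = ((1 << width) - 1) << lsb
    return dst & ~field_mask & ((1 << datasize) - 1)
```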
BFI
Bitfield Insert copies a bitfield of <width> bits from the least significant bits of the source register to bit position
<lsb> of the destination register, leaving the other destination bits unchanged.
• The encodings in this description are named to match the encodings of BFM.
• The description of BFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms != 11111 Rd
opc Rn
is equivalent to
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of BFM gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
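A short Python sketch may help relate the <lsb> and <width> operands to the underlying BFM operands. The `bfi` function models the behavior stated in the description. The mapping in `bfi_to_bfm_operands` (immr = -lsb MOD datasize, imms = width - 1) is an assumption, since the equivalent BFM operands are not shown in this text. Both function names are hypothetical.

```python
def bfi(dst, src, lsb, width, datasize=64):
    """Insert the low `width` bits of src into dst at bit position `lsb`."""
    if not (0 <= lsb < datasize and 1 <= width <= datasize - lsb):
        raise ValueError("lsb/width out of range")
    mask = (1 << datasize) - 1
    field_mask = (1 << width) - 1
    cleared = dst & ~(field_mask << lsb) & mask   # other bits stay unchanged
    return cleared | ((src & field_mask) << lsb)


def bfi_to_bfm_operands(lsb, width, datasize=64):
    """Assumed mapping from BFI operands to BFM (immr, imms)."""
    return (-lsb) % datasize, width - 1
```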
BFM
Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
If <imms> is greater than or equal to <immr>, this copies a bitfield of (<imms>-<immr>+1) bits starting from bit
position <immr> in the source register to the least significant bits of the destination register.
If <imms> is less than <immr>, this copies a bitfield of (<imms>+1) bits from the least significant bits of the source
register to bit position (regsize-<immr>) of the destination register, where regsize is the destination register size of 32
or 64 bits.
In both cases the other bits of the destination register remain unchanged.
This instruction is used by the aliases BFC, BFI, and BFXIL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms Rn Rd
opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
integer R;
bits(datasize) wmask;
bits(datasize) tmask;
R = UInt(immr);
(wmask, tmask) = DecodeBitMasks(N, imms, immr, FALSE);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<immr> For the 32-bit variant: is the right rotate amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the right rotate amount, in the range 0 to 63, encoded in the "immr" field.
<imms> For the 32-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 63,
encoded in the "imms" field.
Alias Conditions
Alias Is preferred when
BFC Rn == '11111' && UInt(imms) < UInt(immr)
BFI Rn != '11111' && UInt(imms) < UInt(immr)
BFXIL UInt(imms) >= UInt(immr)
Operation
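The following Python sketch models only the two cases given in the description of BFM above. It is an illustration, not Arm's operational pseudocode: it does not use DecodeBitMasks, and the function name bfm and its argument order are assumptions made for this example.

```python
def bfm(dst: int, src: int, immr: int, imms: int, regsize: int = 32) -> int:
    """Return the new destination value after a BFM-style bitfield move."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not (0 <= immr < regsize and 0 <= imms < regsize):
        raise ValueError("immr and imms must be in the range 0 to regsize-1")
    full = (1 << regsize) - 1
    dst &= full
    src &= full
    if imms >= immr:
        # Extract (imms-immr+1) bits starting at bit immr, insert at bit 0.
        width = imms - immr + 1
        field = (src >> immr) & ((1 << width) - 1)
        pos = 0
    else:
        # Take (imms+1) low bits of the source, insert at bit regsize-immr.
        width = imms + 1
        field = src & ((1 << width) - 1)
        pos = regsize - immr
    mask = ((1 << width) - 1) << pos
    # All destination bits outside the field remain unchanged.
    return (dst & ~mask & full) | (field << pos)
```

With a 32-bit destination of all ones and a source of 0xABCD, immr = 4 and imms = 11 select source bits 4 to 11 and place them in the low byte of the destination.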
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
BFXIL
Bitfield Extract and Insert Low copies a bitfield of <width> bits starting from bit position <lsb> in the source register
to the least significant bits of the destination register, leaving the other destination bits unchanged.
• The encodings in this description are named to match the encodings of BFM.
• The description of BFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms Rn Rd
opc
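32-bit: BFXIL <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
BFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)
64-bit: BFXIL <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
BFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)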
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of BFM gives the operational pseudocode for this instruction.
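A minimal Python sketch of the BFXIL behaviour described at the top of this section, written directly in terms of <lsb> and <width>. The helper names bfxil and bfxil_to_bfm are hypothetical, and the operand mapping in bfxil_to_bfm is inferred from the BFM case where imms is greater than or equal to immr.

```python
from typing import Tuple


def _check(lsb: int, width: int, regsize: int) -> None:
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not 0 <= lsb < regsize:
        raise ValueError("lsb must be in the range 0 to regsize-1")
    if not 1 <= width <= regsize - lsb:
        raise ValueError("width must be in the range 1 to regsize-lsb")


def bfxil(dst: int, src: int, lsb: int, width: int, regsize: int = 32) -> int:
    """Copy src<lsb+width-1:lsb> into the low bits of dst; keep the rest of dst."""
    _check(lsb, width, regsize)
    full = (1 << regsize) - 1
    low_mask = (1 << width) - 1
    field = ((src & full) >> lsb) & low_mask
    return ((dst & full) & ~low_mask) | field


def bfxil_to_bfm(lsb: int, width: int, regsize: int = 32) -> Tuple[int, int]:
    """Return the (immr, imms) pair for the equivalent BFM."""
    _check(lsb, width, regsize)
    return lsb, lsb + width - 1
```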
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
BIC (shifted register)
Bitwise Bit Clear (shifted register) performs a bitwise AND of a register value and the complement of an optionally-
shifted register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
operand2 = NOT(operand2);
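The pseudocode above is only a fragment, so the following Python sketch shows one way the whole operation could be modelled: shift the second operand, complement it, and AND it with the first operand. The names shift_reg and bic are hypothetical, and the shift behaviour follows the usual meaning of LSL, LSR, ASR and ROR rather than any text in this section.

```python
def shift_reg(value: int, shift: str, amount: int, datasize: int) -> int:
    """Apply LSL, LSR, ASR or ROR by `amount` to a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if not 0 <= amount < datasize:
        raise ValueError("shift amount out of range")
    if amount == 0:
        return value
    if shift == "LSL":
        return (value << amount) & mask
    if shift == "LSR":
        return value >> amount
    if shift == "ASR":
        shifted = value >> amount
        if value >> (datasize - 1):          # negative: fill with ones
            shifted |= mask & ~(mask >> amount)
        return shifted
    if shift == "ROR":
        return ((value >> amount) | (value << (datasize - amount))) & mask
    raise ValueError("unknown shift type")


def bic(rn: int, rm: int, shift: str = "LSL", amount: int = 0,
        datasize: int = 32) -> int:
    """Return rn AND NOT(shifted rm), truncated to datasize bits."""
    mask = (1 << datasize) - 1
    operand2 = shift_reg(rm, shift, amount, datasize)
    operand2 = ~operand2 & mask              # operand2 = NOT(operand2)
    return rn & mask & operand2
```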
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
BICS (shifted register)
Bitwise Bit Clear (shifted register), setting flags, performs a bitwise AND of a register value and the complement of an
optionally-shifted register value, and writes the result to the destination register. It updates the condition flags based
on the result.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
operand2 = NOT(operand2);
X[d] = result;
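The fragment above omits the flag update. As a hedged sketch, the Python below computes the result and an NZCV value for an already-shifted second operand. It assumes, as for other A64 flag-setting logical instructions, that N is the top bit of the result, Z is set when the result is zero, and C and V are cleared; that detail is not stated in this section. The function name bics is hypothetical.

```python
from typing import Tuple


def bics(rn: int, operand2: int, datasize: int = 32) -> Tuple[int, int]:
    """Return (result, nzcv) for rn AND NOT(operand2).

    operand2 is the second source value after any optional shift.
    nzcv is a 4-bit integer with N in bit 3 and V in bit 0.
    """
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    mask = (1 << datasize) - 1
    result = rn & ~operand2 & mask
    n = (result >> (datasize - 1)) & 1
    z = 1 if result == 0 else 0
    nzcv = (n << 3) | (z << 2)   # assumption: C and V are both 0
    return result, nzcv
```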
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
BL
Branch with Link branches to a PC-relative offset, setting the register X30 to PC+4. It provides a hint that this is a
subroutine call.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 1 imm26
op
BL <label>
Assembler Symbols
<label> Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in
the range +/-128MB, is encoded as "imm26" times 4.
Operation
X[30] = PC[] + 4;
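To make the "imm26" times 4 encoding concrete, the Python sketch below encodes a BL for a given target and then models the branch and link-register update. The opcode constant is read off the encoding diagram above. The function names encode_bl and execute_bl are hypothetical, and the sketch ignores everything except the PC and X30.

```python
from typing import Tuple

BL_OPCODE = 0x94000000      # bits 31:26 = 100101, from the encoding diagram
MASK64 = (1 << 64) - 1


def encode_bl(pc: int, target: int) -> int:
    """Return the 32-bit BL encoding that branches from pc to target."""
    offset = target - pc
    if offset % 4 != 0:
        raise ValueError("target must be 4-byte aligned relative to pc")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("target is outside the +/-128MB range")
    return BL_OPCODE | ((offset >> 2) & 0x3FFFFFF)


def execute_bl(pc: int, insn: int) -> Tuple[int, int]:
    """Return (new_pc, x30) after executing the BL encoding at pc."""
    offset = (insn & 0x3FFFFFF) << 2      # imm26:'00'
    if offset & (1 << 27):                # sign-extend the 28-bit offset
        offset -= 1 << 28
    x30 = (pc + 4) & MASK64               # X[30] = PC[] + 4
    return (pc + offset) & MASK64, x30
```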
BLR
Branch with Link to Register calls a subroutine at an address in a register, setting register X30 to PC+4.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm
BLR <Xn>
integer n = UInt(Rn);
Assembler Symbols
<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.
Operation
X[30] = PC[] + 4;
BLRAA, BLRAAZ, BLRAB, BLRABZ
Branch with Link to Register, with pointer authentication. This instruction authenticates the address in the general-
purpose register that is specified by <Xn>, using a modifier and the specified key, and calls a subroutine at the
authenticated address, setting register X30 to PC+4.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xm|SP> for BLRAA and BLRAB.
• The value zero, for BLRAAZ and BLRABZ.
Key A is used for BLRAA and BLRAAZ, and key B is used for BLRAB and BLRABZ.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to the general-purpose register.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 Z 0 0 1 1 1 1 1 1 0 0 0 0 1 M Rn Rm
op A
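BLRAA (Z == 1 && M == 0)
BLRAA <Xn>, <Xm|SP>
BLRAAZ (Z == 0 && M == 0 && Rm == 11111)
BLRAAZ <Xn>
BLRAB (Z == 1 && M == 1)
BLRAB <Xn>, <Xm|SP>
BLRABZ (Z == 0 && M == 1 && Rm == 11111)
BLRABZ <Xn>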
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean use_key_a = (M == '0');
boolean source_is_sp = ((Z == '1') && (m == 31));
if !HavePACExt() then
UNDEFINED;
Assembler Symbols
<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.
<Xm|SP> Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier, encoded
in the "Rm" field.
Operation
if use_key_a then
target = AuthIA(target, modifier, TRUE);
else
target = AuthIB(target, modifier, TRUE);
X[30] = PC[] + 4;
BR
Branch to Register branches unconditionally to an address in a register, with a hint that this is not a subroutine return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm
BR <Xn>
integer n = UInt(Rn);
Assembler Symbols
<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.
Operation
BRAA, BRAAZ, BRAB, BRABZ
Branch to Register, with pointer authentication. This instruction authenticates the address in the general-purpose
register that is specified by <Xn>, using a modifier and the specified key, and branches to the authenticated address.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xm|SP> for BRAA and BRAB.
• The value zero, for BRAAZ and BRABZ.
Key A is used for BRAA and BRAAZ, and key B is used for BRAB and BRABZ.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to the general-purpose register.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 Z 0 0 0 1 1 1 1 1 0 0 0 0 1 M Rn Rm
op A
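BRAA (Z == 1 && M == 0)
BRAA <Xn>, <Xm|SP>
BRAAZ (Z == 0 && M == 0 && Rm == 11111)
BRAAZ <Xn>
BRAB (Z == 1 && M == 1)
BRAB <Xn>, <Xm|SP>
BRABZ (Z == 0 && M == 1 && Rm == 11111)
BRABZ <Xn>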
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean use_key_a = (M == '0');
boolean source_is_sp = ((Z == '1') && (m == 31));
if !HavePACExt() then
UNDEFINED;
Assembler Symbols
<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.
<Xm|SP> Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier, encoded
in the "Rm" field.
Operation
if use_key_a then
target = AuthIA(target, modifier, TRUE);
else
target = AuthIB(target, modifier, TRUE);
BRK
Breakpoint instruction. A BRK instruction generates a Breakpoint Instruction exception. The PE records the exception
in ESR_ELx, using the EC value 0x3c, and captures the value of the immediate argument in ESR_ELx.ISS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 1 imm16 0 0 0 0 0
BRK #<imm>
if HaveBTIExt() then
SetBTypeCompatible(TRUE);
Assembler Symbols
<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
Operation
AArch64.SoftwareBreakpoint(imm16);
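As an illustration of where the immediate sits in the instruction word, the Python sketch below builds and decodes BRK encodings. The base value and field position are taken from the encoding diagram above. The function names are hypothetical.

```python
BRK_BASE = 0xD4200000        # bits 31:21 = 11010100001, bits 4:0 = 00000
BRK_FIXED_MASK = 0xFFE0001F  # every bit outside the imm16 field (bits 20:5)


def encode_brk(imm: int) -> int:
    """Return the 32-bit encoding of BRK #imm."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("imm must be in the range 0 to 65535")
    return BRK_BASE | (imm << 5)


def decode_brk_imm(insn: int) -> int:
    """Return the imm16 field of a BRK encoding, or raise if it is not BRK."""
    if (insn & BRK_FIXED_MASK) != BRK_BASE:
        raise ValueError("not a BRK encoding")
    return (insn >> 5) & 0xFFFF
```

A debugger that plants software breakpoints could use the immediate to tell its own breakpoints apart from others, since the value is reported in ESR_ELx.ISS.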
BTI
Branch Target Identification. A BTI instruction is used to guard against the execution of instructions which are not the
intended target of a branch.
Outside of a guarded memory region, a BTI instruction executes as a NOP. Within a guarded memory region while
PSTATE.BTYPE != 0b00, a BTI instruction compatible with the current value of PSTATE.BTYPE will not generate a
Branch Target Exception and will allow execution of subsequent instructions within the memory region.
The operand <targets> passed to a BTI instruction determines the values of PSTATE.BTYPE which the BTI instruction
is compatible with.
Note
Within a guarded memory region, when PSTATE.BTYPE != 0b00, all instructions will generate a Branch Target
Exception, other than BRK, BTI, HLT, PACIASP, and PACIBSP, which might not. See the individual instructions for
more information.
System
(FEAT_BTI)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 x x 0 1 1 1 1 1
CRm op2
BTI {<targets>}
SystemHintOp op;
Assembler Symbols
op2<2:1> <targets>
00 (omitted)
01 c
10 j
11 jc
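The table above can be turned into a small encoder. The Python sketch below places op2<2:1> in bits 7:6 of the fixed pattern shown in the encoding diagram. The names BTI_BASE, encode_bti and decode_bti_targets are hypothetical, and the sketch does not model PSTATE.BTYPE compatibility checks.

```python
BTI_BASE = 0xD503241F  # encoding diagram with op2<2:1> = 00

# op2<2:1> values from the <targets> table; "" stands for an omitted operand.
TARGETS = {"": 0b00, "c": 0b01, "j": 0b10, "jc": 0b11}
NAMES = {value: name for name, value in TARGETS.items()}


def encode_bti(targets: str = "") -> int:
    """Return the 32-bit encoding of BTI {<targets>}."""
    if targets not in TARGETS:
        raise ValueError("targets must be '', 'c', 'j' or 'jc'")
    return BTI_BASE | (TARGETS[targets] << 6)


def decode_bti_targets(insn: int) -> str:
    """Return the <targets> operand of a BTI encoding."""
    if (insn & 0xFFFFFF3F) != BTI_BASE:   # ignore only bits 7:6
        raise ValueError("not a BTI encoding")
    return NAMES[(insn >> 6) & 0b11]
```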
Operation
case op of
when SystemHintOp_YIELD
Hint_Yield();
when SystemHintOp_DGH
Hint_DGH();
when SystemHintOp_WFE
Hint_WFE(1, WFxType_WFE);
when SystemHintOp_WFI
Hint_WFI(1, WFxType_WFI);
when SystemHintOp_SEV
SendEvent();
when SystemHintOp_SEVL
SendEventLocal();
when SystemHintOp_ESB
SynchronizeErrors();
AArch64.ESBOperation();
if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then AArch64.vESBOperation();
TakeUnmaskedSErrorInterrupts();
when SystemHintOp_PSB
ProfilingSynchronizationBarrier();
when SystemHintOp_TSB
TraceSynchronizationBarrier();
when SystemHintOp_CSDB
ConsumptionOfSpeculativeDataBarrier();
when SystemHintOp_BTI
SetBTypeNext('00');
otherwise // do nothing
CAS, CASA, CASAL, CASL
Compare and Swap word or doubleword in memory reads a 32-bit word or 64-bit doubleword from memory, and
compares it against the value held in a first register. If the comparison is equal, the value in a second register is
written to memory. If the write is performed, the read and write occur atomically such that no other modification of
the memory location can take place between the read and write.
• CASA and CASAL load from memory with acquire semantics.
• CASL and CASAL store to memory with release semantics.
• CAS has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
The architecture permits that the data read clears any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, or
<Xs>, is restored to the value held in the register before the instruction was executed.
No offset
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(datasize) comparevalue;
bits(datasize) newvalue;
bits(datasize) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
comparevalue = X[s];
newvalue = X[t];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
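The pseudocode above stops before the memory access, so here is a minimal Python model of the compare-and-swap step itself. It is a sketch under loud assumptions: a lock stands in for the hardware atomicity guarantee, memory is a dictionary of cells rather than bytes, and acquire/release ordering, alignment, tagging and aborts are not modelled. The class ToyMemory and its methods are hypothetical.

```python
import threading


class ToyMemory:
    """A toy memory whose cas() mimics the read, compare, conditional write."""

    def __init__(self) -> None:
        self._cells = {}
        self._lock = threading.Lock()

    def store(self, address: int, value: int) -> None:
        with self._lock:
            self._cells[address] = value

    def load(self, address: int) -> int:
        with self._lock:
            return self._cells.get(address, 0)

    def cas(self, address: int, compare: int, new: int,
            size_bits: int = 32) -> int:
        """Write `new` only if memory equals `compare`; return the value read.

        The returned value is what would be loaded into <Ws> or <Xs>.
        """
        mask = (1 << size_bits) - 1
        with self._lock:                      # read and write are one step
            old = self._cells.get(address, 0) & mask
            if old == (compare & mask):
                self._cells[address] = new & mask
        return old
```

A caller can tell whether the swap happened by comparing the returned value with the value it passed as `compare`.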
CASB, CASAB, CASALB, CASLB
Compare and Swap byte in memory reads an 8-bit byte from memory, and compares it against the value held in a first
register. If the comparison is equal, the value in a second register is written to memory. If the write is performed, the
read and write occur atomically such that no other modification of the memory location can take place between the
read and write.
• CASAB and CASALB load from memory with acquire semantics.
• CASLB and CASALB store to memory with release semantics.
• CASB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
The architecture permits that the data read clears any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, is
restored to the value held in the register before the instruction was executed.
No offset
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size
CASAB (L == 1 && o0 == 0)
CASALB (L == 1 && o0 == 1)
CASB (L == 0 && o0 == 0)
CASLB (L == 0 && o0 == 1)
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) comparevalue;
bits(8) newvalue;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
comparevalue = X[s];
newvalue = X[t];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
CASH, CASAH, CASALH, CASLH
Compare and Swap halfword in memory reads a 16-bit halfword from memory, and compares it against the value held
in a first register. If the comparison is equal, the value in a second register is written to memory. If the write is
performed, the read and write occur atomically such that no other modification of the memory location can take place
between the read and write.
• CASAH and CASALH load from memory with acquire semantics.
• CASLH and CASALH store to memory with release semantics.
• CASH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
The architecture permits that the data read clears any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, is
restored to the value held in the register before the instruction was executed.
No offset
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size
CASAH (L == 1 && o0 == 0)
CASALH (L == 1 && o0 == 1)
CASH (L == 0 && o0 == 0)
CASLH (L == 0 && o0 == 1)
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) comparevalue;
bits(16) newvalue;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
comparevalue = X[s];
newvalue = X[t];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
CASP, CASPA, CASPAL, CASPL
Compare and Swap Pair of words or doublewords in memory reads a pair of 32-bit words or 64-bit doublewords from
memory, and compares them against the values held in the first pair of registers. If the comparison is equal, the values
in the second pair of registers are written to memory. If the writes are performed, the reads and writes occur
atomically such that no other modification of the memory location can take place between the reads and writes.
• CASPA and CASPAL load from memory with acquire semantics.
• CASPL and CASPAL store to memory with release semantics.
• CASP has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
The architecture permits that the data read clears any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the registers which are compared and loaded, that is <Ws>
and <W(s+1)>, or <Xs> and <X(s+1)>, are restored to the values held in the registers before the instruction was
executed.
No offset
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 sz 0 0 1 0 0 0 0 L 1 Rs o0 1 1 1 1 1 Rn Rt
Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the first general-purpose register to be compared and loaded, encoded in the "Rs"
field. <Ws> must be an even-numbered register.
<W(s+1)> Is the 32-bit name of the second general-purpose register to be compared and loaded.
<Wt> Is the 32-bit name of the first general-purpose register to be conditionally stored, encoded in the "Rt"
field. <Wt> must be an even-numbered register.
<W(t+1)> Is the 32-bit name of the second general-purpose register to be conditionally stored.
<Xs> Is the 64-bit name of the first general-purpose register to be compared and loaded, encoded in the "Rs"
field. <Xs> must be an even-numbered register.
<X(s+1)> Is the 64-bit name of the second general-purpose register to be compared and loaded.
<Xt> Is the 64-bit name of the first general-purpose register to be conditionally stored, encoded in the "Rt"
field. <Xt> must be an even-numbered register.
<X(t+1)> Is the 64-bit name of the second general-purpose register to be conditionally stored.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(2*datasize) comparevalue;
bits(2*datasize) newvalue;
bits(2*datasize) data;
bits(datasize) s1 = X[s];
bits(datasize) s2 = X[s+1];
bits(datasize) t1 = X[t];
bits(datasize) t2 = X[t+1];
comparevalue = if BigEndian(ldacctype) then s1:s2 else s2:s1;
newvalue = if BigEndian(stacctype) then t1:t2 else t2:t1;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if BigEndian(ldacctype) then
X[s] = data<2*datasize-1:datasize>;
X[s+1] = data<datasize-1:0>;
else
X[s] = data<datasize-1:0>;
X[s+1] = data<2*datasize-1:datasize>;
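The endianness-dependent concatenation in the pseudocode above can be hard to read at a glance. The Python sketch below mirrors just that part: how a register pair is combined into one double-width value and split back again. The function names pack_pair and unpack_pair are hypothetical, and the memory access itself is not modelled.

```python
from typing import Tuple


def pack_pair(first: int, second: int, datasize: int, big_endian: bool) -> int:
    """Combine X[s] (first) and X[s+1] (second) into a 2*datasize-bit value.

    Mirrors: if BigEndian then s1:s2 else s2:s1, where a:b puts a in the
    high half.
    """
    mask = (1 << datasize) - 1
    first &= mask
    second &= mask
    if big_endian:
        return (first << datasize) | second
    return (second << datasize) | first


def unpack_pair(data: int, datasize: int, big_endian: bool) -> Tuple[int, int]:
    """Split a 2*datasize-bit value back into (X[s], X[s+1])."""
    mask = (1 << datasize) - 1
    low = data & mask
    high = (data >> datasize) & mask
    return (high, low) if big_endian else (low, high)
```

In the little-endian case the first register of the pair ends up in the low half of the combined value, so it corresponds to the lower memory address.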
CBNZ
Compare and Branch on Nonzero compares the value in a register with zero, and conditionally branches to a label at a
PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine call or return. This
instruction does not affect the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 1 0 1 imm19 Rt
op
32-bit (sf == 0)
64-bit (sf == 1)
integer t = UInt(Rt);
integer datasize = if sf == '1' then 64 else 32;
bits(64) offset = SignExtend(imm19:'00', 64);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.
Operation
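The operational pseudocode is not reproduced here. As a non-authoritative sketch of the behaviour described at the top of this section, the Python below computes the next PC from the decoded offset, SignExtend(imm19:'00', 64). The function name cbnz_next_pc is hypothetical, and falling through to PC+4 when the register is zero is an assumption based on normal sequential execution.

```python
MASK64 = (1 << 64) - 1


def cbnz_next_pc(pc: int, reg_value: int, imm19: int,
                 datasize: int = 64) -> int:
    """Return the address executed after CBNZ at pc."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    # Only the low datasize bits of the register take part in the test.
    operand = reg_value & ((1 << datasize) - 1)
    offset = (imm19 & 0x7FFFF) << 2       # imm19:'00' is a 21-bit value
    if offset & (1 << 20):                # sign-extend, giving +/-1MB
        offset -= 1 << 21
    if operand != 0:
        return (pc + offset) & MASK64     # branch taken
    return (pc + 4) & MASK64              # not taken: next instruction
```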
CBZ
Compare and Branch on Zero compares the value in a register with zero, and conditionally branches to a label at a PC-
relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This instruction
does not affect the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 1 0 0 imm19 Rt
op
32-bit (sf == 0)
64-bit (sf == 1)
integer t = UInt(Rt);
integer datasize = if sf == '1' then 64 else 32;
bits(64) offset = SignExtend(imm19:'00', 64);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.
Operation
CCMN (immediate)
Conditional Compare Negative (immediate) sets the value of the condition flags to the result of the comparison of a
register value and a negated immediate value if the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 1 0 imm5 cond 1 0 Rn 0 nzcv
op
32-bit (sf == 0)
64-bit (sf == 1)
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;
bits(datasize) imm = ZeroExtend(imm5, datasize);
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<imm> Is a five-bit unsigned (positive) immediate, in the range 0 to 31, encoded in the "imm5" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Operation
if ConditionHolds(cond) then
bits(datasize) operand1 = X[n];
(-, flags) = AddWithCarry(operand1, imm, '0');
PSTATE.<N,Z,C,V> = flags;
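To show how the flags come out of the pseudocode above, the following Python sketch models AddWithCarry and the conditional selection between the computed flags and the <nzcv> immediate. The add_with_carry helper is this example's own rendering of the shared pseudocode function and is not quoted from this section. The condition is passed in as a boolean rather than evaluated from PSTATE, and all names are hypothetical.

```python
from typing import Tuple


def add_with_carry(x: int, y: int, carry_in: int,
                   datasize: int) -> Tuple[int, int]:
    """Return (result, nzcv) for x + y + carry_in on datasize-bit values."""
    mask = (1 << datasize) - 1
    x &= mask
    y &= mask

    def to_signed(value: int) -> int:
        return value - (1 << datasize) if value >> (datasize - 1) else value

    unsigned_sum = x + y + carry_in
    signed_sum = to_signed(x) + to_signed(y) + carry_in
    result = unsigned_sum & mask
    n = result >> (datasize - 1)
    z = int(result == 0)
    c = int(result != unsigned_sum)            # unsigned overflow
    v = int(to_signed(result) != signed_sum)   # signed overflow
    return result, (n << 3) | (z << 2) | (c << 1) | v


def ccmn_imm(xn: int, imm5: int, nzcv: int, cond_holds: bool,
             datasize: int = 32) -> int:
    """Return the new NZCV value after CCMN (immediate)."""
    flags = nzcv & 0xF                         # used when the condition fails
    if cond_holds:
        _, flags = add_with_carry(xn, imm5 & 0x1F, 0, datasize)
    return flags
```

Comparing a register holding -1 against the negated immediate 1 adds 0xFFFFFFFF and 1, which wraps to zero and sets Z and C.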
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CCMN (register)
Conditional Compare Negative (register) sets the value of the condition flags to the result of the comparison of a
register value and the inverse of another register value if the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 1 0 Rm cond 0 0 Rn 0 nzcv
op
32-bit (sf == 0)
64-bit (sf == 1)
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Operation
if ConditionHolds(cond) then
    bits(datasize) operand1 = X[n];
    bits(datasize) operand2 = X[m];
    (-, flags) = AddWithCarry(operand1, operand2, '0');
PSTATE.<N,Z,C,V> = flags;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CCMP (immediate)
Conditional Compare (immediate) sets the value of the condition flags to the result of the comparison of a register
value and an immediate value if the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 1 0 imm5 cond 1 0 Rn 0 nzcv
op
32-bit (sf == 0)
CCMP <Wn>, #<imm>, #<nzcv>, <cond>
64-bit (sf == 1)
CCMP <Xn>, #<imm>, #<nzcv>, <cond>
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;
bits(datasize) imm = ZeroExtend(imm5, datasize);
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<imm> Is a 5-bit unsigned immediate, in the range 0 to 31, encoded in the "imm5" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Operation
if ConditionHolds(cond) then
    bits(datasize) operand1 = X[n];
    bits(datasize) operand2;
    operand2 = NOT(imm);
    (-, flags) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = flags;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CCMP (register)
Conditional Compare (register) sets the value of the condition flags to the result of the comparison of two registers if
the condition is TRUE, and an immediate value otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 1 0 Rm cond 0 0 Rn 0 nzcv
op
32-bit (sf == 0)
CCMP <Wn>, <Wm>, #<nzcv>, <cond>
64-bit (sf == 1)
CCMP <Xn>, <Xm>, #<nzcv>, <cond>
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;
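As an illustration of the encoding diagram and decode steps above, the following Python sketch splits a 32-bit instruction word into the CCMP (register) fields. The function name decode_ccmp_register and the example word 0xFA421024 (assembled by hand from the diagram for CCMP X1, X2, #4, NE) are assumptions made for this example and are not part of the architecture.

```python
def decode_ccmp_register(word):
    """Split a 32-bit word into CCMP (register) fields, or return None."""

    def bits(hi, lo):
        # Extract word<hi:lo> as an unsigned integer.
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    # Fixed bits from the diagram: op=1, S=1, 11010010 in <30:21>,
    # and zeros in bits 11, 10 and 4.
    if bits(30, 21) != 0b1111010010 or bits(11, 10) != 0 or bits(4, 4) != 0:
        return None

    return {
        "datasize": 64 if bits(31, 31) else 32,  # sf
        "m": bits(20, 16),                       # Rm
        "cond": bits(15, 12),                    # cond
        "n": bits(9, 5),                         # Rn
        "flags": bits(3, 0),                     # nzcv
    }


fields = decode_ccmp_register(0xFA421024)
print(fields)
```

A word with op == 0 in bit 30 is rejected by this sketch, because that pattern belongs to CCMN (register).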
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Operation
if ConditionHolds(cond) then
    bits(datasize) operand1 = X[n];
    bits(datasize) operand2 = X[m];
    operand2 = NOT(operand2);
    (-, flags) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = flags;
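The conditional compare pseudocode can be modelled in a few lines of Python. This is a minimal sketch, not a reference implementation: add_with_carry follows the usual definition of AddWithCarry (which is not reproduced in this section), the condition is passed in as an already evaluated boolean, and flags are returned as a 4-bit NZCV integer. All function names are illustrative.

```python
def add_with_carry(x, y, carry_in, datasize):
    """Return (result, nzcv) for x + y + carry_in on datasize-bit values."""
    mask = (1 << datasize) - 1
    x &= mask
    y &= mask

    def signed(value):
        # Two's-complement interpretation of a datasize-bit value.
        return value - (1 << datasize) if value >> (datasize - 1) else value

    unsigned_sum = x + y + carry_in
    signed_sum = signed(x) + signed(y) + carry_in
    result = unsigned_sum & mask

    n = result >> (datasize - 1)
    z = int(result == 0)
    c = int(unsigned_sum != result)          # carry out of the top bit
    v = int(signed(result) != signed_sum)    # signed overflow
    return result, (n << 3) | (z << 2) | (c << 1) | v


def ccmp(condition_holds, operand1, operand2, nzcv, datasize=64):
    """CCMP: flags of operand1 - operand2 if the condition holds, else nzcv."""
    if not condition_holds:
        return nzcv
    mask = (1 << datasize) - 1
    _, flags = add_with_carry(operand1, ~operand2 & mask, 1, datasize)
    return flags


def ccmn(condition_holds, operand1, operand2, nzcv, datasize=64):
    """CCMN: flags of operand1 + operand2 if the condition holds, else nzcv."""
    if not condition_holds:
        return nzcv
    _, flags = add_with_carry(operand1, operand2, 0, datasize)
    return flags


print(format(ccmp(True, 5, 5, 0b0000), "04b"))   # equal operands
print(format(ccmp(False, 5, 5, 0b1000), "04b"))  # condition fails
```

When the condition fails, the supplied nzcv immediate is returned unchanged, which is what allows a chain of compares to short-circuit.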
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CFINV
Invert Carry Flag. This instruction inverts the value of the PSTATE.C flag.
System
(FEAT_FlagM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 0 0 1 1 1 1 1
CRm
CFINV
Operation
PSTATE.C = NOT(PSTATE.C);
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CFP
Control Flow Prediction Restriction by Context prevents control flow predictions that predict execution addresses
based on information gathered from earlier execution within a particular execution context. Control flow predictions
determined by the actions of code in the target execution context or contexts appearing in program order before the
instruction cannot be used to exploitatively control speculative execution occurring after the instruction is complete
and synchronized.
For more information, see CFP RCTX, Control Flow Prediction Restriction by Context.
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
System
(FEAT_SPECRES)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 Rt
L op1 CRn CRm op2
CFP RCTX, <Xt>
is equivalent to
SYS #3, C7, C3, #4, <Xt>
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
CINC
Conditional Increment returns, in the destination register, the value of the source register incremented by 1 if the
condition is TRUE, and otherwise returns the value of the source register.
• The encodings in this description are named to match the encodings of CSINC.
• The description of CSINC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 != 11111 != 111x 0 1 != 11111 Rd
op Rm cond o2 Rn
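32-bit (sf == 0)
CINC <Wd>, <Wn>, <cond>
is equivalent to
CSINC <Wd>, <Wn>, <Wn>, invert(<cond>)
64-bit (sf == 1)
CINC <Xd>, <Xn>, <cond>
is equivalent to
CSINC <Xd>, <Xn>, <Xn>, invert(<cond>)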
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.
Operation
The description of CSINC gives the operational pseudocode for this instruction.
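The relationship between CINC and CSINC, including the inverted least significant bit of the condition, can be sketched in Python as follows. The condition evaluation follows the standard A64 condition table, which is not reproduced in this section; the function names and the 4-bit NZCV integer argument are assumptions made for this example.

```python
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]


def condition_holds(cond, nzcv):
    """Evaluate a 4-bit condition code against a 4-bit NZCV value."""
    n, z, c, v = (nzcv >> 3) & 1, (nzcv >> 2) & 1, (nzcv >> 1) & 1, nzcv & 1
    base = cond >> 1
    if base == 0:
        result = z == 1
    elif base == 1:
        result = c == 1
    elif base == 2:
        result = n == 1
    elif base == 3:
        result = v == 1
    elif base == 4:
        result = c == 1 and z == 0
    elif base == 5:
        result = n == v
    elif base == 6:
        result = n == v and z == 0
    else:
        result = True
    # An odd condition code inverts the test, except for 0b1111.
    if cond & 1 and cond != 0b1111:
        result = not result
    return result


def csinc(rn, rm, cond, nzcv, datasize=64):
    """CSINC: Rn if the condition holds, otherwise Rm + 1."""
    mask = (1 << datasize) - 1
    return rn & mask if condition_holds(cond, nzcv) else (rm + 1) & mask


def cinc(rn, cond_name, nzcv, datasize=64):
    """CINC as the alias CSINC Rd, Rn, Rn, invert(cond)."""
    cond = COND_NAMES.index(cond_name)
    if cond >= 0b1110:
        raise ValueError("AL and NV are excluded for CINC")
    return csinc(rn, rn, cond ^ 1, nzcv, datasize)


print(cinc(7, "EQ", 0b0100))  # Z is set, so EQ is true
print(cinc(7, "EQ", 0b0000))  # Z is clear, so EQ is false
```

CINV and CNEG follow the same alias pattern over CSINV and CSNEG, with the final operation replaced by a bitwise inversion or a negation.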
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CINV
Conditional Invert returns, in the destination register, the bitwise inversion of the value of the source register if the
condition is TRUE, and otherwise returns the value of the source register.
• The encodings in this description are named to match the encodings of CSINV.
• The description of CSINV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 != 11111 != 111x 0 0 != 11111 Rd
op Rm cond o2 Rn
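32-bit (sf == 0)
CINV <Wd>, <Wn>, <cond>
is equivalent to
CSINV <Wd>, <Wn>, <Wn>, invert(<cond>)
64-bit (sf == 1)
CINV <Xd>, <Xn>, <cond>
is equivalent to
CSINV <Xd>, <Xn>, <Xn>, invert(<cond>)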
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.
Operation
The description of CSINV gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CLREX
CLREX {#<imm>}
Assembler Symbols
<imm> Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15 and encoded in the
"CRm" field.
Operation
ClearExclusiveLocal(ProcessorID());
CLS
Count Leading Sign bits counts the number of leading bits of the source register that have the same value as the most
significant bit of the register, and writes the result to the destination register. This count does not include the most
significant bit of the source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 Rn Rd
op
32-bit (sf == 0)
CLS <Wd>, <Wn>
64-bit (sf == 1)
CLS <Xd>, <Xn>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
integer result;
bits(datasize) operand1 = X[n];
result = CountLeadingSignBits(operand1);
X[d] = result<datasize-1:0>;
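A short Python sketch may help to show why the count excludes the most significant bit. It assumes the common formulation in which CountLeadingSignBits(x) is the number of leading zeros of x<N-1:1> EOR x<N-2:0>, an (N-1)-bit value; that helper is not defined in this section, and the Python function names are illustrative.

```python
def count_leading_zero_bits(value, width):
    """Number of zero bits above the highest set bit of a width-bit value."""
    return width - (value & ((1 << width) - 1)).bit_length()


def count_leading_sign_bits(value, datasize):
    """Number of bits below the top bit that repeat the top bit."""
    value &= (1 << datasize) - 1
    upper = value >> 1                             # value<datasize-1:1>
    lower = value & ((1 << (datasize - 1)) - 1)    # value<datasize-2:0>
    # A set bit in upper ^ lower marks the first position where a bit
    # differs from the bit above it.
    return count_leading_zero_bits(upper ^ lower, datasize - 1)


for sample in (0x00000000, 0xFFFFFFFF, 0x00000001, 0x80000000):
    print(hex(sample), count_leading_sign_bits(sample, 32))
```

Because the comparison runs over datasize-1 bit pairs, the largest possible result is datasize-1, which is returned for all-zeros and all-ones inputs.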
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CLZ
Count Leading Zeros counts the number of binary zero bits before the first binary one bit in the value of the source
register, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 Rn Rd
op
32-bit (sf == 0)
CLZ <Wd>, <Wn>
64-bit (sf == 1)
CLZ <Xd>, <Xn>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
integer result;
bits(datasize) operand1 = X[n];
result = CountLeadingZeroBits(operand1);
X[d] = result<datasize-1:0>;
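The following Python sketch models the CLZ result by scanning from the most significant bit downwards, mirroring the wording of the description. The function name clz is illustrative, and the loop is written for clarity rather than speed.

```python
def clz(value, datasize=64):
    """Count zero bits before the first one bit, scanning from the top."""
    value &= (1 << datasize) - 1
    count = 0
    for bit in range(datasize - 1, -1, -1):
        if (value >> bit) & 1:
            break
        count += 1
    return count


print(clz(0x00F0, 32))
print(clz(0, 32))
```

An all-zeros source contains no one bit, so the scan runs to completion and the result equals datasize.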
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CMN (extended register)
Compare Negative (extended register) adds a register value and a sign or zero-extended register value, followed by an
optional left shift amount. The argument that is extended from the <Rm> register can be a byte, halfword, word, or
doubleword. It updates the condition flags based on the result, and discards the result.
• The encodings in this description are named to match the encodings of ADDS (extended register).
• The description of ADDS (extended register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn 1 1 1 1 1
op S Rd
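32-bit (sf == 0)
CMN <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
is equivalent to
ADDS WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
CMN <Xn|SP>, <R><m>{, <extend> {#<amount>}}
is equivalent to
ADDS XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}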
Assembler Symbols
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<R> Is a width specifier, encoded in “option”:
option <R>
00x W
010 W
x11 X
10x W
110 W
<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.
<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.
For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.
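The effect of <extend> and <amount> on the second operand can be sketched in Python. This is a hedged model that assumes the usual ExtendReg behaviour: the low 8, 16, 32 or 64 bits are taken, sign-extended or zero-extended, shifted left by 0 to 4, and truncated to the operand size. The table EXTEND and the function extend_reg are names invented for this example; the LSL spelling is not modelled separately because it selects the same operation as UXTW or UXTX.

```python
# Extension name -> (is_signed, number of source bits used)
EXTEND = {
    "UXTB": (False, 8), "UXTH": (False, 16), "UXTW": (False, 32), "UXTX": (False, 64),
    "SXTB": (True, 8), "SXTH": (True, 16), "SXTW": (True, 32), "SXTX": (True, 64),
}


def extend_reg(value, extend, amount=0, datasize=64):
    """Extend the low bits of value, then shift left by amount (0 to 4)."""
    if not 0 <= amount <= 4:
        raise ValueError("shift amount must be in the range 0 to 4")
    signed, length = EXTEND[extend]
    # Bits that would be shifted out of the result are never read.
    length = min(length, datasize - amount)
    field = value & ((1 << length) - 1)
    if signed and field >> (length - 1):
        field -= 1 << length
    return (field << amount) & ((1 << datasize) - 1)


print(hex(extend_reg(0x1FF, "UXTB")))
print(hex(extend_reg(0x80, "SXTB")))
```

The value returned here is what is then added to the first source register to set the flags.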
Operation
The description of ADDS (extended register) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CMN (immediate)
Compare Negative (immediate) adds a register value and an optionally-shifted immediate value. It updates the
condition flags based on the result, and discards the result.
• The encodings in this description are named to match the encodings of ADDS (immediate).
• The description of ADDS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 0 1 0 sh imm12 Rn 1 1 1 1 1
op S Rd
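32-bit (sf == 0)
CMN <Wn|WSP>, #<imm>{, <shift>}
is equivalent to
ADDS WZR, <Wn|WSP>, #<imm>{, <shift>}
64-bit (sf == 1)
CMN <Xn|SP>, #<imm>{, <shift>}
is equivalent to
ADDS XZR, <Xn|SP>, #<imm>{, <shift>}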
Assembler Symbols
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #12
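The combination of a 12-bit immediate and an optional LSL #12 limits which constants can be compared directly. The Python sketch below shows one way an assembler might choose the "imm12" and "sh" fields for a given constant; the function name encode_arith_immediate and the choice to return None for unencodable values are assumptions made for this example.

```python
def encode_arith_immediate(value):
    """Return (imm12, sh) for value, or None if it cannot be encoded."""
    if 0 <= value <= 0xFFF:
        return value, 0              # LSL #0
    if value & 0xFFF == 0 and 0 < value >> 12 <= 0xFFF:
        return value >> 12, 1        # LSL #12
    return None


for constant in (4095, 4096, 4097, 0xFFF000):
    print(hex(constant), encode_arith_immediate(constant))
```

A constant such as 4097 needs bits in both halves, so it has no single-instruction encoding and would have to be loaded into a register first.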
Operation
The description of ADDS (immediate) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CMN (shifted register)
Compare Negative (shifted register) adds a register value and an optionally-shifted register value. It updates the
condition flags based on the result, and discards the result.
• The encodings in this description are named to match the encodings of ADDS (shifted register).
• The description of ADDS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 shift 0 Rm imm6 Rn 1 1 1 1 1
op S Rd
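32-bit (sf == 0)
CMN <Wn>, <Wm>{, <shift> #<amount>}
is equivalent to
ADDS WZR, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
CMN <Xn>, <Xm>{, <shift> #<amount>}
is equivalent to
ADDS XZR, <Xn>, <Xm>{, <shift> #<amount>}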
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
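The three permitted shift types and the per-variant amount ranges can be sketched in Python as follows. The function name shift_reg is illustrative, and rejecting out-of-range amounts with an exception is a choice made for this example rather than architectural behaviour.

```python
def shift_reg(value, shift, amount, datasize=64):
    """Apply LSL, LSR or ASR to a datasize-bit value."""
    if shift not in ("LSL", "LSR", "ASR"):
        raise ValueError("shift type is RESERVED or unknown")
    if not 0 <= amount < datasize:
        raise ValueError("shift amount out of range for this variant")
    mask = (1 << datasize) - 1
    value &= mask
    if shift == "LSL":
        return (value << amount) & mask
    if shift == "LSR":
        return value >> amount
    # ASR: reinterpret as signed so that Python's >> copies the sign bit.
    if value >> (datasize - 1):
        value -= 1 << datasize
    return (value >> amount) & mask


print(hex(shift_reg(0x80000000, "ASR", 4, 32)))
print(hex(shift_reg(0x80000000, "LSR", 4, 32)))
```

The 32-bit variant accepts amounts up to 31 and the 64-bit variant up to 63, which the range check on datasize reflects.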
Operation
The description of ADDS (shifted register) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CMP (extended register)
Compare (extended register) subtracts a sign or zero-extended register value, followed by an optional left shift
amount, from a register value. The argument that is extended from the <Rm> register can be a byte, halfword, word,
or doubleword. It updates the condition flags based on the result, and discards the result.
• The encodings in this description are named to match the encodings of SUBS (extended register).
• The description of SUBS (extended register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn 1 1 1 1 1
op S Rd
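32-bit (sf == 0)
CMP <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
is equivalent to
SUBS WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}
64-bit (sf == 1)
CMP <Xn|SP>, <R><m>{, <extend> {#<amount>}}
is equivalent to
SUBS XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}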
Assembler Symbols
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<R> Is a width specifier, encoded in “option”:
option <R>
00x W
010 W
x11 X
10x W
110 W
<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.
<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.
For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.
Operation
The description of SUBS (extended register) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CMP (immediate)
Compare (immediate) subtracts an optionally-shifted immediate value from a register value. It updates the condition
flags based on the result, and discards the result.
• The encodings in this description are named to match the encodings of SUBS (immediate).
• The description of SUBS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 0 1 0 sh imm12 Rn 1 1 1 1 1
op S Rd
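32-bit (sf == 0)
CMP <Wn|WSP>, #<imm>{, <shift>}
is equivalent to
SUBS WZR, <Wn|WSP>, #<imm>{, <shift>}
64-bit (sf == 1)
CMP <Xn|SP>, #<imm>{, <shift>}
is equivalent to
SUBS XZR, <Xn|SP>, #<imm>{, <shift>}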
Assembler Symbols
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #12
Operation
The description of SUBS (immediate) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CMP (shifted register)
Compare (shifted register) subtracts an optionally-shifted register value from a register value. It updates the condition
flags based on the result, and discards the result.
• The encodings in this description are named to match the encodings of SUBS (shifted register).
• The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 shift 0 Rm imm6 Rn 1 1 1 1 1
op S Rd
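32-bit (sf == 0)
CMP <Wn>, <Wm>{, <shift> #<amount>}
is equivalent to
SUBS WZR, <Wn>, <Wm>{, <shift> #<amount>}
64-bit (sf == 1)
CMP <Xn>, <Xm>{, <shift> #<amount>}
is equivalent to
SUBS XZR, <Xn>, <Xm>{, <shift> #<amount>}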
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CMPP
Compare with Tag subtracts the 56-bit address held in the second source register from the 56-bit address held in the
first source register, updates the condition flags based on the result of the subtraction, and discards the result.
• The encodings in this description are named to match the encodings of SUBPS.
• The description of SUBPS gives the operational pseudocode for this instruction.
Integer
(Armv8.5)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 1 0 1 1 0 Xm 0 0 0 0 0 0 Xn 1 1 1 1 1
Xd
CMPP <Xn|SP>, <Xm|SP>
is equivalent to
SUBPS XZR, <Xn|SP>, <Xm|SP>
Assembler Symbols
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Xm"
field.
Operation
The description of SUBPS gives the operational pseudocode for this instruction.
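The practical effect of comparing only the 56-bit address is that two pointers with different bits above bit 55 can still compare equal. The Python sketch below assumes, following the SUBPS description (which is not reproduced in this section), that each operand is reduced to bits <55:0> and sign-extended to 64 bits before the subtraction. The function names are illustrative, and only the Z and C flags are modelled.

```python
ADDRESS_BITS = 56


def address_56(pointer):
    """Keep pointer<55:0> and sign-extend it to a 64-bit unsigned value."""
    low = pointer & ((1 << ADDRESS_BITS) - 1)
    if low >> (ADDRESS_BITS - 1):
        low |= 0xFF << ADDRESS_BITS
    return low


def cmpp(first, second):
    """Return the Z and C flags of a 56-bit address comparison."""
    x, y = address_56(first), address_56(second)
    return {
        "Z": int(x == y),   # equal addresses
        "C": int(x >= y),   # no borrow in the unsigned subtraction
    }


tagged_a = (0xA << 56) | 0x12345678
tagged_b = (0x3 << 56) | 0x12345678
print(cmpp(tagged_a, tagged_b))
```

In this example the two pointers differ only in bits <59:56>, so the comparison reports them as equal.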
CNEG
Conditional Negate returns, in the destination register, the negated value of the source register if the condition is
TRUE, and otherwise returns the value of the source register.
• The encodings in this description are named to match the encodings of CSNEG.
• The description of CSNEG gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 Rm != 111x 0 1 Rn Rd
op cond o2
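32-bit (sf == 0)
CNEG <Wd>, <Wn>, <cond>
is equivalent to
CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
64-bit (sf == 1)
CNEG <Xd>, <Xn>, <cond>
is equivalent to
CSNEG <Xd>, <Xn>, <Xn>, invert(<cond>)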
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.
Operation
The description of CSNEG gives the operational pseudocode for this instruction.
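A common use of conditional negate is computing an absolute value after a compare against zero. The Python sketch below models that pattern under stated assumptions: the condition is passed to cneg as an already evaluated boolean, and abs_via_cneg stands in for a CMP with #0 followed by CNEG with the MI condition. Both function names are invented for this example.

```python
def cneg(value, condition_true, datasize=64):
    """Return the two's-complement negation of value if the condition is true."""
    mask = (1 << datasize) - 1
    return (-value) & mask if condition_true else value & mask


def abs_via_cneg(value, datasize=64):
    """Model: CMP value, #0 followed by CNEG result, value, MI."""
    # After comparing with zero, the N flag equals the sign bit of value.
    n_flag = (value >> (datasize - 1)) & 1
    return cneg(value, bool(n_flag), datasize)


print(abs_via_cneg(0xFFFFFFFFFFFFFFFB))  # two's-complement -5
print(abs_via_cneg(5))
```

The most negative value has no positive counterpart in two's complement, so negating it returns the same bit pattern.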
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CPP
Cache Prefetch Prediction Restriction by Context prevents cache allocation predictions that predict execution
addresses based on information gathered from earlier execution within a particular execution context. Cache prefetch
predictions determined by the actions of code in the target execution context or contexts appearing in program order
before the instruction cannot influence speculative execution occurring after the instruction is complete and
synchronized.
For more information, see CPP RCTX, Cache Prefetch Prediction Restriction by Context.
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
System
(FEAT_SPECRES)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 Rt
L op1 CRn CRm op2
CPP RCTX, <Xt>
is equivalent to
SYS #3, C7, C3, #7, <Xt>
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
CPYFP, CPYFM, CPYFE
Memory Copy Forward-only. These instructions perform a memory copy. The prologue, main, and epilogue instructions
are expected to be run in succession and to appear consecutively in memory: CPYFP, then CPYFM, and then CPYFE.
CPYFP performs some preconditioning of the arguments suitable for using the CPYFM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYFM performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYFE performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
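The suitability condition above can be sketched in Python. This is a simplified model under stated assumptions: addresses are plain integers with no wrap-around, and the helper names forward_copy_is_safe and forward_copy are hypothetical. The byte-by-byte loop only illustrates a lowest-address-first copy order; it is not a description of how an implementation performs the copy.

```python
def forward_copy_is_safe(src: int, dst: int, size: int) -> bool:
    """True if a low-to-high copy of size bytes preserves the source data."""
    no_overlap = src + size <= dst or dst + size <= src
    return no_overlap or src > dst


def forward_copy(mem: bytearray, dst: int, src: int, size: int) -> None:
    """Byte-by-byte copy in the forward direction (lowest address first)."""
    for i in range(size):
        mem[dst + i] = mem[src + i]


ok = bytearray(b"abcdefgh")
forward_copy(ok, 0, 2, 6)      # source above destination: data preserved
bad = bytearray(b"abcdefgh")
forward_copy(bad, 2, 0, 6)     # source below destination: data overwritten early
```

In this model, ok ends as b"cdefghgh", which is the intended result, while bad ends as b"abababab" because bytes are overwritten before they are read.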
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFP, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFP, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
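The two register layouts after CPYFP can be compared with a small Python model. This is a sketch under assumptions: the function name after_prologue is hypothetical, and the IMPLEMENTATION DEFINED number of bytes copied is passed in as the parameter copied because software cannot predict it.

```python
MASK64 = (1 << 64) - 1
MAX_COPY = 0x7FFFFFFFFFFFFFFF


def after_prologue(xs: int, xd: int, xn: int, copied: int, option_a: bool):
    """Return (Xs, Xd, Xn, C) as described for the state after CPYFP.

    copied stands for the IMPLEMENTATION DEFINED number of bytes that the
    prologue copied; it is an input here, not something the model derives.
    """
    size = MAX_COPY if (xn >> 63) & 1 else xn   # saturate if Xn<63> == 1
    if not 0 <= copied <= size:
        raise ValueError("copied must be between 0 and the saturated size")
    if option_a:
        return ((xs + size) & MASK64, (xd + size) & MASK64,
                (-size + copied) & MASK64, 0)
    return ((xs + copied) & MASK64, (xd + copied) & MASK64,
            (size - copied) & MASK64, 1)
```

For example, with Xs = 0x1000, Xd = 0x2000, Xn = 100 and 16 bytes copied, the model gives Xs = 0x1064, Xd = 0x2064, Xn = 0xFFFFFFFFFFFFFFAC (that is, -84) for option A, and Xs = 0x1010, Xd = 0x2010, Xn = 84 for option B.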
For CPYFM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFE, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFE, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
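Both argument formats describe the same progress through the copy. The following Python sketch recovers the next source address, the next destination address, and the number of bytes remaining from the register values and PSTATE.C. The helper names to_signed64 and decode_progress are hypothetical, and the model assumes well-formed arguments.

```python
def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def decode_progress(xs: int, xd: int, xn: int, carry: int):
    """Return (next_src, next_dst, remaining) from the register state."""
    mask = (1 << 64) - 1
    if carry == 0:                      # option A
        offset = to_signed64(xn)        # zero or negative
        return (xs + offset) & mask, (xd + offset) & mask, -offset
    return xs, xd, xn                   # option B
```

With option A values Xs = 0x1064, Xd = 0x2064, Xn = -84 and with option B values Xs = 0x1010, Xd = 0x2010, Xn = 84, the function returns the same triple (0x1010, 0x2010, 84).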
Integer
(FEAT_MOPS)
Bits    Field     Value
31:30   sz        -
29:24   (fixed)   011001
23:22   op1       -
21      (fixed)   0
20:16   Rs        -
15:12   op2       0000
11:10   (fixed)   01
9:5     Rn        -
4:0     Rd        -
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
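The decode steps above can be mirrored in a short Python sketch that splits a 32-bit word into the named fields and applies the same register checks. The function name decode_cpyf and the returned dictionary layout are hypothetical, and exceptions stand in for the UNDEFINED and SEE outcomes. The sample word 0x19010440 used in the note below was built by hand from the diagram, assuming sz = 0, Rd = 0, Rs = 1, and Rn = 2.

```python
def decode_cpyf(word: int) -> dict:
    """Split a 32-bit word into the fields named in the encoding diagram."""
    fixed_ok = (((word >> 24) & 0x3F) == 0b011001
                and ((word >> 21) & 1) == 0
                and ((word >> 10) & 0b11) == 0b01)
    if not fixed_ok:
        raise ValueError("not in the encoding class shown above")
    d, n, s = word & 0x1F, (word >> 5) & 0x1F, (word >> 16) & 0x1F
    op1 = (word >> 22) & 0b11
    stages = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}
    if op1 not in stages:
        raise ValueError('op1 == 11: SEE "Memory Copy and Memory Set"')
    # Registers must be distinct and none may be register 31.
    if d == s or s == n or d == n or 31 in (d, s, n):
        raise ValueError("UNDEFINED")
    return {"sz": (word >> 30) & 0b11, "stage": stages[op1],
            "d": d, "s": s, "n": n, "options": (word >> 12) & 0xF}
```

Calling decode_cpyf(0x19010440) returns the prologue stage with d = 0, s = 1, n = 2, and options = 0.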
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
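The option A loop can be modelled in Python as follows. This is a rough sketch under several assumptions: memory is a bytearray, the fixed-size chooser min(max_block, -cpysize) is only a stand-in for the IMPLEMENTATION DEFINED CPYSizeChoice, and the block transfer itself is supplied by the sketch because the excerpt above does not show the memory accesses. In option A the size is held as a negative count, so the sketch moves it up towards zero by the block size on each iteration.

```python
def copy_option_a(mem: bytearray, toaddress: int, fromaddress: int,
                  cpysize: int, max_block: int = 16) -> int:
    """Run an option A style loop until cpysize reaches zero.

    toaddress and fromaddress point one byte past the end of each buffer and
    cpysize is minus the number of bytes left, as in the option A format.
    """
    while cpysize != 0:
        block = min(max_block, -cpysize)          # stand-in for CPYSizeChoice
        assert block <= -cpysize
        src = fromaddress + cpysize               # lowest address not yet read
        dst = toaddress + cpysize                 # lowest address not yet written
        mem[dst:dst + block] = mem[src:src + block]
        cpysize += block                          # counts up towards zero
    return cpysize


memory = bytearray(range(64)) + bytearray(64)
# Copy 40 bytes from address 0 to address 64, using end addresses and -40.
final_size = copy_option_a(memory, 64 + 40, 0 + 40, -40)
```

The end addresses stay fixed for the whole loop and only the negative size changes, which matches the option A register format in which Xs and Xd hold the address minus Xn.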
CPYFPN, CPYFMN, CPYFEN
Memory Copy Forward-only, reads and writes non-temporal. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPN,
then CPYFMN, and then CPYFEN.
CPYFPN performs some preconditioning of the arguments suitable for using the CPYFMN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYFMN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYFEN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
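◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.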
Integer
(FEAT_MOPS)
Bits    Field     Value
31:30   sz        -
29:24   (fixed)   011001
23:22   op1       -
21      (fixed)   0
20:16   Rs        -
15:12   op2       1100
11:10   (fixed)   01
9:5     Rn        -
4:0     Rd        -
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
CPYFPRN, CPYFMRN, CPYFERN
Memory Copy Forward-only, reads non-temporal. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPRN, then
CPYFMRN, and then CPYFERN.
CPYFPRN performs some preconditioning of the arguments suitable for using the CPYFMRN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYFERN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
Bits    Field     Value
31:30   sz        -
29:24   (fixed)   011001
23:22   op1       -
21      (fixed)   0
20:16   Rs        -
15:12   op2       1000
11:10   (fixed)   01
9:5     Rn        -
4:0     Rd        -
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
    boolean wrong_option = FALSE;
    boolean from_epilogue = TRUE;
    MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
CPYFPRT, CPYFMRT, CPYFERT
Memory Copy Forward-only, reads unprivileged. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPRT, then
CPYFMRT, and then CPYFERT.
CPYFPRT performs some preconditioning of the arguments suitable for using the CPYFMRT instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRT performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYFERT performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
Bits    Field     Value
31:30   sz        -
29:24   (fixed)   011001
23:22   op1       -
21      (fixed)   0
20:16   Rs        -
15:12   op2       0010
11:10   (fixed)   01
9:5     Rn        -
4:0     Rd        -
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
    boolean wrong_option = FALSE;
    boolean from_epilogue = TRUE;
    MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
CPYFPRTN, CPYFMRTN, CPYFERTN
Memory Copy Forward-only, reads unprivileged, reads and writes non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPRTN, then CPYFMRTN, and then CPYFERTN.
CPYFPRTN performs some preconditioning of the arguments suitable for using the CPYFMRTN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRTN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFERTN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address still to be copied from, minus Xn (Xn is zero or negative here, so that address is Xs + Xn).
• Xd holds the lowest address still to be copied to, minus Xn (that is, the address is Xd + Xn).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
31-30  29-24        23-22  21  20-16  15-12    11-10  9-5  4-0
sz     0 1 1 0 0 1  op1    0   Rs     1 1 1 0  0 1    Rn   Rd
                                      op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
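As a rough illustration of the decode step above, the following Python sketch extracts the fields from a 32-bit instruction word using the bit positions in the encoding diagram. The function name decode_cpyf and the returned dictionary layout are hypothetical, and the sketch models only the checks listed above, not a complete A64 decoder.

```python
STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}

def decode_cpyf(word):
    """Split a 32-bit word into the fields of the encoding diagram (hypothetical helper)."""
    # Fixed bits of the diagram: <29:24> = 011001, <21> = 0, <11:10> = 01.
    fixed_ok = ((word >> 24) & 0x3F == 0b011001
                and (word >> 21) & 0x1 == 0
                and (word >> 10) & 0x3 == 0b01)
    if not fixed_ok:
        raise ValueError("not in this encoding class")
    op1 = (word >> 22) & 0x3
    if op1 not in STAGES:
        raise ValueError('op1 == 11: see "Memory Copy and Memory Set"')
    d = word & 0x1F
    s = (word >> 16) & 0x1F
    n = (word >> 5) & 0x1F
    # Register numbers must be distinct and must not be 31.
    if d == s or s == n or d == n or 31 in (d, s, n):
        raise ValueError("UNDEFINED")
    return {
        "sz": (word >> 30) & 0x3,
        "stage": STAGES[op1],
        "options": (word >> 12) & 0xF,
        "d": d, "s": s, "n": n,
    }
```

For example, the word 0x1901E440 has op1 = 00, Rs = 1, op2 = 1110, Rn = 2, and Rd = 0, so the sketch reports a prologue with options 0b1110.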
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
assert B <= -1 * SInt(stagecpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYFPRTRN, CPYFMRTRN, CPYFERTRN
Memory Copy Forward-only, reads unprivileged and non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPRTRN, then CPYFMRTRN, and then CPYFERTRN.
CPYFPRTRN performs some preconditioning of the arguments suitable for using the CPYFMRTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFERTRN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
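The suitability rule above can be sketched in Python. This is an illustrative model only: the helper names are hypothetical, and the byte-by-byte loop merely stands in for a copy that proceeds in ascending address order.

```python
def forward_copy_is_suitable(src, dst, size):
    """True when there is no overlap, or when the source address is greater
    than the destination address (the rule stated above)."""
    overlap = src < dst + size and dst < src + size
    return (not overlap) or src > dst

def forward_copy(mem, src, dst, size):
    """Copy in ascending address order (a model, not the instruction itself)."""
    for i in range(size):
        mem[dst + i] = mem[src + i]
```

With mem = bytearray(b"abcdefgh"), a forward copy of 6 bytes from offset 2 to offset 0 yields b"cdefghgh" as expected, whereas a forward copy from offset 0 to offset 2 overwrites source bytes before they are read and yields b"abababab".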
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRTRN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRTRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
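The two register conventions listed above can be compared with a small Python model. It is a sketch under stated assumptions: the function name is hypothetical, copied stands in for the IMPLEMENTATION DEFINED number of bytes copied by the prologue, and register values are treated as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1
SATURATED_MAX = 0x7FFFFFFFFFFFFFFF

def after_prologue(xs, xd, xn, option_a, copied):
    """Return (Xs, Xd, Xn, PSTATE.C) after the prologue.

    copied is a hypothetical stand-in for the IMPLEMENTATION DEFINED amount
    and is assumed not to exceed the saturated size.
    """
    # If Xn<63> == 1, the copy size is saturated.
    size = SATURATED_MAX if (xn >> 63) & 1 else xn
    if option_a:
        # Addresses are moved to the end of the copy; Xn becomes a negative count.
        return (xs + size) & MASK64, (xd + size) & MASK64, (copied - size) & MASK64, 0
    # Option B: addresses advance by the bytes copied; Xn counts down.
    return (xs + copied) & MASK64, (xd + copied) & MASK64, size - copied, 1
```

For a 100-byte copy from 0x1000 to 0x2000 with 16 bytes copied in the prologue, option A gives Xs = 0x1064, Xd = 0x2064, and Xn = 0xFFFFFFFFFFFFFFAC (that is, -84), while option B gives Xs = 0x1010, Xd = 0x2010, and Xn = 84.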
For CPYFMRTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
31-30  29-24        23-22  21  20-16  15-12    11-10  9-5  4-0
sz     0 1 1 0 0 1  op1    0   Rs     1 0 1 0  0 1    Rn   Rd
                                      op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
assert B <= -1 * SInt(stagecpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYFPRTWN, CPYFMRTWN, CPYFERTWN
Memory Copy Forward-only, reads unprivileged, writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPRTWN, then CPYFMRTWN, and then CPYFERTWN.
CPYFPRTWN performs some preconditioning of the arguments suitable for using the CPYFMRTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFERTWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
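The split of one copy across the three instructions can be pictured with the following Python sketch. It is a simplified model, not the architected behavior: it uses option B style registers, and prologue_bytes and main_bytes are hypothetical stand-ins for the IMPLEMENTATION DEFINED amounts.

```python
def copy_in_three_stages(mem, src, dst, size, prologue_bytes, main_bytes):
    """Model a prologue/main/epilogue sequence with option B style registers.

    prologue_bytes and main_bytes are hypothetical stand-ins for the
    IMPLEMENTATION DEFINED amounts; the epilogue copies whatever is left.
    """
    xs, xd, xn = src, dst, size
    for limit in (prologue_bytes, main_bytes, None):
        amount = xn if limit is None else min(limit, xn)
        for _ in range(amount):
            mem[xd] = mem[xs]  # forward direction: ascending addresses
            xs, xd, xn = xs + 1, xd + 1, xn - 1
    return xs, xd, xn
```

Whatever amounts the first two stages choose, the final memory contents and the final register values are the same, which is why software does not need to know those amounts.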
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRTWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRTWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
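Both argument formats describe the same progress through the copy. The Python sketch below recovers the next source address, the next destination address, and the bytes remaining from either format. The function names are hypothetical, and the option A branch assumes the reading that Xs and Xd hold the lowest uncopied address minus the (negative) value of Xn.

```python
MASK64 = (1 << 64) - 1

def sint64(x):
    """Interpret a 64-bit register value as a signed integer."""
    return x - (1 << 64) if x >> 63 else x

def copy_progress(xs, xd, xn, pstate_c):
    """Return (next source address, next destination address, bytes remaining)."""
    if pstate_c == 0:
        # Option A: Xn is -1 * remaining, and Xs/Xd point at the end of the copy.
        remaining = -sint64(xn)
        return (xs - remaining) & MASK64, (xd - remaining) & MASK64, remaining
    # Option B: the registers hold the values directly.
    return xs, xd, xn
```

For example, the option A state Xs = 0x1064, Xd = 0x2064, Xn = -84 and the option B state Xs = 0x1010, Xd = 0x2010, Xn = 84 both describe 84 bytes remaining, starting at source 0x1010 and destination 0x2010.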
For CPYFERTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
31-30  29-24        23-22  21  20-16  15-12    11-10  9-5  4-0
sz     0 1 1 0 0 1  op1    0   Rs     0 1 1 0  0 1    Rn   Rd
                                      op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
assert B <= -1 * SInt(stagecpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYFPT, CPYFMT, CPYFET
Memory Copy Forward-only, reads and writes unprivileged. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPT,
then CPYFMT, and then CPYFET.
CPYFPT performs some preconditioning of the arguments suitable for using the CPYFMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYFMT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYFET performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFET, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFET, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
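◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.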
Integer
(FEAT_MOPS)
31-30  29-24        23-22  21  20-16  15-12    11-10  9-5  4-0
sz     0 1 1 0 0 1  op1    0   Rs     0 0 1 1  0 1    Rn   Rd
                                      op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
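// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);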
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
assert B <= -1 * SInt(stagecpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYFPTN, CPYFMTN, CPYFETN
Memory Copy Forward-only, reads and writes unprivileged and non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPTN, then CPYFMTN, and then CPYFETN.
CPYFPTN performs some preconditioning of the arguments suitable for using the CPYFMTN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMTN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYFETN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
31-30  29-24        23-22  21  20-16  15-12    11-10  9-5  4-0
sz     0 1 1 0 0 1  op1    0   Rs     1 1 1 1  0 1    Rn   Rd
                                      op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
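The op2 values given for the forward-only copy variants (1110 for the RTN forms, 1010 for RTRN, 0110 for RTWN, 0011 for T, and 1111 for TN) are consistent with each bit of options selecting one access attribute. The Python sketch below encodes that reading. The bit assignment is an assumption inferred from those encodings and their descriptions, and the function name is hypothetical.

```python
def describe_options(op2):
    """Interpret the 4-bit options field (assumed bit meanings, see lead-in)."""
    return {
        "write_unprivileged": bool(op2 & 0b0001),
        "read_unprivileged": bool(op2 & 0b0010),
        "write_nontemporal": bool(op2 & 0b0100),
        "read_nontemporal": bool(op2 & 0b1000),
    }
```

Under this reading, 0b1111 marks reads and writes as both unprivileged and non-temporal, which matches the description of CPYFPTN, CPYFMTN, and CPYFETN.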
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
    boolean wrong_option = FALSE;
    boolean from_epilogue = TRUE;
    MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        // Under option A both counts are negative, so they step up towards zero.
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
CPYFPTRN, CPYFMTRN, CPYFETRN
Memory Copy Forward-only, reads and writes unprivileged, reads non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPTRN, then CPYFMTRN, and then CPYFETRN.
CPYFPTRN performs some preconditioning of the arguments suitable for using the CPYFMTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFETRN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
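The suitability condition above can be sketched as a small check. This is an illustrative Python model only: the function name and parameters are hypothetical, addresses are plain integers, and address wrap-around is ignored.

```python
def forward_copy_is_safe(src: int, dst: int, size: int) -> bool:
    """Apply the condition stated in the text: a forward-only copy is
    suitable when the two regions do not overlap, or when the source
    address is greater than the destination address."""
    if size == 0:
        return True  # nothing is transferred, so nothing can be clobbered
    no_overlap = src + size <= dst or dst + size <= src
    return no_overlap or src > dst


# Disjoint regions, then an overlapping move towards lower addresses.
print(forward_copy_is_safe(0x1000, 0x2000, 16))
print(forward_copy_is_safe(0x1008, 0x1000, 16))
# Overlapping move towards higher addresses: not suitable for forward-only.
print(forward_copy_is_safe(0x1000, 0x1008, 16))
```

A caller that cannot establish this condition would need a copy that is not restricted to the forward direction.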
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPTRN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPTRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
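The two post-prologue register states listed above can be modelled as follows. This is a minimal sketch, not the architected pseudocode: `prologue_state`, `Regs`, and the `copied` parameter (standing in for the IMPLEMENTATION DEFINED number of bytes copied) are hypothetical names, and no memory is actually transferred.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
SAT_MAX = 0x7FFFFFFFFFFFFFFF


@dataclass
class Regs:
    xs: int  # source register after the prologue
    xd: int  # destination register after the prologue
    xn: int  # size register after the prologue (64-bit pattern)
    c: int   # PSTATE.C: 0 encodes option A, 1 encodes option B


def prologue_state(xs: int, xd: int, xn: int, option_a: bool, copied: int) -> Regs:
    # If Xn<63> == 1, the copy size is saturated.
    size = SAT_MAX if (xn >> 63) & 1 else xn
    assert 0 <= copied <= size
    if option_a:
        # Addresses move to the end of the buffers; Xn becomes negative.
        return Regs((xs + size) & MASK64, (xd + size) & MASK64,
                    (-size + copied) & MASK64, 0)
    # Option B: addresses advance past what was copied; Xn counts down.
    return Regs((xs + copied) & MASK64, (xd + copied) & MASK64,
                (size - copied) & MASK64, 1)


a = prologue_state(0x1000, 0x2000, 100, option_a=True, copied=16)
b = prologue_state(0x1000, 0x2000, 100, option_a=False, copied=16)
print(hex(a.xs), hex(a.xd), hex(a.xn), a.c)
print(hex(b.xs), hex(b.xd), hex(b.xn), b.c)
```

Both results describe the same remaining work (84 bytes starting at offset 16), which is why software has to read PSTATE.C before interpreting the registers.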
For CPYFMTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFETRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFETRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
Bits     Field   Value
31:30    sz
29:24            0 1 1 0 0 1
23:22    op1
21               0
20:16    Rs
15:12    op2     1 0 1 1
11:10            0 1
9:5      Rn
4:0      Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
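The decode steps above can be mirrored in a short Python sketch. The bit positions are read off the encoding diagram (5-bit register fields, 2-bit op1, 4-bit op2), which is an interpretation of the flattened diagram rather than something the pseudocode states. The function name and the use of `ValueError` for the UNDEFINED and SEE cases are hypothetical.

```python
def decode_cpyf(word: int) -> dict:
    """Split a 32-bit instruction word into the fields used by the decode
    pseudocode and apply the same register checks."""
    def bits(hi: int, lo: int) -> int:
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    # Fixed bits shown in the diagram: 29:24, 21 and 11:10.
    if bits(29, 24) != 0b011001 or bits(21, 21) != 0 or bits(11, 10) != 0b01:
        raise ValueError("word is not in this encoding group")

    stage = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}.get(bits(23, 22))
    if stage is None:
        raise ValueError('op1 == 11: SEE "Memory Copy and Memory Set"')

    d, s, n = bits(4, 0), bits(20, 16), bits(9, 5)
    if d == s or s == n or d == n:
        raise ValueError("UNDEFINED: Rd, Rs and Rn must all differ")
    if 31 in (d, s, n):
        raise ValueError("UNDEFINED: register 31 is not permitted")

    return {"sz": bits(31, 30), "stage": stage, "options": bits(15, 12),
            "d": d, "s": s, "n": n}


# Prologue form with Rd = 0, Rs = 1, Rn = 2 and op2 = 1011.
word = (0b011001 << 24) | (1 << 16) | (0b1011 << 12) | (0b01 << 10) | (2 << 5)
print(decode_cpyf(word))
```

The sketch returns `sz` without interpreting it, because the decode pseudocode shown here does not use that field.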
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
    boolean wrong_option = FALSE;
    boolean from_epilogue = TRUE;
    MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        // Under option A both counts are negative, so they step up towards zero.
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
CPYFPTWN, CPYFMTWN, CPYFETWN
Memory Copy Forward-only, reads and writes unprivileged, writes non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPTWN, then CPYFMTWN, and then CPYFETWN.
CPYFPTWN performs some preconditioning of the arguments suitable for using the CPYFMTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFETWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPTWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPTWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
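The two argument formats for the main instruction can be normalised into one view. The sketch below is illustrative only: `remaining_view` and `to_signed64` are hypothetical helpers, and the option A arithmetic follows directly from the bullets above (the registers hold the lowest address minus Xn, with Xn negative).

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a signed two's-complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def remaining_view(xs: int, xd: int, xn: int, pstate_c: int):
    """Return (lowest source address, lowest destination address,
    bytes remaining) for either argument format."""
    if pstate_c == 0:
        # Option A: Xn holds -1 * remaining, Xs and Xd hold lowest - Xn.
        signed_n = to_signed64(xn)
        return ((xs + signed_n) & MASK64, (xd + signed_n) & MASK64, -signed_n)
    # Option B: the registers hold the values directly.
    return (xs & MASK64, xd & MASK64, xn & MASK64)


# The same remaining work expressed in both formats.
print(remaining_view(0x1064, 0x2064, (-84) & MASK64, pstate_c=0))
print(remaining_view(0x1010, 0x2010, 84, pstate_c=1))
```

A debugger or exception handler that inspects a partially completed copy could use a view like this so that later logic does not depend on which option the implementation chose.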
For CPYFETWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFETWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
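To show how the three instructions hand work to each other under option B, the following Python sketch runs a prologue, main, and epilogue step over a `bytearray`. It is a simplified model under stated assumptions: the `chunks` argument stands in for the IMPLEMENTATION DEFINED amounts copied by the first two stages, sizes are small non-negative integers (no saturation), and permission, tagging, and non-temporal behaviour are not modelled.

```python
def cpyf_option_b(mem: bytearray, xd: int, xs: int, xn: int, chunks=(3, 5)):
    """Return the final (Xd, Xs, Xn) after a three-stage forward copy."""
    def stage(xd: int, xs: int, xn: int, count: int):
        count = min(count, xn)
        for i in range(count):  # ascending addresses: forward only
            mem[xd + i] = mem[xs + i]
        # Option B write-back: lowest addresses not yet copied, bytes left.
        return xd + count, xs + count, xn - count

    xd, xs, xn = stage(xd, xs, xn, chunks[0])  # prologue: part of the copy
    xd, xs, xn = stage(xd, xs, xn, chunks[1])  # main: a further part
    xd, xs, xn = stage(xd, xs, xn, xn)         # epilogue: the last part
    return xd, xs, xn


mem = bytearray(range(32))
print(cpyf_option_b(mem, xd=16, xs=0, xn=12))
print(list(mem[16:28]))
```

Whatever values `chunks` takes, the epilogue leaves Xn at 0 and Xs and Xd at the first addresses past the copied regions, matching the write-back bullets above.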
Integer
(FEAT_MOPS)
Bits     Field   Value
31:30    sz
29:24            0 1 1 0 0 1
23:22    op1
21               0
20:16    Rs
15:12    op2     0 1 1 1
11:10            0 1
9:5      Rn
4:0      Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
    boolean wrong_option = FALSE;
    boolean from_epilogue = TRUE;
    MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        // Under option A both counts are negative, so they step up towards zero.
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
CPYFPWN, CPYFMWN, CPYFEWN
Memory Copy Forward-only, writes non-temporal. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPWN, then
CPYFMWN, and then CPYFEWN.
CPYFPWN performs some preconditioning of the arguments suitable for using the CPYFMWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
Bits     Field   Value
31:30    sz
29:24            0 1 1 0 0 1
23:22    op1
21               0
20:16    Rs
15:12    op2     0 1 0 0
11:10            0 1
9:5      Rn
4:0      Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
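The op1 and op2 fields together select the mnemonic. The Python sketch below rebuilds the name from those two fields. It is deliberately partial: the suffix table contains only the four op2 values whose encoding diagrams appear in this part of the document (1011, 0111, 0100, 0001), other values are rejected rather than guessed, and the function and table names are hypothetical.

```python
# op1 selects the stage letter used in the mnemonic.
STAGE_LETTER = {0b00: "P", 0b01: "M", 0b10: "E"}

# op2 values taken from the encoding diagrams of the variants shown here.
OPTION_SUFFIX = {0b1011: "TRN", 0b0111: "TWN", 0b0100: "WN", 0b0001: "WT"}


def cpyf_mnemonic(op1: int, op2: int) -> str:
    """Build the instruction name for the variants covered by the table."""
    if op1 not in STAGE_LETTER:
        raise ValueError('op1 == 11: SEE "Memory Copy and Memory Set"')
    if op2 not in OPTION_SUFFIX:
        raise ValueError("op2 value not covered by this sketch")
    return "CPYF" + STAGE_LETTER[op1] + OPTION_SUFFIX[op2]


print(cpyf_mnemonic(0b00, 0b0100))
print(cpyf_mnemonic(0b10, 0b1011))
```

Keeping the stage and the options in separate tables reflects how the decode pseudocode treats them: op1 picks the stage, while op2 is passed through unchanged as `options`.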
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
    boolean wrong_option = FALSE;
    boolean from_epilogue = TRUE;
    MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        // Under option A both counts are negative, so they step up towards zero.
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
CPYFPWT, CPYFMWT, CPYFEWT
Memory Copy Forward-only, writes unprivileged. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPWT, then
CPYFMWT, and then CPYFEWT.
CPYFPWT performs some preconditioning of the arguments suitable for using the CPYFMWT instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWT performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWT performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from, minus Xn (that is, Xs + Xn is that lowest address).
• Xd holds the lowest address that the copy is made to, minus Xn (that is, Xd + Xn is that lowest address).
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
Bits     Field   Value
31:30    sz
29:24            0 1 1 0 0 1
23:22    op1
21               0
20:16    Rs
15:12    op2     0 0 0 1
11:10            0 1
9:5      Rn
4:0      Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if supports_option_a then
    PSTATE.C = '0';
    // Copy in the forward direction offsets the arguments.
    toaddress = toaddress + cpysize;
    fromaddress = fromaddress + cpysize;
    cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
    boolean wrong_option = FALSE;
    boolean from_epilogue = TRUE;
    MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);
        // Under option A both counts are negative, so they step up towards zero.
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
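The option A loop can be pictured with a small Python model. It is a sketch under stated assumptions, not the architected behaviour: memory is a `bytearray`, values are plain Python integers rather than 64-bit vectors, `choose_block` is a hypothetical stand-in for the IMPLEMENTATION DEFINED block-size choice, and the byte transfer at `address + cpysize` follows from the option A argument format (the registers hold the lowest address minus the negative size).

```python
def option_a_stage(mem: bytearray, toaddress: int, fromaddress: int,
                   cpysize: int, stagecpysize: int, choose_block) -> int:
    """Run one stage of an option A copy and return the updated cpysize.

    toaddress and fromaddress are the end addresses of the buffers;
    cpysize and stagecpysize are zero or negative byte counts.
    """
    while stagecpysize != 0:
        b = choose_block(-stagecpysize)
        assert 0 < b <= -stagecpysize
        for i in range(b):
            # cpysize is negative, so this indexes forward from the start.
            mem[toaddress + cpysize + i] = mem[fromaddress + cpysize + i]
        # Both negative counts move towards zero as bytes are copied.
        cpysize += b
        stagecpysize += b
    return cpysize


mem = bytearray(range(32))
# Copy 12 bytes from offset 0 to offset 16 in blocks of at most 5 bytes.
left = option_a_stage(mem, 16 + 12, 0 + 12, -12, -12,
                      choose_block=lambda remaining: min(5, remaining))
print(left, list(mem[16:28]))
```

Because the addresses stay fixed and only the negative count changes, a stage that stops early leaves a state that the next instruction can resume from without any address write-back.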
CPYFPWTN, CPYFMWTN, CPYFEWTN
Memory Copy Forward-only, writes unprivileged, reads and writes non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPWTN, then CPYFMWTN, and then CPYFEWTN.
CPYFPWTN performs some preconditioning of the arguments suitable for using the CPYFMWTN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWTN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWTN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
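The two register layouts listed above can be compared with a short Python model. This is a sketch under stated assumptions, not the architectural pseudocode: the function name cpyf_prologue_state and the parameter copied (a stand-in for the IMPLEMENTATION DEFINED number of bytes copied by the prologue) are hypothetical.

```python
MASK64 = (1 << 64) - 1
SATURATED_MAX = 0x7FFFFFFFFFFFFFFF


def cpyf_prologue_state(xd: int, xs: int, xn: int, copied: int, option_a: bool) -> dict:
    """Model Xd, Xs, Xn and PSTATE.{N,Z,C,V} after a forward-only prologue.

    `copied` is an assumed value for the IMPLEMENTATION DEFINED number of
    bytes that the prologue copies.
    """
    # If Xn<63> == 1, the copy size is saturated.
    size = SATURATED_MAX if (xn >> 63) & 1 else xn
    if not 0 <= copied <= size:
        raise ValueError("copied must be between 0 and the saturated size")
    if option_a:
        # Option A: addresses point past the end, Xn counts up from -size.
        return {
            "Xd": (xd + size) & MASK64,
            "Xs": (xs + size) & MASK64,
            "Xn": (-size + copied) & MASK64,
            "N": 0, "Z": 0, "C": 0, "V": 0,
        }
    # Option B: addresses advance by the bytes copied, Xn counts down.
    return {
        "Xd": (xd + copied) & MASK64,
        "Xs": (xs + copied) & MASK64,
        "Xn": size - copied,
        "N": 0, "Z": 0, "C": 1, "V": 0,
    }
```

With Xd = 0x2000, Xs = 0x1000, Xn = 100 and an assumed 16 bytes copied, option A yields Xd = 0x2064, Xs = 0x1064 and Xn = -84 (as a 64-bit two's complement value), while option B yields Xd = 0x2010, Xs = 0x1010 and Xn = 84.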
For CPYFMWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
sz     011001  op1    0   Rs     1101 (op2)  01     Rn   Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
assert B <= -1 * SInt(stagecpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYFPWTRN, CPYFMWTRN, CPYFEWTRN
Memory Copy Forward-only, writes unprivileged, reads non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPWTRN, then CPYFMWTRN, and then CPYFEWTRN.
CPYFPWTRN performs some preconditioning of the arguments suitable for using the CPYFMWTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWTRN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWTRN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWTRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
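The argument formats for the main and epilogue instructions can be read back into a common form. The following Python sketch is illustrative only and assumes hypothetical names (to_signed64, cpyf_remaining); it interprets the register values for either option and returns the next destination address, the next source address, and the number of bytes remaining.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed two's complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def cpyf_remaining(xd: int, xs: int, xn: int, pstate_c: int) -> tuple:
    """Return (next_dst, next_src, bytes_remaining) for a forward-only copy.

    Hypothetical helper: pstate_c == 0 selects the option A format and
    pstate_c == 1 selects the option B format described in the text.
    """
    if pstate_c == 0:
        # Option A: Xn holds -1 * remaining, and Xs/Xd hold the lowest
        # remaining address minus Xn, so adding Xn recovers that address.
        offset = to_signed64(xn)
        return ((xd + offset) & MASK64, (xs + offset) & MASK64, -offset)
    # Option B: the registers hold the lowest remaining addresses and the
    # remaining byte count directly.
    return (xd & MASK64, xs & MASK64, xn & MASK64)
```

Under these assumptions, the option A values Xd = 0x2064, Xs = 0x1064, Xn = -84 and the option B values Xd = 0x2010, Xs = 0x1010, Xn = 84 describe the same point in a copy.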
Integer
(FEAT_MOPS)
31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
sz     011001  op1    0   Rs     1001 (op2)  01     Rn   Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
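The decode step above can be mirrored in Python to show how the fields are extracted from a 32-bit instruction word. This is a minimal sketch: the function name decode_mops_copy is hypothetical, the fixed opcode bits and sz are not checked, and both the UNDEFINED cases and the op1 == '11' case are modelled by raising ValueError, which is a choice made for this example only.

```python
STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


def decode_mops_copy(word: int) -> dict:
    """Extract Rd, Rn, Rs, op1 and op2 from an instruction word.

    Field positions follow the encoding diagram: Rd<4:0>, Rn<9:5>,
    op2<15:12>, Rs<20:16>, op1<23:22>.
    """
    rd = word & 0x1F
    rn = (word >> 5) & 0x1F
    op2 = (word >> 12) & 0xF
    rs = (word >> 16) & 0x1F
    op1 = (word >> 22) & 0x3
    if op1 not in STAGES:
        # The decode pseudocode redirects this case elsewhere.
        raise ValueError("op1 == '11': see Memory Copy and Memory Set")
    if rd == rs or rs == rn or rd == rn:
        raise ValueError("UNDEFINED: Rd, Rs and Rn must be distinct")
    if 31 in (rd, rs, rn):
        raise ValueError("UNDEFINED: register 31 is not permitted")
    return {"stage": STAGES[op1], "d": rd, "s": rs, "n": rn, "options": op2}
```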
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
assert B <= -1 * SInt(stagecpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYFPWTWN, CPYFMWTWN, CPYFEWTWN
Memory Copy Forward-only, writes unprivileged and non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPWTWN, then CPYFMWTWN, and then CPYFEWTWN.
CPYFPWTWN performs some preconditioning of the arguments suitable for using the CPYFMWTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWTWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWTWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWTWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
Integer
(FEAT_MOPS)
31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
sz     011001  op1    0   Rs     0101 (op2)  01     Rn   Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
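The options value taken from op2 appears to select the access attributes named in each mnemonic. The Python sketch below is an inference, not a statement of the architectural definition: the bit meanings for bits 0, 2 and 3 are deduced by comparing the op2 values in the encoding diagrams (1101 for the WTN forms, 1001 for WTRN, 0101 for WTWN) with the instruction descriptions, and the meaning given to bit 1 is an assumption that none of those encodings exercises. The function name describe_options is hypothetical.

```python
def describe_options(op2: int) -> dict:
    """Map a 4-bit options value to the access attributes it appears to select."""
    return {
        "write_unprivileged": bool(op2 & 0b0001),  # set in WTN, WTRN, WTWN
        "read_unprivileged": bool(op2 & 0b0010),   # assumption, not shown in these encodings
        "write_non_temporal": bool(op2 & 0b0100),  # set in WTN and WTWN
        "read_non_temporal": bool(op2 & 0b1000),   # set in WTN and WTRN
    }
```

For op2 = 0101 this yields unprivileged, non-temporal writes with ordinary reads, which matches the description "writes unprivileged and non-temporal".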
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
// Check if the parameters to this instruction are valid for the epilogue.
if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
assert B <= -1 * SInt(stagecpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYP, CPYM, CPYE
Memory Copy. These instructions perform a memory copy. The prologue, main, and epilogue instructions are expected
to be run in succession and to appear consecutively in memory: CPYP, then CPYM, and then CPYE.
CPYP performs some preconditioning of the arguments suitable for using the CPYM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYM performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYE performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYP, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
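The saturation and direction rules above can be sketched in Python as follows. This is an illustrative model only: the function name cpyp_direction and the impl_choice parameter (a stand-in for the IMPLEMENTATION DEFINED choice) are hypothetical, and the comparison uses the full register values as written in the prose, whereas the operation pseudocode compares bits <55:0> of the addresses.

```python
MASK64 = (1 << 64) - 1
CPYP_SATURATED_MAX = 0x007FFFFFFFFFFFFF


def cpyp_direction(xs: int, xd: int, xn: int, impl_choice: str = "forward") -> tuple:
    """Return (direction, saturated_size) for a CPYP-style prologue."""
    xn &= MASK64
    # If Xn<63:55> != 000000000, the copy size is saturated.
    size = CPYP_SATURATED_MAX if (xn >> 55) != 0 else xn
    if xs > xd and xd + size > xs:
        # Destination region runs into the source from below.
        return ("forward", size)
    if xs < xd and xs + size > xd:
        # Source region runs into the destination from below.
        return ("backward", size)
    # No overlap: either direction is correct, so the choice is free.
    return (impl_choice, size)
```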
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYP, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYP, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
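The four cases listed above (option A or B, forward or backward) can be tabulated with a small Python model. This is a sketch under stated assumptions: the function name cpyp_prologue_state is hypothetical, size is taken to be the already-saturated Xn, and copied stands in for the IMPLEMENTATION DEFINED number of bytes copied by the prologue.

```python
MASK64 = (1 << 64) - 1


def cpyp_prologue_state(xd: int, xs: int, size: int, copied: int,
                        forward: bool, option_a: bool) -> dict:
    """Model Xd, Xs, Xn and PSTATE.{N,Z,C,V} after a CPYP-style prologue."""
    if not 0 <= copied <= size:
        raise ValueError("copied must be between 0 and the saturated size")
    if option_a:
        if forward:
            # Addresses point past the end; Xn is negative and counts up.
            new_xd, new_xs = xd + size, xs + size
            new_xn = -size + copied
        else:
            # Addresses unchanged; Xn is positive and counts down.
            new_xd, new_xs = xd, xs
            new_xn = size - copied
        n_flag, c_flag = 0, 0
    else:
        if forward:
            new_xd, new_xs = xd + copied, xs + copied
            n_flag = 0
        else:
            new_xd, new_xs = xd + size - copied, xs + size - copied
            n_flag = 1
        new_xn = size - copied
        c_flag = 1
    return {
        "Xd": new_xd & MASK64, "Xs": new_xs & MASK64, "Xn": new_xn & MASK64,
        "N": n_flag, "Z": 0, "C": c_flag, "V": 0,
    }
```

In this model, option A encodes the direction in the sign of Xn, whereas option B encodes it in PSTATE.N.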
For CPYM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
Integer
(FEAT_MOPS)
31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
sz     011101  op1    0   Rs     0000 (op2)  01     Rn   Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
forward = FALSE;
else
forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
PSTATE.C = '0';
PSTATE.N = '0';
if forward then
// Copy in the forward direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
cpysize = Zeros(64) - cpysize;
else
PSTATE.C = '1';
if !forward then
// Copy in the reverse direction offsets the arguments.
toaddress = toaddress + cpysize;
fromaddress = fromaddress + cpysize;
PSTATE.N = '1';
else
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYPN, CPYMN, CPYEN
Memory Copy, reads and writes non-temporal. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPN, then
CPYMN, and then CPYEN.
CPYPN performs some preconditioning of the arguments suitable for using the CPYMN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYEN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
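For option A, the sign of Xn carries the copy direction, and the address registers are interpreted relative to it. The following Python sketch is illustrative only and uses hypothetical names (to_signed64, option_a_view). It returns the direction, the bytes remaining, and the bounding source and destination addresses of the data still to be copied: the lowest addresses for a forward copy and the highest addresses for a backward copy. The "done" label for Xn == 0 is an assumption of this example, since the text describes only the negative and positive cases.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed two's complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def option_a_view(xd: int, xs: int, xn: int) -> tuple:
    """Return (direction, remaining, src_bound, dst_bound) for option A."""
    signed = to_signed64(xn)
    if signed < 0:
        # Forward: Xs = lowest source address - Xn, so lowest = Xs + Xn.
        return ("forward", -signed,
                (xs + signed) & MASK64, (xd + signed) & MASK64)
    if signed > 0:
        # Backward: Xs = highest source address - Xn + 1,
        # so highest = Xs + Xn - 1.
        return ("backward", signed,
                (xs + signed - 1) & MASK64, (xd + signed - 1) & MASK64)
    return ("done", 0, None, None)
```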
For CPYMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12  11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     1100   01     Rn   Rd
                                        op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
CPYPRN, CPYMRN, CPYERN
Memory Copy, reads non-temporal. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPRN, then CPYMRN, and
then CPYERN.
CPYPRN performs some preconditioning of the arguments suitable for using the CPYMRN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYERN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
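The saturation and direction rules above can be illustrated with a minimal Python sketch. It is not the architectural pseudocode: the names `saturate_size`, `choose_direction`, and `impdef_forward` are hypothetical, the IMPLEMENTATION DEFINED choice is modelled as a caller-supplied flag, and comparing only address bits <55:0> is an assumption taken from the Operation pseudocode fragment. Address wrap-around is not modelled.

```python
SATURATED_MAX = 0x007FFFFFFFFFFFFF  # largest copy size after saturation
ADDR_MASK = (1 << 56) - 1           # assumption: only bits <55:0> are compared


def saturate_size(xn: int) -> int:
    """Saturate the requested size if any of bits <63:55> is set."""
    xn &= (1 << 64) - 1
    return SATURATED_MAX if (xn >> 55) != 0 else xn


def choose_direction(xs: int, xd: int, xn: int, impdef_forward: bool = True) -> str:
    """Return 'forward' or 'backward' for a copy of xn bytes from xs to xd."""
    size = saturate_size(xn)
    src = xs & ADDR_MASK
    dst = xd & ADDR_MASK
    if src > dst and dst + size > src:
        return "forward"   # destination overlaps the start of the source
    if src < dst and src + size > dst:
        return "backward"  # destination overlaps the end of the source
    # No overlap forces a direction, so the implementation may pick either.
    return "forward" if impdef_forward else "backward"
```

For example, `choose_direction(0x1008, 0x1000, 16)` returns `"forward"` because the destination buffer overlaps the start of the source buffer.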
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
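As an illustration of the option A register state listed above, the following Python sketch computes the values that Xs, Xd, and Xn would hold after the prologue. The function name `prologue_option_a` and the parameter `copied` (standing in for the IMPLEMENTATION DEFINED number of bytes copied) are hypothetical, and registers are modelled as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def prologue_option_a(xs, xd, saturated_xn, forward, copied):
    """Return (Xs, Xd, Xn, flags) after a prologue under option A."""
    assert 0 <= copied <= saturated_xn
    if forward:
        new_xs = (xs + saturated_xn) & MASK64
        new_xd = (xd + saturated_xn) & MASK64
        # -1 * saturated Xn + bytes copied, held as a two's-complement value.
        new_xn = (-saturated_xn + copied) & MASK64
    else:
        new_xs, new_xd = xs, xd
        new_xn = saturated_xn - copied
    flags = {"N": 0, "Z": 0, "C": 0, "V": 0}  # C = 0 identifies option A
    return new_xs, new_xd, new_xn, flags
```

With a 100-byte forward copy of which 36 bytes are done in the prologue, Xn is left holding -64, that is, 0xFFFFFFFFFFFFFFC0.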
After execution of CPYPRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
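The option A argument format above can be read back into plain addresses. The sketch below is one hedged interpretation in Python: `option_a_view` is a hypothetical helper, and treating Xn == 0 as "nothing left to copy" is an assumption, since the text only describes negative and positive values.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def option_a_view(xs, xd, xn):
    """Decode option A registers into direction, bytes remaining, and the
    next source/destination byte (lowest if forward, highest if backward)."""
    n = to_signed64(xn)
    if n < 0:
        # Forward: Xs = lowest address - Xn, so lowest address = Xs + Xn.
        return {"direction": "forward", "remaining": -n,
                "src": (xs + n) & MASK64, "dst": (xd + n) & MASK64}
    if n > 0:
        # Backward: Xs = highest address - Xn + 1, so highest = Xs + Xn - 1.
        return {"direction": "backward", "remaining": n,
                "src": (xs + n - 1) & MASK64, "dst": (xd + n - 1) & MASK64}
    return {"direction": None, "remaining": 0, "src": xs, "dst": xd}
```

For a forward copy, `src` and `dst` are the lowest addresses still to be copied; for a backward copy, they are the highest.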
For CPYMRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12  11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     1000   01     Rn   Rd
                                        op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
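The decode steps above can be mirrored in a short Python sketch. It extracts the fields from a 32-bit instruction word using the bit positions of the encoding diagram and applies the two UNDEFINED checks. The names `CpyFields` and `decode_cpy` are hypothetical, and returning `None` for words outside this encoding class (including op1 == '11') is a simplification of the `SEE` redirection.

```python
from typing import NamedTuple, Optional


class CpyFields(NamedTuple):
    sz: int
    stage: str
    rs: int
    op2: int
    rn: int
    rd: int


STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


def decode_cpy(word: int) -> Optional[CpyFields]:
    """Decode a memory copy instruction word, or return None if it is not one."""
    fixed_ok = (((word >> 24) & 0x3F) == 0b011101   # bits 29:24
                and ((word >> 21) & 1) == 0         # bit 21
                and ((word >> 10) & 0b11) == 0b01)  # bits 11:10
    op1 = (word >> 22) & 0b11
    if not fixed_ok or op1 not in STAGES:
        return None
    rs = (word >> 16) & 0x1F
    rn = (word >> 5) & 0x1F
    rd = word & 0x1F
    if rd == rs or rs == rn or rd == rn:
        raise ValueError("UNDEFINED: Rd, Rs and Rn must all differ")
    if 31 in (rd, rs, rn):
        raise ValueError("UNDEFINED: register 31 is not permitted")
    return CpyFields((word >> 30) & 0b11, STAGES[op1], rs, (word >> 12) & 0xF, rn, rd)
```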
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYPRT, CPYMRT, CPYERT
Memory Copy, reads unprivileged. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPRT, then CPYMRT, and
then CPYERT.
CPYPRT performs some preconditioning of the arguments suitable for using the CPYMRT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMRT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYERT performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRT, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRT, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRT, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
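The option B register state listed above can be sketched in Python as follows. The helper `prologue_option_b` and its parameter `copied` (the IMPLEMENTATION DEFINED number of bytes copied by the prologue) are hypothetical names used only for illustration.

```python
MASK64 = (1 << 64) - 1


def prologue_option_b(xs, xd, saturated_xn, forward, copied):
    """Return (Xs, Xd, Xn, flags) after a prologue under option B."""
    assert 0 <= copied <= saturated_xn
    if forward:
        # Pointers advance past the bytes already copied.
        new_xs = (xs + copied) & MASK64
        new_xd = (xd + copied) & MASK64
        n_flag = 0
    else:
        # Pointers sit one past the highest byte still to be copied.
        new_xs = (xs + saturated_xn - copied) & MASK64
        new_xd = (xd + saturated_xn - copied) & MASK64
        n_flag = 1
    new_xn = saturated_xn - copied  # bytes remaining, always non-negative
    flags = {"N": n_flag, "Z": 0, "C": 1, "V": 0}  # C = 1 identifies option B
    return new_xs, new_xd, new_xn, flags
```

Unlike option A, the direction is carried in PSTATE.N rather than in the sign of Xn.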
For CPYMRT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMRT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12  11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     0010   01     Rn   Rd
                                        op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYPRTN, CPYMRTN, CPYERTN
Memory Copy, reads unprivileged, reads and writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPRTN, then CPYMRTN, and then CPYERTN.
CPYPRTN performs some preconditioning of the arguments suitable for using the CPYMRTN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRTN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYERTN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRTN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRTN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRTN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMRTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMRTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
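The option B main-stage behaviour described above can be simulated on a toy memory. In this Python sketch, memory is a `bytearray`, addresses are plain indices, and the IMPLEMENTATION DEFINED block size is the caller-supplied `block`; `main_step_option_b` and `run_option_b` are hypothetical helpers, not architectural functions.

```python
def main_step_option_b(mem, xs, xd, xn, n_flag, block):
    """Copy one block of at most `block` bytes; return updated (Xs, Xd, Xn)."""
    b = min(block, xn)
    if n_flag == 0:
        # Forward: Xs and Xd are the lowest addresses not yet copied.
        mem[xd:xd + b] = mem[xs:xs + b]
        return xs + b, xd + b, xn - b
    # Backward: Xs and Xd are one past the highest addresses not yet copied.
    mem[xd - b:xd] = mem[xs - b:xs]
    return xs - b, xd - b, xn - b


def run_option_b(mem, xs, xd, xn, n_flag, block):
    """Repeat the step until no bytes remain, as an epilogue eventually would."""
    while xn != 0:
        xs, xd, xn = main_step_option_b(mem, xs, xd, xn, n_flag, block)
    return xs, xd, xn
```

Copying 8 bytes from index 0 to index 4 of a 16-byte buffer is an overlapping case that requires the backward direction, so the call starts with Xs = 8 and Xd = 12 and walks both pointers down.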
For CPYERTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
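The end state after the epilogue can be summarised in a few lines of Python. `epilogue_final_state` is a hypothetical helper. For option A the text only lists the write-back of Xn, so leaving Xs and Xd unchanged in that case is an assumption of this sketch.

```python
MASK64 = (1 << 64) - 1


def epilogue_final_state(xs, xd, xn, c_flag, n_flag):
    """Return (Xs, Xd, Xn) once the epilogue has copied every remaining byte."""
    if c_flag == 0:
        # Option A: Xn is written back with 0 (Xs and Xd assumed unchanged).
        return xs, xd, 0
    if n_flag == 0:
        # Option B, forward: pointers end at the lowest address not copied.
        return (xs + xn) & MASK64, (xd + xn) & MASK64, 0
    # Option B, backward: pointers end at the highest uncopied address + 1.
    return (xs - xn) & MASK64, (xd - xn) & MASK64, 0
```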
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12  11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     1110   01     Rn   Rd
                                        op2
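The field layout above can be turned into a small encoder. This Python sketch is illustrative only: `encode_cpy` is a hypothetical helper, the `op2` values are those shown in the encoding diagrams of the N, RN, RT, and RTN variants in this part of the document, and defaulting `sz` to 0 is an assumption.

```python
OP1_BY_STAGE = {"P": 0b00, "M": 0b01, "E": 0b10}  # prologue, main, epilogue
OP2_BY_SUFFIX = {"N": 0b1100, "RN": 0b1000, "RT": 0b0010, "RTN": 0b1110}


def encode_cpy(stage, suffix, rd, rs, rn, sz=0):
    """Assemble the 32-bit word for a CPY<stage><suffix> instruction."""
    if len({rd, rs, rn}) != 3 or 31 in (rd, rs, rn):
        raise ValueError("register combination would decode as UNDEFINED")
    return ((sz << 30)
            | (0b011101 << 24)              # fixed bits 29:24
            | (OP1_BY_STAGE[stage] << 22)   # op1 selects the stage
            | (rs << 16)                    # bit 21 stays 0
            | (OP2_BY_SUFFIX[suffix] << 12)
            | (0b01 << 10)                  # fixed bits 11:10
            | (rn << 5)
            | rd)
```

The three stages of one copy differ only in `op1`, so the words for a prologue, main, and epilogue sequence using the same registers differ only in bits 23:22.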
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
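The block-wise copy loop for option A can be illustrated on a toy memory. This Python sketch is a simplified reading of the argument format described for option A, not a transcription of the pseudocode: `copy_option_a` is a hypothetical helper, `cpysize` is held as a signed Python integer (negative for forward, positive for backward), and the fixed `block` parameter stands in for the IMPLEMENTATION DEFINED `CPYSizeChoice`.

```python
def copy_option_a(mem, fromaddress, toaddress, cpysize, block):
    """Copy until cpysize reaches zero; return the final cpysize."""
    while cpysize != 0:
        b = min(block, abs(cpysize))
        if cpysize < 0:
            # Forward: addresses are biased by the (negative) size, so the
            # next block starts at address + cpysize.
            src = fromaddress + cpysize
            dst = toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]
            cpysize += b
        else:
            # Backward: shrink the size first, then copy the block that now
            # starts at address + cpysize.
            cpysize -= b
            src = fromaddress + cpysize
            dst = toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]
    return cpysize
```

In the forward case `fromaddress` and `toaddress` point one past the end of each buffer, matching the prologue adjustment `address + cpysize` followed by negating the size. In the backward case they are the unmodified base addresses.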
CPYPRTRN, CPYMRTRN, CPYERTRN
Memory Copy, reads unprivileged and non-temporal. These instructions perform a memory copy. The prologue, main,
and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPRTRN,
then CPYMRTRN, and then CPYERTRN.
CPYPRTRN performs some preconditioning of the arguments suitable for using the CPYMRTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYERTRN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRTRN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRTRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRTRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
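Taken together, the two lists above mean that the option and direction chosen by the prologue can be recovered from PSTATE and Xn. The Python sketch below shows one way to express that. `classify_after_prologue` is a hypothetical helper, and reporting no direction when Xn is zero under option A is an assumption.

```python
def classify_after_prologue(n_flag, c_flag, xn):
    """Return (option, direction) implied by PSTATE.{N,C} and Xn."""
    if c_flag == 0:
        # Option A: the direction is carried in the sign of Xn (bit 63).
        xn &= (1 << 64) - 1
        if xn == 0:
            return "A", None
        return "A", "forward" if xn >> 63 else "backward"
    # Option B: the direction is carried in PSTATE.N.
    return "B", "backward" if n_flag else "forward"
```

Because the choice of option is IMPLEMENTATION DEFINED and might not be constant, software that inspects this state should test PSTATE.C each time rather than cache the result.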
For CPYMRTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
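As an illustration of the option A argument format, the sketch below recovers the direction, the remaining byte count, and the next addresses from the three registers. The function name decode_option_a and the dictionary keys are hypothetical, and the handling of Xn == 0 (nothing left to copy, no direction reported) is an assumption, since the list above only covers negative and positive values.

```python
MASK64 = (1 << 64) - 1

def decode_option_a(xs, xd, xn):
    """Interpret Xs, Xd, Xn (64-bit unsigned values) in option A format."""
    # Xn is treated as a signed 64-bit number.
    signed_xn = xn - (1 << 64) if xn >> 63 else xn
    if signed_xn < 0:
        # Forward: Xs = lowest address - Xn, so lowest address = Xs + Xn.
        return {"direction": "forward", "remaining": -signed_xn,
                "lowest_src": (xs + signed_xn) & MASK64,
                "lowest_dst": (xd + signed_xn) & MASK64}
    if signed_xn > 0:
        # Backward: Xs = highest address - Xn + 1, so highest = Xs + Xn - 1.
        return {"direction": "backward", "remaining": signed_xn,
                "highest_src": (xs + signed_xn - 1) & MASK64,
                "highest_dst": (xd + signed_xn - 1) & MASK64}
    # Assumption: a zero count means the copy is complete.
    return {"direction": None, "remaining": 0}
```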
For CPYMRTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
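The option B write-back rules for the main stage can be sketched as follows. This is an illustrative model only: main_writeback_option_b is a hypothetical name, and copied stands in for the IMPLEMENTATION DEFINED number of bytes that the main instruction copies.

```python
MASK64 = (1 << 64) - 1

def main_writeback_option_b(xs, xd, xn, pstate_n, copied):
    """Return (Xs, Xd, Xn) after a main-stage instruction in option B format."""
    if not 0 <= copied <= xn:
        raise ValueError("copied must lie in [0, Xn]")
    # Forward (PSTATE.N == 0): addresses advance to the lowest uncopied byte.
    # Backward (PSTATE.N == 1): addresses retreat to the highest uncopied byte + 1.
    step = -copied if pstate_n else copied
    return ((xs + step) & MASK64, (xd + step) & MASK64, xn - copied)
```

In both directions Xn simply counts the bytes still to be copied; only the sign of the address adjustment differs.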
For CPYERTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
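Because the epilogue copies everything that remains, its final register state follows directly from its inputs. The sketch below models that, under one labeled assumption: for option A the lists above describe only the write-back of Xn, so Xs and Xd are treated here as unchanged. The name epilogue_state is hypothetical.

```python
MASK64 = (1 << 64) - 1

def epilogue_state(xs, xd, xn, pstate_c, pstate_n):
    """Return (Xs, Xd, Xn) after the epilogue; Xn is always written back as 0."""
    if pstate_c == 0:
        # Option A. Assumption: Xs and Xd are left as they were, because the
        # text only describes a write-back for Xn.
        return (xs, xd, 0)
    # Option B: Xn is the unsigned count of bytes still to copy.
    # Forward ends at Xs + Xn (lowest address not copied from);
    # backward ends at Xs - Xn (highest address not copied from, plus 1).
    step = -xn if pstate_n else xn
    return ((xs + step) & MASK64, (xd + step) & MASK64, 0)
```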
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     1010 (op2)  01     Rn   Rd
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
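To make the field layout concrete, the following sketch assembles a 32-bit instruction word from the fields in the encoding diagram and applies the same UNDEFINED checks as the decode pseudocode. The function encode_cpy and the stage names are hypothetical helpers, and the default sz = 0 is an assumption: the diagram shows sz only as a field and does not give its value.

```python
STAGE_OP1 = {"prologue": 0b00, "main": 0b01, "epilogue": 0b10}

def encode_cpy(stage, op2, rd, rs, rn, sz=0):
    """Build the instruction word: sz 011101 op1 0 Rs op2 01 Rn Rd."""
    regs = (rd, rs, rn)
    if any(not 0 <= r <= 31 for r in regs):
        raise ValueError("register numbers must be in 0..31")
    # Mirrors the decode checks: the registers must be distinct and not 31.
    if len(set(regs)) != 3 or 31 in regs:
        raise ValueError("UNDEFINED register combination")
    if not 0 <= op2 <= 0xF or not 0 <= sz <= 0b11:
        raise ValueError("op2 is 4 bits and sz is 2 bits")
    return ((sz << 30) | (0b011101 << 24) | (STAGE_OP1[stage] << 22)
            | (rs << 16) | (op2 << 12) | (0b01 << 10) | (rn << 5) | rd)
```

With op2 = 1010 as shown in the diagram for this instruction group, encode_cpy("prologue", 0b1010, rd=0, rs=1, rn=2) yields 0x1D01A440 under the sz = 0 assumption.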
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
forward = FALSE;
else
forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
CPYPRTWN, CPYMRTWN, CPYERTWN
Memory Copy, reads unprivileged, writes non-temporal. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory:
CPYPRTWN, then CPYMRTWN, and then CPYERTWN.
CPYPRTWN performs some preconditioning of the arguments suitable for using the CPYMRTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYERTWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRTWN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
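The saturation rule and the direction algorithm above can be sketched in a few lines. This is a simplified model that follows the prose conditions with unbounded Python integers; the names saturate_size and copy_direction are hypothetical, and address wraparound and the 56-bit comparison visible in the Operation pseudocode are not modelled.

```python
SATURATED_MAX = 0x007FFFFFFFFFFFFF

def saturate_size(xn):
    """If Xn<63:55> is nonzero, saturate the copy size."""
    return SATURATED_MAX if (xn >> 55) & 0x1FF else xn

def copy_direction(xs, xd, xn):
    """Return the direction chosen by the prologue for source Xs, destination Xd."""
    size = saturate_size(xn)
    if xs > xd and xd + size > xs:
        return "forward"
    if xs < xd and xs + size > xd:
        return "backward"
    # No overlap that forces a direction: the choice is IMPLEMENTATION DEFINED.
    return "implementation defined"
```

The forward case is the one where the destination starts below the source and overlaps it, so copying from low addresses upward never overwrites source bytes that have not yet been read; the backward case is the mirror image.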
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRTWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRTWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMRTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMRTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYERTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     0110 (op2)  01     Rn   Rd
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
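The decode pseudocode above can be mirrored by a small field extractor. The sketch below is illustrative only: decode_cpy and the returned dictionary are hypothetical, the fixed-bit check is derived from the encoding diagram, and the sz field is returned as-is without interpretation because its value is not given here.

```python
OP1_STAGE = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}

def decode_cpy(word):
    """Split a 32-bit word laid out as sz 011101 op1 0 Rs op2 01 Rn Rd."""
    fixed_ok = (((word >> 24) & 0x3F) == 0b011101
                and ((word >> 21) & 1) == 0
                and ((word >> 10) & 0b11) == 0b01)
    if not fixed_ok:
        raise ValueError("word is not in this encoding class")
    op1 = (word >> 22) & 0b11
    if op1 not in OP1_STAGE:
        raise ValueError('SEE "Memory Copy and Memory Set"')
    d = word & 0x1F
    n = (word >> 5) & 0x1F
    s = (word >> 16) & 0x1F
    # Same UNDEFINED conditions as the pseudocode.
    if d == s or s == n or d == n or 31 in (d, s, n):
        raise ValueError("UNDEFINED")
    return {"stage": OP1_STAGE[op1], "options": (word >> 12) & 0xF,
            "sz": (word >> 30) & 0b11, "d": d, "s": s, "n": n}
```

For instance, the word 0x1D436485 decodes to the main stage with options = 0110, s = 3, n = 4 and d = 5.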
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
forward = FALSE;
else
forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
CPYPT, CPYMT, CPYET
Memory Copy, reads and writes unprivileged. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPT, then
CPYMT, and then CPYET.
CPYPT performs some preconditioning of the arguments suitable for using the CPYMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYET performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPT, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPT, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPT, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     0011 (op2)  01     Rn   Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
forward = FALSE;
else
forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
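As a rough illustration of how an option A copy loop can walk the offset-encoded arguments, the sketch below copies within a Python bytearray. It is an assumption-laden model rather than the architectural pseudocode: copy_option_a is a hypothetical name, a fixed block size stands in for the IMPLEMENTATION DEFINED CPYSizeChoice, the forward branch (copy first, then add B to the negative count) is inferred from the option A argument format, and the access types (unprivileged, non-temporal) are not modelled.

```python
def copy_option_a(mem, fromaddress, toaddress, cpysize, block=16):
    """Copy inside mem using option A arguments; cpysize is the signed Xn.

    Forward (cpysize < 0): fromaddress and toaddress point past the end.
    Backward (cpysize > 0): fromaddress and toaddress are the base addresses.
    Returns the final cpysize, which is 0 once the copy is complete.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    while cpysize != 0:
        b = min(block, abs(cpysize))
        if cpysize < 0:
            # Forward: lowest uncopied byte is at address + cpysize.
            src, dst = fromaddress + cpysize, toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]  # read the block, then write it
            cpysize += b
        else:
            # Backward: shrink the count first, then copy the top block.
            cpysize -= b
            src, dst = fromaddress + cpysize, toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]
    return cpysize
```

In this model a forward copy handles the case where the destination overlaps the source from below, and a backward copy handles the overlap from above, so an overlapping move produces the same result as copying through a temporary buffer.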
CPYPTN, CPYMTN, CPYETN
Memory Copy, reads and writes unprivileged and non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPTN, then CPYMTN, and then CPYETN.
CPYPTN performs some preconditioning of the arguments suitable for using the CPYMTN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMTN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYETN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPTN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPTN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPTN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
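Since the meaning of Xn depends on which option the prologue selected, code that inspects the registers between stages (for example in a debugger or an emulator) has to consult PSTATE.C first. A minimal sketch, using the hypothetical helper name bytes_remaining:

```python
def bytes_remaining(xn, pstate_c):
    """Return the number of bytes still to be copied, given Xn and PSTATE.C."""
    if pstate_c == 1:
        # Option B: Xn is a plain unsigned byte count.
        return xn
    # Option A: Xn is a signed 64-bit value, negative for a forward copy
    # and positive for a backward copy.
    signed_xn = xn - (1 << 64) if xn >> 63 else xn
    return abs(signed_xn)
```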
Integer
(FEAT_MOPS)
Bits   31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
Field  sz     011101  op1    0   Rs     1111 (op2)  01     Rn   Rd
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
forward = FALSE;
else
forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
CPYPTRN, CPYMTRN, CPYETRN
Memory Copy, reads and writes unprivileged, reads non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPTRN, then CPYMTRN, and then CPYETRN.
CPYPTRN performs some preconditioning of the arguments suitable for using the CPYMTRN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMTRN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYETRN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPTRN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
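As a rough illustration of the saturation and direction rules above, the following Python sketch models them on plain integers. The function names and the `impdef_forward` parameter are hypothetical. The sketch compares whole register values, as the prose does, whereas the Operation pseudocode compares only address bits <55:0>.

```python
SATURATED_MAX = 0x007FFFFFFFFFFFFF  # saturation value quoted in the text
MASK64 = (1 << 64) - 1


def saturate_size(xn: int) -> int:
    """Saturate the copy size when any of Xn<63:55> is nonzero."""
    xn &= MASK64
    return SATURATED_MAX if (xn >> 55) != 0 else xn


def choose_direction(xs: int, xd: int, xn: int, impdef_forward: bool = True) -> str:
    """Apply the three-way direction rule to the saturated size.

    impdef_forward stands in for the IMPLEMENTATION DEFINED choice that is
    made when the buffers do not overlap in a way that forces a direction.
    """
    size = saturate_size(xn)
    if xs > xd and xd + size > xs:
        return "forward"
    if xs < xd and xs + size > xd:
        return "backward"
    return "forward" if impdef_forward else "backward"
```

For example, a source 16 bytes above the destination with a 256-byte size gives "forward", and swapping the two addresses gives "backward".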
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPTRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
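The option A register state listed above can be sketched in Python as follows. This is only a model under stated assumptions: `size` is the already saturated Xn, `copied` stands for the IMPLEMENTATION DEFINED number of bytes the prologue copied, and the `State` type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple

MASK64 = (1 << 64) - 1


class State(NamedTuple):
    xs: int
    xd: int
    xn: int                          # raw 64-bit register value
    nzcv: Tuple[int, int, int, int]  # (N, Z, C, V)


def signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def prologue_option_a(xs: int, xd: int, size: int, forward: bool, copied: int) -> State:
    """Model the register state after an option A prologue (PSTATE.C = 0)."""
    if not 0 <= copied <= size:
        raise ValueError("copied must lie between 0 and size")
    flags = (0, 0, 0, 0)  # N, Z, V are 0 and C = 0 encodes option A
    if forward:
        # Addresses move to the end of the buffers; Xn becomes negative.
        return State((xs + size) & MASK64, (xd + size) & MASK64,
                     (copied - size) & MASK64, flags)
    # Backward: addresses are unchanged and Xn counts down from size.
    return State(xs, xd, size - copied, flags)
```

With `size = 64` and `copied = 16`, the forward case leaves Xn holding -48 as a signed value, and the backward case leaves Xn holding 48.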
After execution of CPYPTRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
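One way to read the option A argument format is to recover, from Xs and Xn alone, which source bytes remain to be copied. The Python sketch below does that. The function names are hypothetical, and a zero Xn is treated as "nothing left", which is an assumption about a case the list above does not spell out. The same arithmetic applies to Xd.

```python
MASK64 = (1 << 64) - 1


def signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def remaining_source_range(xs: int, xn: int):
    """Return (direction, lowest, highest) for the source bytes still to copy."""
    remaining = signed64(xn)
    if remaining < 0:
        # Forward: Xs = lowest - Xn, so lowest = Xs + Xn and Xs is one past the end.
        return "forward", (xs + remaining) & MASK64, (xs - 1) & MASK64
    if remaining > 0:
        # Backward: Xs = highest - Xn + 1, so highest = Xs + Xn - 1.
        return "backward", xs & MASK64, (xs + remaining - 1) & MASK64
    return "done", None, None
```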
For CPYMTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYETRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYETRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
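The option B write-back of CPYETRN described above amounts to moving both addresses past the remaining bytes and clearing Xn. A minimal Python sketch, with a hypothetical function name and `n_flag` standing for PSTATE.N:

```python
MASK64 = (1 << 64) - 1


def epilogue_option_b(xs: int, xd: int, xn: int, n_flag: int):
    """Return (Xs, Xd, Xn) after the epilogue has copied the last xn bytes."""
    # Forward (N == 0): addresses end one past the last byte copied.
    # Backward (N == 1): addresses end at the lowest byte copied.
    delta = -xn if n_flag else xn
    return (xs + delta) & MASK64, (xd + delta) & MASK64, 0
```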
Integer
(FEAT_MOPS)
31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
sz    |  0  1  1  1  0  1 | op1   |  0 | Rs             |  1  0  1  1 |  0  1 | Rn        | Rd
Bits 15:12 are the op2 field.
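To make the field positions concrete, the following Python sketch packs and unpacks a 32-bit instruction word using the bit positions in the encoding diagram. The helper names are hypothetical, and `sz` is left as a parameter (defaulting to 0 as an assumption) because the diagram shows it as a field rather than a fixed value.

```python
def encode_cpytrn(op1: int, rs: int, rn: int, rd: int, sz: int = 0) -> int:
    """Assemble a word from the diagram: fixed bits 29:24 = 011101,
    bit 21 = 0, op2 (bits 15:12) = 1011, bits 11:10 = 01."""
    return ((sz & 0x3) << 30) | (0b011101 << 24) | ((op1 & 0x3) << 22) \
        | ((rs & 0x1F) << 16) | (0b1011 << 12) | (0b01 << 10) \
        | ((rn & 0x1F) << 5) | (rd & 0x1F)


def decode_cpy_fields(word: int) -> dict:
    """Extract the named fields of the diagram from a 32-bit word."""
    return {
        "sz": (word >> 30) & 0x3,
        "op1": (word >> 22) & 0x3,
        "Rs": (word >> 16) & 0x1F,
        "op2": (word >> 12) & 0xF,
        "Rn": (word >> 5) & 0x1F,
        "Rd": word & 0x1F,
    }
```

Decoding a word built with `op1 = 0b10` yields the epilogue variant with `op2 == 0b1011`.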
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
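The decode checks above can be mirrored in Python as a rough sketch. The exception class, the enum, and the function name are hypothetical, and the `otherwise SEE` branch is modelled as a `LookupError` purely for illustration.

```python
from enum import Enum


class MOPSStage(Enum):
    PROLOGUE = 0b00
    MAIN = 0b01
    EPILOGUE = 0b10


class UndefinedInstruction(Exception):
    """Stands in for the UNDEFINED outcome of the decode pseudocode."""


def decode_stage_and_registers(op1: int, rd: int, rs: int, rn: int):
    """Select the stage from op1 and reject illegal register combinations."""
    if op1 == 0b11:
        raise LookupError('SEE "Memory Copy and Memory Set"')
    stage = MOPSStage(op1)
    # No two of the three register numbers may be the same.
    if rd == rs or rs == rn or rd == rn:
        raise UndefinedInstruction("register numbers must be distinct")
    # Register number 31 is not permitted for any operand.
    if 31 in (rd, rs, rn):
        raise UndefinedInstruction("register 31 is not allowed")
    return stage, rd, rs, rn
```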
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
CPYPTWN, CPYMTWN, CPYETWN
Memory Copy, reads and writes unprivileged, writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPTWN, then CPYMTWN, and then CPYETWN.
CPYPTWN performs some preconditioning of the arguments suitable for using the CPYMTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYETWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPTWN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
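To see why overlapping buffers force a direction, the Python sketch below copies bytes one at a time inside a single `bytearray`. It is an illustration only, not a model of how CPYPTWN is implemented. The function name and the byte-at-a-time loop are assumptions made for clarity.

```python
def copy_bytes(mem: bytearray, src: int, dst: int, size: int, forward: bool) -> None:
    """Copy size bytes from src to dst within mem, lowest byte first when
    forward is True and highest byte first otherwise."""
    order = range(size) if forward else range(size - 1, -1, -1)
    for i in order:
        mem[dst + i] = mem[src + i]


# Source above destination with overlap (Xs > Xd): forward is required.
good = bytearray(b"abcdefgh")
copy_bytes(good, src=2, dst=0, size=4, forward=True)

# The same copy done backward reads bytes that were already overwritten.
bad = bytearray(b"abcdefgh")
copy_bytes(bad, src=2, dst=0, size=4, forward=False)
```

After these calls `good` begins with the four original source bytes `cdef`, while `bad` does not. This is the situation the first branch of the direction rule guards against.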
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPTWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPTWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
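The option B register state listed above can be modelled with a short Python sketch. It assumes `size` is the already saturated Xn and `copied` is the IMPLEMENTATION DEFINED number of bytes copied by the prologue. The function name and the dictionary layout are hypothetical.

```python
MASK64 = (1 << 64) - 1


def prologue_option_b(xs: int, xd: int, size: int, forward: bool, copied: int) -> dict:
    """Model the register state after an option B prologue (PSTATE.C = 1)."""
    if not 0 <= copied <= size:
        raise ValueError("copied must lie between 0 and size")
    if forward:
        # Addresses advance past the bytes already copied.
        offset, n_flag = copied, 0
    else:
        # Addresses point one past the highest byte not yet copied.
        offset, n_flag = size - copied, 1
    return {
        "xs": (xs + offset) & MASK64,
        "xd": (xd + offset) & MASK64,
        "xn": size - copied,
        "nzcv": (n_flag, 0, 1, 0),  # (N, Z, C, V)
    }
```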
For CPYMTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
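Under option B, each execution of CPYMTWN moves the addresses by however many bytes it copied and reduces Xn by the same amount. A minimal Python sketch of that write-back, where `copied` stands for the IMPLEMENTATION DEFINED amount, `n_flag` for PSTATE.N, and the function name is hypothetical:

```python
MASK64 = (1 << 64) - 1


def main_step_option_b(xs: int, xd: int, xn: int, n_flag: int, copied: int):
    """Return (Xs, Xd, Xn) after the main instruction copied `copied` bytes."""
    if not 0 <= copied <= xn:
        raise ValueError("cannot copy more bytes than remain")
    # Forward (N == 0) walks the addresses upward, backward (N == 1) downward.
    step = -copied if n_flag else copied
    return (xs + step) & MASK64, (xd + step) & MASK64, xn - copied
```

For instance, copying 16 of 64 remaining bytes in the forward direction advances both addresses by 16 and leaves Xn at 48.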
For CPYETWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYETWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
Integer
(FEAT_MOPS)
31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
sz    |  0  1  1  1  0  1 | op1   |  0 | Rs             |  0  1  1  1 |  0  1 | Rn        | Rd
Bits 15:12 are the op2 field.
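The op2 value is what distinguishes the variants of the copy instructions. The Python sketch below decodes it under an assumed bit assignment: bit 0 write unprivileged, bit 1 read unprivileged, bit 2 write non-temporal, bit 3 read non-temporal. That assignment is an inference, not something this section states. It is chosen because it matches the op2 patterns and one-line descriptions given in this document for CPYPTRN (1011), CPYPTWN (0111), and CPYPWN (0100). The function name is hypothetical.

```python
def decode_options(op2: int) -> dict:
    """Decode the 4-bit op2 field under the assumed bit assignment."""
    return {
        "write_unprivileged": bool(op2 & 0b0001),  # assumed bit 0
        "read_unprivileged": bool(op2 & 0b0010),   # assumed bit 1
        "write_nontemporal": bool(op2 & 0b0100),   # assumed bit 2
        "read_nontemporal": bool(op2 & 0b1000),    # assumed bit 3
    }
```

With this mapping, `decode_options(0b0111)` reports unprivileged reads and writes plus non-temporal writes, which agrees with the description of CPYPTWN.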
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
CPYPWN, CPYMWN, CPYEWN
Memory Copy, writes non-temporal. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPWN, then CPYMWN,
and then CPYEWN.
CPYPWN performs some preconditioning of the arguments suitable for using the CPYMWN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYEWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPWN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
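The option A conventions for the prologue, main, and epilogue stages can be tied together with a toy simulation on a `bytearray`. This Python sketch is only a model under loud assumptions: the prologue is assumed to copy zero bytes, bytes are moved one at a time, `main_budget` is a hypothetical stand-in for the IMPLEMENTATION DEFINED amount copied by CPYMWN, and all function names are invented for illustration.

```python
MASK64 = (1 << 64) - 1


def signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def prologue_a(xs: int, xd: int, xn: int, forward: bool):
    """Option A prologue, assumed here to copy no bytes itself."""
    if forward:
        return xs + xn, xd + xn, (-xn) & MASK64
    return xs, xd, xn


def copy_stage_a(mem: bytearray, xs: int, xd: int, xn: int, budget: int) -> int:
    """Copy up to `budget` bytes and return the updated raw Xn."""
    n = signed64(xn)
    while n != 0 and budget > 0:
        if n < 0:
            # Forward: the next byte sits at (end address + negative Xn).
            mem[xd + n] = mem[xs + n]
            n += 1
        else:
            # Backward: decrement first, then copy the highest remaining byte.
            n -= 1
            mem[xd + n] = mem[xs + n]
        budget -= 1
    return n & MASK64


def memcpy_option_a(mem, src, dst, size, forward, main_budget):
    xs, xd, xn = prologue_a(src, dst, size, forward)
    xn = copy_stage_a(mem, xs, xd, xn, main_budget)  # main: part of the copy
    xn = copy_stage_a(mem, xs, xd, xn, size)         # epilogue: the rest
    return xs, xd, xn
```

In both directions the sequence ends with Xn equal to 0, and Xs and Xd are left as the prologue set them, since option A only rewrites Xn in the main and epilogue stages.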
For CPYEWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
Integer
(FEAT_MOPS)
31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
sz    |  0  1  1  1  0  1 | op1   |  0 | Rs             |  0  1  0  0 |  0  1 | Rn        | Rd
Bits 15:12 are the op2 field.
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
        Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
    X[n] = cpysize;
    X[d] = toaddress;
    X[s] = fromaddress;
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
CPYPWT, CPYMWT, CPYEWT
Memory Copy, writes unprivileged. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPWT, then CPYMWT, and
then CPYEWT.
CPYPWT performs some preconditioning of the arguments suitable for using the CPYMWT instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYEWT performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPWT, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
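The saturation and direction rules above can be sketched in Python. This is an illustrative model only, not Arm pseudocode: it applies the prose rule to full register values, and the names `saturate_size`, `choose_direction`, and the `impl_choice` parameter (standing in for the IMPLEMENTATION DEFINED choice) are assumptions of the sketch.

```python
SATURATED_MAX = 0x007FFFFFFFFFFFFF  # value that Xn saturates to


def saturate_size(xn: int) -> int:
    """Apply the prologue saturation rule to a 64-bit Xn value."""
    if (xn >> 55) & 0x1FF != 0:  # Xn<63:55> != 000000000
        return SATURATED_MAX
    return xn


def choose_direction(xs: int, xd: int, xn: int, impl_choice: str = "forward") -> str:
    """Return "forward" or "backward" following the rule in the text.

    impl_choice is a hypothetical stand-in for the IMPLEMENTATION
    DEFINED choice made when the buffers do not overlap.
    """
    size = saturate_size(xn)
    if xs > xd and xd + size > xs:
        return "forward"
    if xs < xd and xs + size > xd:
        return "backward"
    return impl_choice
```

For example, a source that starts 0x10 bytes above an overlapping destination gives `choose_direction(0x1010, 0x1000, 0x20) == "forward"`, while swapping the two addresses gives `"backward"`.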
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWT, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWT, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
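The two lists above can be condensed into a small Python model of the register and flag state after the prologue. It is a sketch under stated assumptions: `prologue_state` is a hypothetical helper, `size` is the already saturated Xn, and `copied` is the IMPLEMENTATION DEFINED number of bytes the prologue copied, supplied here as an input.

```python
MASK64 = (1 << 64) - 1


def prologue_state(xs, xd, size, forward, option_a, copied):
    """Return Xs, Xd, Xn and the NZCV flags after the prologue."""
    if option_a:
        if forward:
            regs = {"Xs": xs + size, "Xd": xd + size, "Xn": -size + copied}
        else:
            regs = {"Xs": xs, "Xd": xd, "Xn": size - copied}
        flags = {"N": 0, "Z": 0, "C": 0, "V": 0}
    else:
        if forward:
            regs = {"Xs": xs + copied, "Xd": xd + copied, "Xn": size - copied}
        else:
            regs = {"Xs": xs + size - copied, "Xd": xd + size - copied,
                    "Xn": size - copied}
        flags = {"N": 0 if forward else 1, "Z": 0, "C": 1, "V": 0}
    # Registers are 64 bits wide, so a negative Xn wraps to two's complement.
    return {**{k: v & MASK64 for k, v in regs.items()}, **flags}
```

With option A and a forward copy, Xn comes back as a negative two's complement value, which is how the main and epilogue instructions recognise the direction.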
For CPYMWT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from, minus Xn.
◦ Xd holds the lowest address that the copy is copied to, minus Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from, minus Xn, plus 1.
◦ Xd holds the highest address that the copy is copied to, minus Xn, plus 1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
Integer
(FEAT_MOPS)
31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
  sz  |  0  1  1  1  0  1 |  op1  |  0 |       Rs       |  0  0  0  1 |  0  1 |    Rn     |    Rd
      |                   |       |    |                |     op2     |       |           |
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55:0> + cpysize<55:0>))) then
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>) > UInt(toaddress<55:0>))) then
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
        Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
CPYPWTN, CPYMWTN, CPYEWTN
Memory Copy, writes unprivileged, reads and writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPWTN, then CPYMWTN, and then CPYEWTN.
CPYPWTN performs some preconditioning of the arguments suitable for using the CPYMWTN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWTN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYEWTN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPWTN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWTN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from, minus Xn.
◦ Xd holds the lowest address that the copy is copied to, minus Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from, minus Xn, plus 1.
◦ Xd holds the highest address that the copy is copied to, minus Xn, plus 1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
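The main-stage argument formats above can be read back into a plain description of what is left to copy. The following Python sketch does that for both options; `remaining_region` and `to_signed64` are hypothetical helper names, and the function returns the lowest source address, the lowest destination address, and the byte count of the region still to be copied.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def remaining_region(xs: int, xd: int, xn: int, c: int, n: int):
    """Decode Xs, Xd, Xn and PSTATE.{C,N} into (src_low, dst_low, count)."""
    if c == 0:
        # Option A: the sign of Xn encodes the direction.
        signed_xn = to_signed64(xn)
        if signed_xn < 0:
            # Forward: Xs and Xd hold (lowest address) - Xn.
            return (xs + signed_xn) & MASK64, (xd + signed_xn) & MASK64, -signed_xn
        # Backward: Xs and Xd already hold the lowest addresses.
        return xs, xd, signed_xn
    # Option B: PSTATE.N encodes the direction.
    if n == 0:
        return xs, xd, xn
    # Backward: Xs and Xd hold (highest address) + 1.
    return (xs - xn) & MASK64, (xd - xn) & MASK64, xn
```

In every case the region still to be copied is `[src_low, src_low + count)`; only the end from which the implementation consumes it differs.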
For CPYEWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from, minus Xn.
◦ Xd holds the lowest address that the copy is copied to, minus Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from, minus Xn, plus 1.
◦ Xd holds the highest address that the copy is copied to, minus Xn, plus 1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYEWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
Integer
(FEAT_MOPS)
31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
  sz  |  0  1  1  1  0  1 |  op1  |  0 |       Rs       |  1  1  0  1 |  0  1 |    Rn     |    Rd
      |                   |       |    |                |     op2     |       |           |
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
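The field layout in the encoding diagram and the register checks in the decode pseudocode can be sketched as a small Python encoder. This is an illustration, not a reference assembler: `encode` and `fields` are hypothetical helpers, and the default `sz=0` is an assumption because the diagram only names the field.

```python
FIXED_29_24 = 0b011101  # constant bits 29:24 from the diagram
FIXED_11_10 = 0b01      # constant bits 11:10 from the diagram


def encode(op1: int, op2: int, rd: int, rs: int, rn: int, sz: int = 0) -> int:
    """Pack the fields of a CPY* instruction word, applying the decode checks."""
    regs = (rd, rs, rn)
    if any(not 0 <= r <= 31 for r in regs):
        raise ValueError("register numbers must be in 0..31")
    if len(set(regs)) != 3 or 31 in regs:
        raise ValueError("UNDEFINED: registers must be distinct and not 31")
    if op1 not in (0b00, 0b01, 0b10):
        raise ValueError("op1 must select prologue, main, or epilogue")
    return ((sz << 30) | (FIXED_29_24 << 24) | (op1 << 22) | (rs << 16)
            | (op2 << 12) | (FIXED_11_10 << 10) | (rn << 5) | rd)


def fields(word: int) -> dict:
    """Extract the variable fields from an instruction word."""
    return {
        "sz": (word >> 30) & 0x3,
        "op1": (word >> 22) & 0x3,
        "Rs": (word >> 16) & 0x1F,
        "op2": (word >> 12) & 0xF,
        "Rn": (word >> 5) & 0x1F,
        "Rd": word & 0x1F,
    }
```

For the encoding shown here, op2 is `0b1101`, and op1 values `0b00`, `0b01`, and `0b10` select the prologue, main, and epilogue instructions respectively.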
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55:0> + cpysize<55:0>))) then
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>) > UInt(toaddress<55:0>))) then
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
        Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
    X[n] = cpysize;
    X[d] = toaddress;
    X[s] = fromaddress;
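A simplified Python model of one option A stage may help when reading the block loop. It is a sketch, not the architectural behavior: memory is a `bytearray`, access types, tag checks, and faults are ignored, and the fixed `block` parameter is a hypothetical stand-in for CPYSizeChoice. The sign test on `cpysize` follows the option A argument format, where a negative Xn means a forward copy and Xs and Xd are offset by the size.

```python
def option_a_stage(mem, fromaddress, toaddress, cpysize, stagecpysize, block=16):
    """Copy abs(stagecpysize) bytes and return the updated signed cpysize.

    cpysize and stagecpysize are signed Python integers with the same sign.
    """
    while stagecpysize != 0:
        b = min(block, abs(stagecpysize))
        if cpysize < 0:
            # Forward: addresses point past the data, cpysize counts up to 0.
            src = fromaddress + cpysize
            dst = toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]
            cpysize += b
            stagecpysize += b
        else:
            # Backward: step the size down first, then copy at that offset.
            cpysize -= b
            stagecpysize -= b
            src = fromaddress + cpysize
            dst = toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]
    return cpysize
```

Calling the function three times with stage sizes that sum to the total models a prologue, main, and epilogue sequence; the value returned by the last call is 0, matching the epilogue write-back of Xn.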
CPYPWTRN, CPYMWTRN, CPYEWTRN
Memory Copy, writes unprivileged, reads non-temporal. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory:
CPYPWTRN, then CPYMWTRN, and then CPYEWTRN.
CPYPWTRN performs some preconditioning of the arguments suitable for using the CPYMWTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYEWTRN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPWTRN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWTRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from, minus Xn.
◦ Xd holds the lowest address that the copy is copied to, minus Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from, minus Xn, plus 1.
◦ Xd holds the highest address that the copy is copied to, minus Xn, plus 1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from, minus Xn.
◦ Xd holds the lowest address that the copy is copied to, minus Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from, minus Xn, plus 1.
◦ Xd holds the highest address that the copy is copied to, minus Xn, plus 1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYEWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
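The option B write-back rules above can be modeled with a short Python sketch in which each stage advances Xs and Xd and decrements Xn. The function name `option_b_stage`, the `stage_bytes` argument (the IMPLEMENTATION DEFINED amount a stage copies), and the fixed `block` size are assumptions of the sketch; memory is a `bytearray`, and privilege and non-temporal hints are not modeled.

```python
def option_b_stage(mem, xs, xd, xn, n_flag, stage_bytes, block=16):
    """Copy stage_bytes bytes and return the written-back (Xs, Xd, Xn)."""
    while stage_bytes > 0:
        b = min(block, stage_bytes)
        if n_flag == 0:
            # Forward: Xs and Xd hold the lowest addresses not yet copied.
            mem[xd:xd + b] = mem[xs:xs + b]
            xs += b
            xd += b
        else:
            # Backward: Xs and Xd hold the highest uncopied address plus 1.
            mem[xd - b:xd] = mem[xs - b:xs]
            xs -= b
            xd -= b
        xn -= b
        stage_bytes -= b
    return xs, xd, xn
```

Running three stages whose sizes sum to the original Xn leaves Xn at 0 after the last one, which corresponds to the epilogue write-back described above.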
Integer
(FEAT_MOPS)
31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
  sz  |  0  1  1  1  0  1 |  op1  |  0 |       Rs       |  1  0  0  1 |  0  1 |    Rn     |    Rd
      |                   |       |    |                |     op2     |       |           |
Epilogue (op1 == 10)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55:0> + cpysize<55:0>))) then
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>) > UInt(toaddress<55:0>))) then
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
        Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
    X[n] = cpysize;
    X[d] = toaddress;
    X[s] = fromaddress;
CPYPWTWN, CPYMWTWN, CPYEWTWN
Memory Copy, writes unprivileged and non-temporal. These instructions perform a memory copy. The prologue, main,
and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPWTWN,
then CPYMWTWN, and then CPYEWTWN.
CPYPWTWN performs some preconditioning of the arguments suitable for using the CPYMWTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYEWTWN performs the last part of the memory copy.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPWTWN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
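The reason the direction depends on how the buffers overlap can be shown with a byte-by-byte Python model. This is an illustration only: `copy_bytes` and `safe_direction` are hypothetical names, and the sketch arbitrarily picks forward when the rule leaves the choice open.

```python
def copy_bytes(mem, src, dst, size, forward):
    """Copy size bytes one at a time in the requested direction."""
    order = range(size) if forward else range(size - 1, -1, -1)
    for i in order:
        mem[dst + i] = mem[src + i]


def safe_direction(src, dst, size):
    """Return True for forward, False for backward, per the rule above."""
    if src > dst and dst + size > src:
        return True
    if src < dst and src + size > dst:
        return False
    return True  # free choice in the rule; this sketch picks forward


# Destination overlaps the source from above, so the rule selects backward.
good = bytearray(range(16))
copy_bytes(good, 0, 4, 8, safe_direction(0, 4, 8))

# Forcing forward on the same overlap overwrites source bytes before use.
bad = bytearray(range(16))
copy_bytes(bad, 0, 4, 8, True)
```

After the first call `good[4:12]` holds the original first eight bytes, whereas `bad[4:12]` repeats the first four bytes twice because the forward pass reads locations it has already overwritten.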
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWTWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
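The register and flag updates listed above for option A and option B can be summarised in a small Python model. It is a sketch under stated assumptions: `prologue_result` and its parameters are hypothetical names, `copied` stands in for the IMPLEMENTATION DEFINED number of bytes copied by the prologue, and Xn is returned as a signed Python integer rather than a 64-bit two's complement value.

```python
def prologue_result(xs, xd, size, forward, option_a, copied):
    """Model Xs, Xd, Xn and NZCV after the prologue instruction.

    size   -- the saturated copy size
    copied -- IMPLEMENTATION DEFINED number of bytes already copied
    Returns (xs, xd, xn, flags), with xn as a signed integer.
    """
    if option_a:
        flags = {"N": 0, "Z": 0, "C": 0, "V": 0}
        if forward:
            # Addresses point past the end; Xn counts up towards zero.
            return xs + size, xd + size, -size + copied, flags
        # Backward: addresses are unchanged; Xn counts down towards zero.
        return xs, xd, size - copied, flags
    flags = {"N": 0 if forward else 1, "Z": 0, "C": 1, "V": 0}
    if forward:
        return xs + copied, xd + copied, size - copied, flags
    return xs + size - copied, xd + size - copied, size - copied, flags
```

In this model, a 64-byte forward copy under option A with 16 bytes already copied leaves Xn at -48, while the same copy under option B leaves Xn at 48 and advances both addresses by 16.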
For CPYMWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
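As a worked illustration of the option A argument format, the following Python sketch recovers the range of source bytes that remain to be copied from Xs and the signed value of Xn. The helper name `option_a_remaining` is hypothetical, and the sketch assumes that the remaining bytes form one contiguous range and that addresses do not wrap.

```python
def option_a_remaining(xs: int, xn: int):
    """Return (lowest, highest) source addresses still to be copied.

    xn is the signed 64-bit value of Xn: negative for a forward copy,
    positive for a backward copy. Returns None when nothing remains.
    """
    if xn == 0:
        return None
    if xn < 0:
        # Forward: Xs holds the lowest remaining address minus Xn,
        # so the lowest remaining address is Xs + Xn.
        return xs + xn, xs - 1
    # Backward: Xs holds the highest remaining address minus Xn plus 1,
    # so the highest remaining address is Xs + Xn - 1.
    return xs, xs + xn - 1
```

The same calculation applies to Xd for the destination range.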
For CPYMWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYEWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
Integer
(FEAT_MOPS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 1 0 1 op1 0 Rs 0 1 0 1 0 1 Rn Rd
op2
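Prologue (op1 == 00)
CPYPWTWN [<Xd>]!, [<Xs>]!, <Xn>!
Main (op1 == 01)
CPYMWTWN [<Xd>]!, [<Xs>]!, <Xn>!
Epilogue (op1 == 10)
CPYEWTWN [<Xd>]!, [<Xs>]!, <Xn>!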
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;
MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";
if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.
Operation
CheckMOPSEnabled();
integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
boolean forward;
if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
    forward = TRUE;
elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
    forward = FALSE;
else
    forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);
if supports_option_a then
    PSTATE.C = '0';
    PSTATE.N = '0';
    if forward then
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
else
    PSTATE.C = '1';
    if !forward then
        // Copy in the reverse direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        PSTATE.N = '1';
    else
        PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;
if supports_option_a then
while SInt(stagecpysize) != 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = CPYSizeChoice(toaddress, fromaddress, cpysize);
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
CRC32B, CRC32H, CRC32W, CRC32X
CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.
It takes an input CRC value in the first source operand, performs a CRC on the input value in the second source
operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with
common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x04C11DB7 is
used for the CRC calculation.
In an Armv8.0 implementation, this is an OPTIONAL instruction. From Armv8.1, it is mandatory for all implementations
to implement this instruction.
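The calculation described above can be modelled in Python as follows. This is a minimal sketch, not the architectural pseudocode: the function names are hypothetical, and `crc32_bytes` additionally applies the initial value 0xFFFFFFFF and the final inversion used by the common CRC-32 convention, which is a software convention and not part of the instruction itself.

```python
def bit_reverse(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of value."""
    value &= (1 << width) - 1
    return int(format(value, f"0{width}b")[::-1], 2)


def poly32_mod2(data: int, nbits: int, poly: int) -> int:
    """Reduce an nbits-wide value modulo a degree-32 polynomial over GF(2)."""
    full_poly = (1 << 32) | poly  # restore the implicit x^32 term
    for i in range(nbits - 1, 31, -1):
        if (data >> i) & 1:
            data ^= full_poly << (i - 32)
    return data & 0xFFFFFFFF


def crc32_step(acc: int, val: int, size: int, poly: int = 0x04C11DB7) -> int:
    """Model one CRC32B/H/W/X step; size is 8, 16, 32, or 64."""
    temp_acc = bit_reverse(acc, 32) << size
    temp_val = bit_reverse(val, size) << 32
    return bit_reverse(poly32_mod2(temp_acc ^ temp_val, 32 + size, poly), 32)


def crc32_bytes(data: bytes) -> int:
    """Common CRC-32 of a byte string, built from byte-sized steps."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32_step(crc, byte, 8)
    return crc ^ 0xFFFFFFFF
```

Passing `poly=0x1EDC6F41` to `crc32_step` models the CRC32C variants in the same way.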
Note
ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported.
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose accumulator output register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose accumulator input register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose data source register, encoded in the "Rm" field.
<Wm> Is the 32-bit name of the general-purpose data source register, encoded in the "Rm" field.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CRC32CB, CRC32CH, CRC32CW, CRC32CX
CRC32C checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.
It takes an input CRC value in the first source operand, performs a CRC on the input value in the second source
operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with
common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x1EDC6F41 is
used for the CRC calculation.
In an Armv8.0 implementation, this is an OPTIONAL instruction. From Armv8.1, it is mandatory for all implementations
to implement this instruction.
Note
ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported.
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose accumulator output register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose accumulator input register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose data source register, encoded in the "Rm" field.
<Wm> Is the 32-bit name of the general-purpose data source register, encoded in the "Rm" field.
Operation
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CSDB
Consumption of Speculative Data Barrier is a memory barrier that controls speculative execution and data value
prediction.
No instruction other than branch instructions appearing in program order after the CSDB can be speculatively
executed using the results of any:
• Data value predictions of any instructions.
• PSTATE.{N,Z,C,V} predictions of any instructions other than conditional branch instructions appearing in
program order before the CSDB that have not been architecturally resolved.
• Predictions of SVE predication state for any SVE instructions.
Note
For purposes of the definition of CSDB, PSTATE.{N,Z,C,V} is not considered a data value. This definition permits:
• Control flow speculation before and after the CSDB.
• Speculative execution of conditional data processing instructions after the CSDB, unless they use the
results of data value or PSTATE.{N,Z,C,V} predictions of instructions appearing in program order before
the CSDB that have not been architecturally resolved.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1
CRm op2
CSDB
// Empty.
Operation
ConsumptionOfSpeculativeDataBarrier();
CSEL
If the condition is TRUE, Conditional Select writes the value of the first source register to the destination register. If the
condition is FALSE, it writes the value of the second source register to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 Rm cond 0 0 Rn Rd
op o2
32-bit (sf == 0)
CSEL <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSEL <Xd>, <Xn>, <Xm>, <cond>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Operation
bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];
X[d] = result;
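The operation above depends on `ConditionHolds(cond)`, which is defined elsewhere in the architecture pseudocode. The following Python sketch models it from the standard A64 condition encodings and applies it to a conditional select. The function names and the `(n, z, c, v)` flag tuple are illustrative assumptions, not part of this description.

```python
def condition_holds(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate a 4-bit A64 condition code against the NZCV flags."""
    base = cond >> 1
    if base == 0b000:
        result = z == 1              # EQ / NE
    elif base == 0b001:
        result = c == 1              # CS / CC
    elif base == 0b010:
        result = n == 1              # MI / PL
    elif base == 0b011:
        result = v == 1              # VS / VC
    elif base == 0b100:
        result = c == 1 and z == 0   # HI / LS
    elif base == 0b101:
        result = n == v              # GE / LT
    elif base == 0b110:
        result = n == v and z == 0   # GT / LE
    else:
        result = True                # AL
    # The low bit inverts the test, except for encoding 0b1111.
    if cond & 1 and cond != 0b1111:
        result = not result
    return result


def csel(xn: int, xm: int, cond: int, flags, datasize: int = 64) -> int:
    """Model CSEL: pick xn when the condition holds, otherwise xm."""
    mask = (1 << datasize) - 1
    chosen = xn if condition_holds(cond, *flags) else xm
    return chosen & mask
```

For example, with the Z flag set, condition EQ (0b0000) selects the first source and NE (0b0001) selects the second.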
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CSET
Conditional Set sets the destination register to 1 if the condition is TRUE, and otherwise sets it to 0.
• The encodings in this description are named to match the encodings of CSINC.
• The description of CSINC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 1 1 1 1 1 != 111x 0 1 1 1 1 1 1 Rd
op Rm cond o2 Rn
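32-bit (sf == 0)
CSET <Wd>, <cond>
is equivalent to
CSINC <Wd>, WZR, WZR, invert(<cond>)
64-bit (sf == 1)
CSET <Xd>, <cond>
is equivalent to
CSINC <Xd>, XZR, XZR, invert(<cond>)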
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.
Operation
The description of CSINC gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CSETM
Conditional Set Mask sets all bits of the destination register to 1 if the condition is TRUE, and otherwise sets all bits to
0.
• The encodings in this description are named to match the encodings of CSINV.
• The description of CSINV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 1 1 1 1 1 != 111x 0 0 1 1 1 1 1 Rd
op Rm cond o2 Rn
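32-bit (sf == 0)
CSETM <Wd>, <cond>
is equivalent to
CSINV <Wd>, WZR, WZR, invert(<cond>)
64-bit (sf == 1)
CSETM <Xd>, <cond>
is equivalent to
CSINV <Xd>, XZR, XZR, invert(<cond>)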
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.
Operation
The description of CSINV gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CSINC
Conditional Select Increment returns, in the destination register, the value of the first source register if the condition
is TRUE, and otherwise returns the value of the second source register incremented by 1.
This instruction is used by the aliases CINC and CSET.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 Rm cond 0 1 Rn Rd
op o2
32-bit (sf == 0)
CSINC <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSINC <Xd>, <Xn>, <Xm>, <cond>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Alias Conditions
Operation
bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];
    result = result + 1;
X[d] = result;
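A short Python model of this operation, together with the CSET and CINC aliases, may help make the inverted condition in the aliases concrete. The function names are hypothetical, and `cond_holds` stands in for the result of `ConditionHolds(cond)` for the condition written in the assembly source.

```python
def csinc(xn: int, xm: int, cond_holds: bool, datasize: int = 64) -> int:
    """Model CSINC: xn if the condition holds, otherwise xm + 1."""
    mask = (1 << datasize) - 1
    result = xn if cond_holds else xm + 1
    return result & mask


def cset(cond_holds: bool, datasize: int = 64) -> int:
    """CSET Rd, cond is CSINC Rd, ZR, ZR, invert(cond)."""
    return csinc(0, 0, not cond_holds, datasize)


def cinc(xn: int, cond_holds: bool, datasize: int = 64) -> int:
    """CINC Rd, Rn, cond is CSINC Rd, Rn, Rn, invert(cond)."""
    return csinc(xn, xn, not cond_holds, datasize)
```

Because the increment happens only on the FALSE path of CSINC, each alias inverts its condition so that the increment applies when the condition written by the programmer is TRUE.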
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CSINV
Conditional Select Invert returns, in the destination register, the value of the first source register if the condition is
TRUE, and otherwise returns the bitwise inversion of the value of the second source register.
This instruction is used by the aliases CINV and CSETM.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 Rm cond 0 0 Rn Rd
op o2
32-bit (sf == 0)
CSINV <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSINV <Xd>, <Xn>, <Xm>, <cond>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Alias Conditions
Operation
bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];
    result = NOT(result);
X[d] = result;
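The same style of Python model can illustrate CSINV and its aliases CSETM and CINV. As before, the function names are hypothetical and `cond_holds` represents the outcome of the condition written in the assembly source.

```python
def csinv(xn: int, xm: int, cond_holds: bool, datasize: int = 64) -> int:
    """Model CSINV: xn if the condition holds, otherwise NOT(xm)."""
    mask = (1 << datasize) - 1
    result = xn if cond_holds else ~xm
    return result & mask


def csetm(cond_holds: bool, datasize: int = 64) -> int:
    """CSETM Rd, cond is CSINV Rd, ZR, ZR, invert(cond)."""
    return csinv(0, 0, not cond_holds, datasize)


def cinv(xn: int, cond_holds: bool, datasize: int = 64) -> int:
    """CINV Rd, Rn, cond is CSINV Rd, Rn, Rn, invert(cond)."""
    return csinv(xn, xn, not cond_holds, datasize)
```

Masking to `datasize` bits models the fixed register width, since Python integers are unbounded.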
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
CSNEG
Conditional Select Negation returns, in the destination register, the value of the first source register if the condition is
TRUE, and otherwise returns the negated value of the second source register.
This instruction is used by the alias CNEG.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 Rm cond 0 1 Rn Rd
op o2
32-bit (sf == 0)
CSNEG <Wd>, <Wn>, <Wm>, <cond>
64-bit (sf == 1)
CSNEG <Xd>, <Xn>, <Xm>, <cond>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Alias Conditions
Operation
bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];
    result = NOT(result);
    result = result + 1;
X[d] = result;
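The two's complement negation performed on the FALSE path can be modelled in Python as below. The helper `abs_via_cneg` is a hypothetical illustration of one common use of the CNEG alias (a compare against zero followed by `CNEG` on the MI condition). It is an assumption about typical usage, not something this description specifies.

```python
def csneg(xn: int, xm: int, cond_holds: bool, datasize: int = 64) -> int:
    """Model CSNEG: xn if the condition holds, otherwise NOT(xm) + 1."""
    mask = (1 << datasize) - 1
    result = xn if cond_holds else (~xm) + 1
    return result & mask


def cneg(xn: int, cond_holds: bool, datasize: int = 64) -> int:
    """CNEG Rd, Rn, cond is CSNEG Rd, Rn, Rn, invert(cond)."""
    return csneg(xn, xn, not cond_holds, datasize)


def abs_via_cneg(value: int, datasize: int = 64) -> int:
    """Absolute value, modelling CMP Rn, #0 followed by CNEG Rd, Rn, MI."""
    mask = (1 << datasize) - 1
    value &= mask
    negative = (value >> (datasize - 1)) & 1 == 1  # N flag after the compare
    return cneg(value, negative, datasize)
```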
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
DC
Data Cache operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
translation instructions.
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 0 1 1 1 CRm op2 Rt
L CRn
DC <dc_op>, <Xt>
is equivalent to
SYS #<op1>, C7, <Cm>, #<op2>, <Xt>
Assembler Symbols
<dc_op> Is a DC instruction name, as listed for the DC system instruction group, encoded in “op1:CRm:op2”:
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
DCPS1
DCPS1 {#<imm>}
Assembler Symbols
<imm> Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0 and encoded in the
"imm16" field.
Operation
DCPSInstruction(LL);
DCPS2
DCPS2 {#<imm>}
Assembler Symbols
<imm> Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0 and encoded in the
"imm16" field.
Operation
DCPSInstruction(LL);
DCPS3
DCPS3 {#<imm>}
Assembler Symbols
<imm> Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0 and encoded in the
"imm16" field.
Operation
DCPSInstruction(LL);
DGH
Data Gathering Hint is a hint instruction that indicates that it is not expected to be performance optimal to merge
memory accesses with Normal Non-cacheable or Device-GRE attributes appearing in program order before the hint
instruction with any memory accesses appearing after the hint instruction into a single memory transaction on an
interconnect.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0 1 1 1 1 1
CRm op2
DGH
Operation
Hint_DGH();
DMB
Data Memory Barrier is a memory barrier that ensures the ordering of observations of memory accesses. For more
information, see Data Memory Barrier.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 0 1 1 1 1 1 1
opc
DMB <option>|#<imm>
MBReqDomain domain;
MBReqTypes types;
case CRm<3:2> of
    when '00' domain = MBReqDomain_OuterShareable;
    when '01' domain = MBReqDomain_Nonshareable;
    when '10' domain = MBReqDomain_InnerShareable;
    when '11' domain = MBReqDomain_FullSystem;
case CRm<1:0> of
    when '00' types = MBReqTypes_All; domain = MBReqDomain_FullSystem;
    when '01' types = MBReqTypes_Reads;
    when '10' types = MBReqTypes_Writes;
    when '11' types = MBReqTypes_All;
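The decode above can be sketched in Python to show how a CRm value maps to a shareability domain and access type, and how CRm is placed in the instruction word. The function names and the string labels are hypothetical. The base opcode is read off the encoding diagram above, with CRm in bits <11:8>.

```python
DOMAINS = {
    0b00: "OuterShareable",
    0b01: "Nonshareable",
    0b10: "InnerShareable",
    0b11: "FullSystem",
}
TYPES = {0b01: "Reads", 0b10: "Writes", 0b11: "All"}


def decode_dmb(crm: int):
    """Return (domain, types) for a 4-bit CRm value."""
    domain = DOMAINS[(crm >> 2) & 0b11]
    low = crm & 0b11
    if low == 0b00:
        # CRm<1:0> == '00' overrides the domain with a full system barrier.
        return "FullSystem", "All"
    return domain, TYPES[low]


def encode_dmb(crm: int) -> int:
    """Build the 32-bit DMB instruction word for a 4-bit CRm value."""
    return 0xD50330BF | ((crm & 0xF) << 8)
```

For example, CRm = 0b1011 (the ISH option) decodes to the Inner Shareable domain with reads and writes, and encodes as 0xD5033BBF.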
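Assembler Symbols
<option> Specifies the limitation on the barrier operation. Values are:
SY
Full system is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. This option is referred to as the full system
barrier. Encoded as CRm = 0b1111.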
ST
Full system is the required shareability domain, writes are the required access type, both before
and after the barrier instruction. Encoded as CRm = 0b1110.
LD
Full system is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b1101.
ISH
Inner Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b1011.
ISHST
Inner Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b1010.
ISHLD
Inner Shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b1001.
NSH
Non-shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b0111.
NSHST
Non-shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0110.
NSHLD
Non-shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b0101.
OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b0011.
OSHST
Outer Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0010.
OSHLD
Outer Shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b0001.
All other encodings of CRm that are not listed above are reserved, and can be encoded using the
#<imm> syntax. All unsupported and reserved options must execute as a full system barrier operation,
but software must not rely on this behavior. For more information on whether an access is before or
after a barrier instruction, see Data Memory Barrier (DMB) or see Data Synchronization Barrier (DSB).
<imm> Is a 4-bit unsigned immediate, in the range 0 to 15, encoded in the "CRm" field.
Operation
DataMemoryBarrier(domain, types);
DRPS
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
DRPS
Operation
DRPSInstruction();
Data Synchronization Barrier is a memory barrier that ensures the completion of memory accesses, see Data
Synchronization Barrier.
A DSB instruction with the nXS qualifier is complete when the subset of these memory accesses with the XS attribute
set to 0 are complete. It does not require that memory accesses with the XS attribute set to 1 are complete.
This instruction is used by the aliases PSSBB and SSBB.
It has encodings from 2 classes: Memory barrier and Memory nXS barrier.
Memory barrier
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 0 0 1 1 1 1 1
opc
DSB <option>|#<imm>
DSBAlias alias;
case CRm of
when '0000' alias = DSBAlias_SSBB;
when '0100' alias = DSBAlias_PSSBB;
otherwise alias = DSBAlias_DSB;
MBReqDomain domain;
case CRm<3:2> of
when '00' domain = MBReqDomain_OuterShareable;
when '01' domain = MBReqDomain_Nonshareable;
when '10' domain = MBReqDomain_InnerShareable;
when '11' domain = MBReqDomain_FullSystem;
MBReqTypes types;
case CRm<1:0> of
when '00' types = MBReqTypes_All; domain = MBReqDomain_FullSystem;
when '01' types = MBReqTypes_Reads;
when '10' types = MBReqTypes_Writes;
when '11' types = MBReqTypes_All;
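The decode above can be modelled in a few lines. The following is a minimal Python sketch, not the architectural pseudocode; the function name `decode_dsb_crm` and the plain-string return values are illustrative assumptions. It shows that CRm<1:0> == '00' always forces a full system barrier, which is why the reserved encodings behave as DSB SY.

```python
def decode_dsb_crm(crm):
    """Model of the DSB memory barrier decode: returns (alias, domain, types).

    Hypothetical helper for illustration; the strings mirror the
    DSBAlias_*, MBReqDomain_* and MBReqTypes_* names in the pseudocode.
    """
    if not 0 <= crm <= 0xF:
        raise ValueError("CRm is a 4-bit field")

    # CRm 0b0000 and 0b0100 are taken by the SSBB and PSSBB aliases.
    alias = {0b0000: "SSBB", 0b0100: "PSSBB"}.get(crm, "DSB")

    # CRm<3:2> selects the shareability domain.
    domain = ["OuterShareable", "Nonshareable",
              "InnerShareable", "FullSystem"][crm >> 2]

    # CRm<1:0> selects the access types; '00' overrides the domain.
    low = crm & 0b11
    if low == 0b00:
        types, domain = "All", "FullSystem"
    else:
        types = {0b01: "Reads", 0b10: "Writes", 0b11: "All"}[low]
    return alias, domain, types
```

For example, `decode_dsb_crm(0b1011)` corresponds to DSB ISH, and `decode_dsb_crm(0b1000)` is a reserved encoding that decodes as a full system barrier.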
Memory nXS barrier
(FEAT_XS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 imm2 1 0 0 0 1 1 1 1 1 1
DSB <option>nXS|#<imm>
case imm2 of
when '00' domain = MBReqDomain_OuterShareable;
when '01' domain = MBReqDomain_Nonshareable;
when '10' domain = MBReqDomain_InnerShareable;
when '11' domain = MBReqDomain_FullSystem;
Assembler Symbols
<option> For the memory barrier variant: specifies the limitation on the barrier operation. Values are:
ST
Full system is the required shareability domain, writes are the required access type, both before
and after the barrier instruction. Encoded as CRm = 0b1110.
LD
Full system is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b1101.
ISH
Inner Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b1011.
ISHST
Inner Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b1010.
ISHLD
Inner Shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b1001.
NSH
Non-shareable is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. Encoded as CRm = 0b0111.
NSHST
Non-shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0110.
NSHLD
Non-shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b0101.
OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b0011.
OSHST
Outer Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0010.
OSHLD
Outer Shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b0001.
All other encodings of CRm, other than the values 0b0000 and 0b0100, that are not listed above are
reserved, and can be encoded using the #<imm> syntax. All unsupported and reserved options must
execute as a full system barrier operation, but software must not rely on this behavior. For more
information on whether an access is before or after a barrier instruction, see Data Memory Barrier
(DMB) or see Data Synchronization Barrier (DSB).
Note
The value 0b0000 is used to encode SSBB and the value 0b0100 is used to encode PSSBB.
For the memory nXS barrier variant: specifies the limitation on the barrier operation. Values are:
SY
Full system is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. This option is referred to as the full system barrier.
Encoded as CRm<3:2> = 0b11.
NSH
Non-shareable is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. Encoded as CRm<3:2> = 0b01.
OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm<3:2> = 0b00.
<imm> For the memory barrier variant: is a 4-bit unsigned immediate, in the range 0 to 15, encoded in the
"CRm" field.
For the memory nXS barrier variant: is a 5-bit unsigned immediate, encoded in “imm2”:
imm2 <imm>
00 16
01 20
10 24
11 28
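The two encoding classes can be assembled from the bit patterns shown in the diagrams. This is a hedged Python sketch: the constant and function names are assumptions made for illustration, and the option table contains only a subset of the options listed above. The nXS immediate follows the table as imm = 16 + 4 * imm2.

```python
# Fixed bits of each class, taken from the encoding diagrams.
DSB_BASE = 0xD503309F      # memory barrier class, CRm = 0b0000
DSB_NXS_BASE = 0xD503323F  # memory nXS barrier class, imm2 = 0b00

# Subset of the <option> values listed for the memory barrier variant.
DSB_OPTIONS = {
    "ST": 0b1110, "LD": 0b1101,
    "ISH": 0b1011, "ISHST": 0b1010, "ISHLD": 0b1001,
    "NSH": 0b0111, "NSHST": 0b0110, "NSHLD": 0b0101,
    "OSH": 0b0011, "OSHST": 0b0010, "OSHLD": 0b0001,
}


def encode_dsb(option):
    """Return the 32-bit word for DSB <option>; CRm sits in bits [11:8]."""
    return DSB_BASE | (DSB_OPTIONS[option] << 8)


def nxs_imm(imm2):
    """Map the 2-bit imm2 field to the 5-bit <imm> value (16, 20, 24, 28)."""
    if not 0 <= imm2 <= 3:
        raise ValueError("imm2 is a 2-bit field")
    return 16 + 4 * imm2


def encode_dsb_nxs(imm):
    """Return the 32-bit word for DSB #<imm> in the nXS class."""
    if imm not in (16, 20, 24, 28):
        raise ValueError("nXS <imm> must be 16, 20, 24 or 28")
    return DSB_NXS_BASE | (((imm - 16) // 4) << 10)  # imm2 in bits [11:10]
```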
Alias Conditions
Operation
case alias of
when DSBAlias_SSBB
SpeculativeStoreBypassBarrierToVA();
when DSBAlias_PSSBB
SpeculativeStoreBypassBarrierToPA();
when DSBAlias_DSB
if !nXS && HaveFeatXS() && HaveFeatHCX() then
nXS = PSTATE.EL IN {EL0, EL1} && IsHCRXEL2Enabled() && HCRX_EL2.FnXS == '1';
DataSynchronizationBarrier(domain, types, nXS);
otherwise
Unreachable();
Data Value Prediction Restriction by Context prevents data value predictions that predict execution addresses based
on information gathered from earlier execution within a particular execution context. Data value predictions
determined by the actions of code in the target execution context or contexts appearing in program order before the
instruction cannot be used to exploitatively control speculative execution occurring after the instruction is complete
and synchronized.
For more information, see DVP RCTX, Data Value Prediction Restriction by Context.
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
System
(FEAT_SPECRES)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 Rt
L op1 CRn CRm op2
DVP RCTX, <Xt>
is equivalent to
SYS #3, C7, C3, #5, <Xt>
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
Bitwise Exclusive OR NOT (shifted register) performs a bitwise Exclusive OR NOT of a register value and an
optionally-shifted register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
operand2 = NOT(operand2);
X[d] = result;
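The behaviour described for this instruction (shift the second operand, invert it, then Exclusive OR with the first operand) can be sketched in Python as follows. The helper names `shift_reg` and `eon` are illustrative assumptions, and registers are modelled as plain unsigned integers.

```python
def shift_reg(value, shift, amount, datasize):
    """Apply LSL, LSR, ASR or ROR to a datasize-bit unsigned value."""
    mask = (1 << datasize) - 1
    value &= mask
    if shift == "LSL":
        return (value << amount) & mask
    if shift == "LSR":
        return value >> amount
    if shift == "ASR":
        result = value >> amount
        if value >> (datasize - 1):          # replicate the sign bit
            result |= mask & ~(mask >> amount)
        return result
    if shift == "ROR":
        return ((value >> amount) | (value << (datasize - amount))) & mask
    raise ValueError("unknown shift type")


def eon(operand1, operand2, shift="LSL", amount=0, datasize=64):
    """Model of EON: operand1 EOR NOT(shifted operand2)."""
    if datasize not in (32, 64) or not 0 <= amount < datasize:
        raise ValueError("invalid datasize or shift amount")
    mask = (1 << datasize) - 1
    inverted = ~shift_reg(operand2, shift, amount, datasize) & mask
    return (operand1 & mask) ^ inverted
```

A value combined with itself yields all ones, so `eon(x, x)` behaves like a bitwise XNOR equality test.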
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise Exclusive OR (immediate) performs a bitwise Exclusive OR of a register value and an immediate value, and
writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 0 0 N immr imms Rn Rd
opc
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
if sf == '0' && N != '0' then UNDEFINED;
(imm, -) = DecodeBitMasks(N, imms, immr, TRUE);
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
if d == 31 then
SP[] = result;
else
X[d] = result;
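The bitmask immediate is produced by DecodeBitMasks from N, imms and immr. The following Python sketch models how that decode is commonly described (element size from N:NOT(imms), a run of ones rotated right, then replicated). It is an illustration under those assumptions, not a substitute for the architectural pseudocode, and the function names are hypothetical.

```python
def decode_bit_masks(n, imms, immr, datasize):
    """Return the logical immediate for N:imms:immr, or raise ValueError."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    combined = ((n & 1) << 6) | (~imms & 0x3F)   # N : NOT(imms)
    length = combined.bit_length() - 1           # highest set bit, -1 if none
    if length < 1:
        raise ValueError("reserved encoding")
    esize = 1 << length                          # element size in bits
    if esize > datasize:
        raise ValueError("N == 1 is not valid for a 32-bit operation")
    levels = esize - 1
    s = imms & levels                            # run length minus one
    r = immr & levels                            # rotate amount
    if s == levels:
        raise ValueError("reserved: element would be all ones")
    emask = (1 << esize) - 1
    welem = (1 << (s + 1)) - 1                   # s + 1 consecutive ones
    welem = ((welem >> r) | (welem << (esize - r))) & emask  # rotate right
    imm = 0
    for i in range(datasize // esize):           # replicate the element
        imm |= welem << (i * esize)
    return imm


def eor_immediate(operand1, n, imms, immr, datasize=64):
    """Model of EOR (immediate): operand1 EOR the decoded bitmask."""
    mask = (1 << datasize) - 1
    return (operand1 & mask) ^ decode_bit_masks(n, imms, immr, datasize)
```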
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise Exclusive OR (shifted register) performs a bitwise Exclusive OR of a register value and an optionally-shifted
register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 0 shift 0 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Exception Return using the ELR and SPSR for the current Exception level. When executed, the PE restores PSTATE
from the SPSR, and branches to the address held in the ELR.
The PE checks the SPSR for the current Exception level for an illegal return event. See Illegal return events from
AArch64 state.
ERET is UNDEFINED at EL0.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
A M Rn op4
ERET
Operation
AArch64.CheckForERetTrap(FALSE, TRUE);
bits(64) target = ELR[];
AArch64.ExceptionReturn(target, SPSR[]);
Exception Return, with pointer authentication. This instruction authenticates the address in ELR, using SP as the
modifier and the specified key. The PE then restores PSTATE from the SPSR for the current Exception level and
branches to the authenticated address.
Key A is used for ERETAA, and key B is used for ERETAB.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to ELR.
The PE checks the SPSR for the current Exception level for an illegal return event. See Illegal return events from
AArch64 state.
ERETAA and ERETAB are UNDEFINED at EL0.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 M 1 1 1 1 1 1 1 1 1 1
A Rn op4
ERETAA (M == 0)
ERETAA
ERETAB (M == 1)
ERETAB
if !HavePACExt() then
UNDEFINED;
Operation
AArch64.CheckForERetTrap(TRUE, use_key_a);
bits(64) target;
if use_key_a then
target = AuthIA(ELR[], SP[], TRUE);
else
target = AuthIB(ELR[], SP[], TRUE);
AArch64.ExceptionReturn(target, SPSR[]);
Error Synchronization Barrier is an error synchronization event that might also update DISR_EL1 and VDISR_EL2.
This instruction can be used at all Exception levels and in Debug state.
In Debug state, this instruction behaves as if SError interrupts are masked at all Exception levels. See Error
Synchronization Barrier in the Arm® Reliability, Availability, and Serviceability (RAS) Specification, Armv8, for
Armv8-A architecture profile.
If the RAS Extension is not implemented, this instruction executes as a NOP.
System
(FEAT_RAS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 1 1
CRm op2
ESB
Operation
SynchronizeErrors();
AArch64.ESBOperation();
if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then AArch64.vESBOperation();
TakeUnmaskedSErrorInterrupts();
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
integer lsb;
if N != sf then UNDEFINED;
if sf == '0' && imms<5> == '1' then UNDEFINED;
lsb = UInt(imms);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<lsb> For the 32-bit variant: is the least significant bit position from which to extract, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the least significant bit position from which to extract, in the range 0 to 63,
encoded in the "imms" field.
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(2*datasize) concat = operand1:operand2;
result = concat<lsb+datasize-1:lsb>;
X[d] = result;
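The extraction above concatenates the two source registers and selects a datasize-bit window starting at lsb. A minimal Python sketch of that operation follows; the function name `extr` and the integer register model are illustrative assumptions.

```python
def extr(operand1, operand2, lsb, datasize=64):
    """Model of the extract: bits <lsb+datasize-1:lsb> of operand1:operand2."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    if not 0 <= lsb < datasize:
        raise ValueError("lsb out of range for this datasize")
    mask = (1 << datasize) - 1
    # operand1 forms the upper half of the concatenation, operand2 the lower.
    concat = ((operand1 & mask) << datasize) | (operand2 & mask)
    return (concat >> lsb) & mask
```

When both source operands hold the same value, the result is that value rotated right by lsb, for example `extr(0x80000001, 0x80000001, 1, 32)` gives `0xC0000000`.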
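Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.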
Tag Mask Insert inserts the tag in the first source register into the excluded set specified in the second source
register, writing the new excluded set to the destination register.
Integer
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 1 0 1 Xn Xd
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Xd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Xm" field.
Operation
mask<UInt(tag)> = '1';
X[d] = mask;
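The tag insertion can be sketched in Python as setting one bit of the excluded set. This sketch assumes that the tag is the Logical Address Tag held in bits [59:56] of the first source register; the function name `gmi` is hypothetical.

```python
def gmi(xn, xm):
    """Model of Tag Mask Insert: set bit <tag> in the excluded set from Xm.

    Assumption: the tag is taken from bits [59:56] of Xn.
    """
    tag = (xn >> 56) & 0xF
    mask = xm & 0xFFFFFFFFFFFFFFFF
    return mask | (1 << tag)
```

Repeated calls accumulate tags, so the result can be supplied as the excluded set of a later IRG.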
Hint instruction is for the instruction set space that is reserved for architectural hint instructions.
Some encodings described here are not allocated in this revision of the architecture, and behave as NOPs. These
encodings might be allocated to other hint functionality in future revisions of the architecture and therefore must not
be used by software.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 CRm op2 1 1 1 1 1
HINT #<imm>
SystemHintOp op;
case CRm:op2 of
when '0000 000' op = SystemHintOp_NOP;
when '0000 001' op = SystemHintOp_YIELD;
when '0000 010' op = SystemHintOp_WFE;
when '0000 011' op = SystemHintOp_WFI;
when '0000 100' op = SystemHintOp_SEV;
when '0000 101' op = SystemHintOp_SEVL;
when '0000 110'
if !HaveDGHExt() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_DGH;
when '0000 111' SEE "XPACLRI";
when '0001 xxx'
case op2 of
when '000' SEE "PACIA1716";
when '010' SEE "PACIB1716";
when '100' SEE "AUTIA1716";
when '110' SEE "AUTIB1716";
otherwise EndOfInstruction();
when '0010 000'
if !HaveRASExt() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_ESB;
when '0010 001'
if !HaveStatisticalProfiling() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_PSB;
when '0010 010'
if !HaveSelfHostedTrace() then EndOfInstruction(); // Instruction executes as NOP
op = SystemHintOp_TSB;
when '0010 100'
op = SystemHintOp_CSDB;
when '0011 xxx'
case op2 of
when '000' SEE "PACIAZ";
when '001' SEE "PACIASP";
when '010' SEE "PACIBZ";
when '011' SEE "PACIBSP";
when '100' SEE "AUTIAZ";
when '101' SEE "AUTIASP";
when '110' SEE "AUTIBZ";
when '111' SEE "AUTIBSP";
when '0100 xx0'
op = SystemHintOp_BTI;
// Check branch target compatibility between BTI instruction and PSTATE.BTYPE
SetBTypeCompatible(BTypeCompatible_BTI(op2<2:1>));
otherwise EndOfInstruction();
Assembler Symbols
<imm> Is a 7-bit unsigned immediate, in the range 0 to 127, encoded in the "CRm:op2" field.
The encodings that are allocated to architectural hint functionality are described in the "Hints" table in
the "Index by Encoding".
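The relationship between the 7-bit immediate, the CRm:op2 fields and the decode cases above can be sketched in Python. The table below is transcribed from the case statement; the names `encode_hint`, `split_hint` and `hint_name` are illustrative assumptions. An allocated hint can still execute as a NOP when its feature is not implemented, which this sketch does not model.

```python
HINT_BASE = 0xD503201F  # fixed bits with CRm:op2 == 0 (NOP)

# CRm:op2 values that the decode above allocates.
ALLOCATED = {
    0b0000_000: "NOP", 0b0000_001: "YIELD", 0b0000_010: "WFE",
    0b0000_011: "WFI", 0b0000_100: "SEV", 0b0000_101: "SEVL",
    0b0000_110: "DGH", 0b0000_111: "XPACLRI",
    0b0001_000: "PACIA1716", 0b0001_010: "PACIB1716",
    0b0001_100: "AUTIA1716", 0b0001_110: "AUTIB1716",
    0b0010_000: "ESB", 0b0010_001: "PSB", 0b0010_010: "TSB",
    0b0010_100: "CSDB",
    0b0011_000: "PACIAZ", 0b0011_001: "PACIASP", 0b0011_010: "PACIBZ",
    0b0011_011: "PACIBSP", 0b0011_100: "AUTIAZ", 0b0011_101: "AUTIASP",
    0b0011_110: "AUTIBZ", 0b0011_111: "AUTIBSP",
    0b0100_000: "BTI", 0b0100_010: "BTI", 0b0100_100: "BTI", 0b0100_110: "BTI",
}


def encode_hint(imm):
    """Return the 32-bit word for HINT #<imm>; CRm:op2 sits in bits [11:5]."""
    if not 0 <= imm <= 127:
        raise ValueError("<imm> is a 7-bit unsigned immediate")
    return HINT_BASE | (imm << 5)


def split_hint(imm):
    """Split <imm> into its (CRm, op2) fields."""
    return imm >> 3, imm & 0b111


def hint_name(imm):
    """Name the hint selected by <imm>, or report it as unallocated."""
    return ALLOCATED.get(imm, "unallocated (executes as NOP)")
```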
Operation
case op of
when SystemHintOp_YIELD
Hint_Yield();
when SystemHintOp_DGH
Hint_DGH();
when SystemHintOp_WFE
Hint_WFE(1, WFxType_WFE);
when SystemHintOp_WFI
Hint_WFI(1, WFxType_WFI);
when SystemHintOp_SEV
SendEvent();
when SystemHintOp_SEVL
SendEventLocal();
when SystemHintOp_ESB
SynchronizeErrors();
AArch64.ESBOperation();
if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then AArch64.vESBOperation();
TakeUnmaskedSErrorInterrupts();
when SystemHintOp_PSB
ProfilingSynchronizationBarrier();
when SystemHintOp_TSB
TraceSynchronizationBarrier();
when SystemHintOp_CSDB
ConsumptionOfSpeculativeDataBarrier();
when SystemHintOp_BTI
SetBTypeNext('00');
otherwise // do nothing
Halt instruction. An HLT instruction can generate a Halt Instruction debug event, which causes entry into Debug state.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 1 0 imm16 0 0 0 0 0
HLT #<imm>
Assembler Symbols
<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
Operation
Halt(DebugHalt_HaltInstruction);
Hypervisor Call causes an exception to EL2. Software executing at EL1 can use this instruction to call the hypervisor
to request a service.
The HVC instruction is UNDEFINED:
• When EL3 is implemented and SCR_EL3.HCE is set to 0.
• When EL3 is not implemented and HCR_EL2.HCD is set to 1.
• When EL2 is not implemented.
• At EL1 if EL2 is not enabled in the current Security state.
• At EL0.
On executing an HVC instruction, the PE records the exception as a Hypervisor Call exception in ESR_ELx, using the
EC value 0x16, and the value of the immediate argument.
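A handler at EL2 typically recovers the exception class and the immediate from the syndrome value. The Python sketch below models that, assuming the usual ESR_ELx layout (EC in bits [31:26], IL in bit 25, and the 16-bit immediate in the low bits of the ISS field). These field positions and all function names are assumptions that are not stated in this description. The instruction word is built from the encoding diagram that follows.

```python
EC_HVC_AARCH64 = 0x16   # EC value recorded for an HVC instruction
HVC_BASE = 0xD4000002   # fixed bits of the HVC encoding with imm16 == 0


def encode_hvc(imm16):
    """Return the 32-bit word for HVC #<imm>; imm16 sits in bits [20:5]."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("<imm> is a 16-bit unsigned immediate")
    return HVC_BASE | (imm16 << 5)


def hvc_syndrome(imm16):
    """Build the syndrome value assumed for HVC #<imm> (IL = 1)."""
    return (EC_HVC_AARCH64 << 26) | (1 << 25) | (imm16 & 0xFFFF)


def decode_syndrome(esr):
    """Return (EC, imm16) extracted from a syndrome value."""
    return (esr >> 26) & 0x3F, esr & 0xFFFF
```

A dispatcher could compare the returned EC against `EC_HVC_AARCH64` and then use the immediate to select the requested service.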
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 0 imm16 0 0 0 1 0
HVC #<imm>
// Empty.
Assembler Symbols
<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
Operation
if !HaveEL(EL2) || PSTATE.EL == EL0 || (PSTATE.EL == EL1 && (!IsSecureEL2Enabled() && IsSecure())) then
UNDEFINED;
Instruction Cache operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and
address translation instructions.
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 0 1 1 1 CRm op2 Rt
L CRn
IC <ic_op>{, <Xt>}
is equivalent to
SYS #<op1>, C7, <Cm>, #<op2>{, <Xt>}
Assembler Symbols
<ic_op> Is an IC instruction name, as listed for the IC system instruction pages, encoded in “op1:CRm:op2”:
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<Xt> Is the 64-bit name of the optional general-purpose source register, defaulting to '11111', encoded in the
"Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
IRG
Insert Random Tag inserts a random Logical Address Tag into the address in the first source register, and writes the
result to the destination register. Any tags specified in the optional second source register or in GCR_EL1.Exclude are
excluded from the selection of the random Logical Address Tag.
Integer
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 1 0 0 Xn Xd
Assembler Symbols
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Xd"
field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Xm" field. Defaults to
XZR if absent.
Operation
if AArch64.AllocationTagAccessIsEnabled(AccType_NORMAL) then
if GCR_EL1.RRND == '1' then
RGSR_EL1 = bits(64) UNKNOWN;
if IsOnes(exclude) then
rtag = '0000';
else
rtag = ChooseRandomNonExcludedTag(exclude);
else
bits(4) start = RGSR_EL1.TAG;
bits(4) offset = AArch64.RandomTag();
RGSR_EL1.TAG = rtag;
else
rtag = '0000';
if d == 31 then
SP[] = result;
else
X[d] = result;
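The tag selection can be sketched in Python for the case where the implementation chooses the tag directly (the GCR_EL1.RRND == '1' path above). This sketch makes several assumptions: the excluded set is the OR of the low 16 bits of the second source register with GCR_EL1.Exclude, the tag is written to address bits [59:56], and Python's `random.choice` stands in for ChooseRandomNonExcludedTag. The function name `irg` is hypothetical.

```python
import random


def irg(xn, xm=0, gcr_exclude=0, rng=random):
    """Model of Insert Random Tag with allocation tag access enabled."""
    exclude = (xm | gcr_exclude) & 0xFFFF
    if exclude == 0xFFFF:
        rtag = 0                      # every tag excluded: tag 0 is used
    else:
        allowed = [t for t in range(16) if not (exclude >> t) & 1]
        rtag = rng.choice(allowed)    # stand-in for the hardware choice
    address = xn & 0xFFFFFFFFFFFFFFFF
    # Replace bits [59:56] of the address with the chosen tag.
    return (address & ~(0xF << 56)) | (rtag << 56)
```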
Instruction Synchronization Barrier flushes the pipeline in the PE and is a context synchronization event. For more
information, see Instruction Synchronization Barrier (ISB).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 1 0 1 1 1 1 1
opc
ISB {<option>|#<imm>}
Assembler Symbols
All other encodings of CRm are reserved. The corresponding instructions execute as full system barrier
operations, but must not be relied upon by software.
<imm> Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15 and encoded in the
"CRm" field.
Operation
InstructionSynchronizationBarrier();
Single-copy Atomic 64-byte Load derives an address from a base register value, loads eight 64-bit doublewords from a
memory location, and writes them to consecutive registers, Xt to X(t+7). The data that is loaded is atomic and is
required to be 64-byte aligned.
Integer
(FEAT_LS64)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 0 0 Rn Rt
integer n = UInt(Rn);
integer t = UInt(Rt);
boolean tag_checked = n != 31;
Assembler Symbols
<Xt> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
CheckLDST64BEnabled();
bits(512) data;
bits(64) address;
bits(64) value;
acctype = AccType_ATOMICLS64;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
for i = 0 to 7
value = data<63+64*i:64*i>;
if BigEndian(acctype) then value = BigEndianReverse(value);
X[t+i] = value;
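The loop above distributes the 512-bit value across eight registers, 64 bits at a time. A minimal Python sketch of that step follows, assuming the 64 bytes are supplied in memory order and that the 512-bit value is formed with the lowest-addressed byte in the least significant position. The function name and the list used to model Xt to X(t+7) are illustrative.

```python
def ld64b_registers(memory_bytes, big_endian=False):
    """Split 64 loaded bytes into eight 64-bit register values."""
    if len(memory_bytes) != 64:
        raise ValueError("LD64B transfers exactly 64 bytes")
    data = int.from_bytes(memory_bytes, "little")   # bits(512) data
    registers = []
    for i in range(8):
        value = (data >> (64 * i)) & 0xFFFFFFFFFFFFFFFF  # data<63+64*i:64*i>
        if big_endian:
            # Equivalent of BigEndianReverse on a 64-bit value.
            value = int.from_bytes(value.to_bytes(8, "little"), "big")
        registers.append(value)                     # X[t+i] = value
    return registers
```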
Atomic add on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, adds
the value held in a register to it, and stores the result back to memory. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not one of WZR or XZR, LDADDA and LDADDAL load from memory with acquire
semantics.
• LDADDL and LDADDAL store to memory with release semantics.
• LDADD has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STADD and STADDL.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, regsize);
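The data flow of the atomic add (load the old value, store the wrapped sum, return the old value unless the destination is the zero register) can be sketched in Python. The dictionary used as memory and the function name `ldadd` are assumptions for illustration. The sketch does not model atomicity or the acquire and release ordering.

```python
def ldadd(memory, address, value, datasize=64, t=0):
    """Model of the LDADD data flow on a dict-based memory.

    Returns the value originally in memory, or None when t == 31
    (destination is WZR or XZR, so the loaded value is discarded).
    """
    if datasize not in (8, 16, 32, 64):
        raise ValueError("unsupported access size")
    mask = (1 << datasize) - 1
    data = memory.get(address, 0) & mask             # value initially loaded
    memory[address] = (data + (value & mask)) & mask  # sum wraps at datasize
    return None if t == 31 else data
```

With `t == 31` the call behaves like the STADD alias: memory is updated and nothing is returned.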
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic add on byte in memory atomically loads an 8-bit byte from memory, adds the value held in a register to it, and
stores the result back to memory. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDADDAB and LDADDALB load from memory with acquire semantics.
• LDADDLB and LDADDALB store to memory with release semantics.
• LDADDB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STADDB and STADDLB.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc
LDADDAB (A == 1 && R == 0)
LDADDALB (A == 1 && R == 1)
LDADDB (A == 0 && R == 0)
LDADDLB (A == 0 && R == 1)
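As an informal illustration of the four variant conditions listed above, the following Python sketch maps the A and R encoding bits to the corresponding mnemonic. The function name ldaddb_variant is hypothetical and is not part of the architecture; the sketch only restates the listed conditions.

```python
def ldaddb_variant(a: int, r: int) -> str:
    """Return the byte atomic-add mnemonic selected by the A and R bits."""
    if a not in (0, 1) or r not in (0, 1):
        raise ValueError("A and R are single-bit fields")
    # A == 1 adds the acquire suffix, R == 1 adds the release suffix.
    suffix = ("A" if a else "") + ("L" if r else "")
    return "LDADD" + suffix + "B"
```

For example, ldaddb_variant(1, 1) yields the acquire-release form LDADDALB.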
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
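The data flow of the byte variant can be sketched in Python as follows. This is a simplified, single-threaded model and not the architectural pseudocode: it ignores atomicity, acquire and release ordering, tag checking, and stack pointer alignment. The names ldaddb_model and memory are illustrative assumptions. The sketch assumes, as the description states, that the sum is stored back as an 8-bit value and that the value initially loaded is returned zero-extended, unless the destination register number is 31.

```python
def ldaddb_model(memory: bytearray, address: int, ws: int, t: int = 0):
    """Model the LDADDB data flow on a bytearray standing in for memory.

    Returns the value that would be written to Wt, or None when t == 31
    (the zero register), in which case the loaded value is discarded.
    """
    old = memory[address]                 # 8-bit load
    memory[address] = (old + ws) & 0xFF   # only the low 8 bits of the sum are stored
    return None if t == 31 else old       # old value, zero-extended to 32 bits
```

A byte holding 0xFF incremented by 2 wraps to 0x01 in memory, while the returned value is still 0xFF.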
LDADDH, LDADDAH, LDADDALH, LDADDLH
Atomic add on halfword in memory atomically loads a 16-bit halfword from memory, adds the value held in a register
to it, and stores the result back to memory. The value initially loaded from memory is returned in the destination
register.
• If the destination register is not WZR, LDADDAH and LDADDALH load from memory with acquire semantics.
• LDADDLH and LDADDALH store to memory with release semantics.
• LDADDH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STADDH, STADDLH.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc
LDADDAH (A == 1 && R == 0)
LDADDALH (A == 1 && R == 1)
LDADDH (A == 0 && R == 0)
LDADDLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPR
Load-Acquire RCpc Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword
from the derived address in memory, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Integer
(FEAT_LRCPC)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs
integer n = UInt(Rn);
integer t = UInt(Rt);
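The Operation pseudocode for this instruction refers to elsize and dbytes, but the decode line that defines elsize does not appear in this excerpt. The following Python sketch assumes the usual A64 load/store convention that the element size is 8 << UInt(size), with size restricted to the '1x' pattern shown in the encoding. The function name ldapr_sizes is hypothetical.

```python
def ldapr_sizes(size: int):
    """Return (elsize in bits, dbytes) for the 2-bit size field of LDAPR.

    Assumption: elsize = 8 << size, so size 0b10 selects the 32-bit (<Wt>)
    form and size 0b11 selects the 64-bit (<Xt>) form.
    """
    if size not in (0b10, 0b11):
        raise ValueError("LDAPR encodes size as '1x'")
    elsize = 8 << size
    return elsize, elsize // 8
```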
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPRB
Load-Acquire RCpc Register Byte derives an address from a base register value, loads a byte from the derived address
in memory, zero-extends it and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Integer
(FEAT_LRCPC)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPRH
Load-Acquire RCpc Register Halfword derives an address from a base register value, loads a halfword from the
derived address in memory, zero-extends it and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Integer
(FEAT_LRCPC)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPUR
Load-Acquire RCpc Register (unscaled) calculates an address from a base register and an immediate offset, loads a
32-bit word or 64-bit doubleword from memory, zero-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
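The range -256 to 255 is exactly the range of a 9-bit two's complement value, which is how the offset fits in the "imm9" field. A minimal Python sketch of that mapping follows. The helper names encode_simm9 and decode_simm9 are illustrative and do not appear in the architecture.

```python
def encode_simm9(offset: int) -> int:
    """Encode a signed byte offset into the 9-bit imm9 field."""
    if not -256 <= offset <= 255:
        raise ValueError("offset must be in the range -256 to 255")
    return offset & 0x1FF  # keep the low 9 bits (two's complement)


def decode_simm9(imm9: int) -> int:
    """Sign-extend a 9-bit imm9 field value back to a signed offset."""
    if not 0 <= imm9 <= 0x1FF:
        raise ValueError("imm9 is a 9-bit field")
    return imm9 - 0x200 if imm9 & 0x100 else imm9
```

An offset of -1 encodes as 0x1FF, and decoding 0x100 gives the minimum offset of -256.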
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer regsize;
Operation
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPURB
Load-Acquire RCpc Register Byte (unscaled) calculates an address from a base register and an immediate offset, loads
a byte from memory, zero-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPURH
Load-Acquire RCpc Register Halfword (unscaled) calculates an address from a base register and an immediate offset,
loads a halfword from memory, zero-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPURSB
Load-Acquire RCpc Register Signed Byte (unscaled) calculates an address from a base register and an immediate
offset, loads a signed byte from memory, sign-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 1 x 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 1, AccType_ORDERED] = data;
    when MemOp_LOAD
        data = Mem[address, 1, AccType_ORDERED];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
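The MemOp_LOAD branch of the pseudocode above chooses between SignExtend and ZeroExtend when widening the loaded byte to regsize bits. The following Python sketch illustrates that step on plain integers. The function name extend_byte is hypothetical, and the sketch models only the extension, not the memory access or its ordering.

```python
def extend_byte(data: int, signed: bool, regsize: int) -> int:
    """Widen an 8-bit value to regsize bits, as an unsigned register image."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    data &= 0xFF
    if signed and data & 0x80:
        # Replicate the sign bit into bits regsize-1..8.
        return data | (((1 << regsize) - 1) & ~0xFF)
    return data  # zero extension leaves the upper bits clear
```

With a loaded byte of 0x80, the signed form produces 0xFFFFFF80 in a 32-bit register, while the unsigned form produces 0x00000080.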
LDAPURSH
Load-Acquire RCpc Register Signed Halfword (unscaled) calculates an address from a base register and an immediate
offset, loads a signed halfword from memory, sign-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 1 x 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 2, AccType_ORDERED] = data;
    when MemOp_LOAD
        data = Mem[address, 2, AccType_ORDERED];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAPURSW
Load-Acquire RCpc Register Signed Word (unscaled) calculates an address from a base register and an immediate
offset, loads a signed word from memory, sign-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 0 1 1 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(32) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAR
Load-Acquire Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword from
memory, and writes it to a register. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses, see Load/Store addressing modes.
Note
For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDARB
Load-Acquire Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it
and writes it to a register. The instruction also has memory ordering semantics as described in Load-Acquire, Store-
Release. For information about memory accesses, see Load/Store addressing modes.
Note
For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDARH
Load-Acquire Register Halfword derives an address from a base register value, loads a halfword from memory, zero-
extends it, and writes it to a register. The instruction also has memory ordering semantics as described in Load-
Acquire, Store-Release. For information about memory accesses, see Load/Store addressing modes.
Note
For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAXP
Load-Acquire Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two
64-bit doublewords from memory, and writes them to two registers. For information on single-copy atomicity and
alignment requirements, see Requirements for single-copy atomicity and Alignment of data accesses. The PE marks
the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive
instructions. See Synchronization and semaphores. The instruction also has memory ordering semantics, as described
in Load-Acquire, Store-Release. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 1 1 (1) (1) (1) (1) (1) 1 Rt2 Rn Rt
L Rs o0
32-bit (sz == 0)
64-bit (sz == 1)
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDAXP.
Assembler Symbols
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if rt_unknown then
    // ConstrainedUNPREDICTABLE case
    X[t] = bits(datasize) UNKNOWN; // In this case t = t2
elsif elsize == 32 then
    // 32-bit load exclusive pair (atomic)
    data = Mem[address, dbytes, AccType_ORDEREDATOMIC];
    if BigEndian(AccType_ORDEREDATOMIC) then
        X[t] = data<datasize-1:elsize>;
        X[t2] = data<elsize-1:0>;
    else
        X[t] = data<elsize-1:0>;
        X[t2] = data<datasize-1:elsize>;
else // elsize == 64
    // 64-bit load exclusive pair (not atomic),
    // but must be 128-bit aligned
    if address != Align(address, dbytes) then
        AArch64.Abort(address, AlignmentFault(AccType_ORDEREDATOMIC, FALSE, FALSE));
    X[t] = Mem[address, 8, AccType_ORDEREDATOMIC];
    X[t2] = Mem[address+8, 8, AccType_ORDEREDATOMIC];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
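Two details of the pseudocode above can be restated informally in Python: how a single 64-bit access is split between the two registers for the 32-bit pair form, and the 128-bit alignment check for the 64-bit pair form. The function names split_pair32 and is_pair64_aligned are hypothetical, and the sketch works on plain integers rather than modelling memory, exclusive monitors, or aborts.

```python
def split_pair32(data: int, big_endian: bool):
    """Split a 64-bit loaded value into (X[t], X[t2]) for the 32-bit pair form."""
    lo = data & 0xFFFFFFFF          # data<elsize-1:0>
    hi = (data >> 32) & 0xFFFFFFFF  # data<datasize-1:elsize>
    # Big-endian accesses place the upper half in the first register.
    return (hi, lo) if big_endian else (lo, hi)


def is_pair64_aligned(address: int) -> bool:
    """True if the address passes the 16-byte alignment check of the 64-bit pair form."""
    return address % 16 == 0
```

For a loaded value of 0x1122334455667788, a little-endian access gives X[t] = 0x55667788 and X[t2] = 0x11223344, and a big-endian access gives the two halves in the opposite registers.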
LDAXR
Load-Acquire Exclusive Register derives an address from a base register value, loads a 32-bit word or 64-bit
doubleword from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address
being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
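The interaction between the exclusive access mark and a later Store Exclusive can be illustrated with a deliberately simplified Python model. It is a toy, not the architectural definition: it tracks one exact address for one observer, whereas real monitors track an address region and react to accesses by other observers. The class name LocalMonitor and the dictionary used as memory are assumptions, as is the status convention of 0 for success and 1 for failure, which follows the Store Exclusive instructions and is not described in this section.

```python
class LocalMonitor:
    """Toy single-observer exclusive monitor tracking one marked address."""

    def __init__(self):
        self.marked = None  # address currently marked as exclusive, if any

    def load_exclusive(self, memory: dict, address: int) -> int:
        """Load a value and mark the address as an exclusive access."""
        self.marked = address
        return memory[address]

    def store_exclusive(self, memory: dict, address: int, value: int) -> int:
        """Store only if the address is still marked; return 0 on success, 1 on failure."""
        ok = self.marked == address
        self.marked = None  # the mark is cleared whether or not the store succeeds
        if ok:
            memory[address] = value
        return 0 if ok else 1
```

In this model a load-exclusive followed by a store-exclusive to the same address succeeds once, and a second store-exclusive without a new load-exclusive fails and leaves memory unchanged.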
LDAXRB
Load-Acquire Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-
extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed
as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. The instruction also has memory ordering semantics as described in Load-Acquire, Store-Release. For
information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDAXRH
LDAXRH
Load-Acquire Exclusive Register Halfword derives an address from a base register value, loads a halfword from
memory, zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address
being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDCLR, LDCLRA, LDCLRAL, LDCLRL
Atomic bit clear on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory,
performs a bitwise AND of the loaded value with the complement of the value held in a register, and stores the result
back to memory. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDCLRA and LDCLRAL load from memory with acquire
semantics.
• LDCLRL and LDCLRAL store to memory with release semantics.
• LDCLR has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STCLR and STCLRL.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 0 1 0 0 Rn Rt
size opc
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, regsize);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
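The read-modify-write described for LDCLR can be sketched in a few lines of Python. The function name ldclr and the dictionary used as memory are assumptions made for this illustration. The real instruction performs the load, the AND and the store as one atomic access, which a single-threaded model like this does not demonstrate.

```python
def ldclr(memory, address, value, datasize=32):
    """Clear in memory every bit that is set in value and return the old contents.

    memory   -- dict mapping an address to an unsigned integer (toy model)
    datasize -- 32 or 64, mirroring the word and doubleword variants
    """
    mask = (1 << datasize) - 1
    old = memory[address] & mask
    memory[address] = old & ~value & mask   # bitwise AND with the complement
    return old                              # value initially loaded from memory
```

For example, clearing the bits 0x0F000F00 in a word that holds 0xFF00FF00 leaves 0xF000F000 in memory and returns 0xFF00FF00. In the pseudocode the returned value is written to the destination register only when t is not 31.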
LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB
Atomic bit clear on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise AND of the loaded
value with the complement of the value held in a register, and stores the result back to memory. The value initially
loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDCLRAB and LDCLRALB load from memory with acquire semantics.
• LDCLRLB and LDCLRALB store to memory with release semantics.
• LDCLRB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STCLRB and STCLRLB.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 0 1 0 0 Rn Rt
size opc
LDCLRAB (A == 1 && R == 0)
LDCLRALB (A == 1 && R == 1)
LDCLRB (A == 0 && R == 0)
LDCLRLB (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
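The mapping from the A and R bits to the four byte variants, and to the ordering each one carries, can be written as a small lookup. This is a sketch; the function names ldclrb_mnemonic and ldclrb_ordering are hypothetical, and the rule that acquire semantics apply only when the destination is not WZR is taken from the description above.

```python
def ldclrb_mnemonic(a, r):
    """Return the variant selected by the A and R encoding bits."""
    variants = {
        (1, 0): "LDCLRAB",
        (1, 1): "LDCLRALB",
        (0, 0): "LDCLRB",
        (0, 1): "LDCLRLB",
    }
    return variants[(a, r)]


def ldclrb_ordering(a, r, rt):
    """Return (acquire, release) for the given A, R and Rt field values."""
    acquire = a == 1 and rt != 31   # no acquire when the destination is WZR
    release = r == 1
    return acquire, release
```

With A == 1, R == 1 and Rt == 3 the variant is LDCLRALB and both orderings apply; with Rt == 31 the same encoding keeps only the release semantics.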
LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH
Atomic bit clear on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise AND of the
loaded value with the complement of the value held in a register, and stores the result back to memory. The value
initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDCLRAH and LDCLRALH load from memory with acquire semantics.
• LDCLRLH and LDCLRALH store to memory with release semantics.
• LDCLRH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STCLRH and STCLRLH.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 0 1 0 0 Rn Rt
size opc
LDCLRAH (A == 1 && R == 0)
LDCLRALH (A == 1 && R == 1)
LDCLRH (A == 0 && R == 0)
LDCLRLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDEOR, LDEORA, LDEORAL, LDEORL
Atomic exclusive OR on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, performs an exclusive OR of the loaded value with the value held in a register, and stores the result back to
memory. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDEORA and LDEORAL load from memory with acquire
semantics.
• LDEORL and LDEORAL store to memory with release semantics.
• LDEOR has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STEOR and STEORL.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 1 0 0 0 Rn Rt
size opc
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, regsize);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
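As a sketch of the register-level behaviour, the following Python model applies the exclusive OR to a toy memory and writes the old value to the destination only when t is not 31, as in the pseudocode above. The helper names read_x and ldeor, the list used as a register file and the dictionary used as memory are assumptions made for this example, and atomicity is not modelled.

```python
def read_x(regs, n):
    """Read general-purpose register n; register number 31 reads as zero here."""
    return 0 if n == 31 else regs[n]


def ldeor(memory, regs, address, s, t, datasize=64):
    """Exclusive-OR memory with register s and return the old memory value."""
    mask = (1 << datasize) - 1
    value = read_x(regs, s) & mask
    old = memory[address] & mask
    memory[address] = old ^ value
    if t != 31:
        regs[t] = old   # zero-extended, because old is already masked to datasize
    return old
```

Applying the same register value twice restores the original memory contents, which is a convenient check on the model.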
LDEORB, LDEORAB, LDEORALB, LDEORLB
Atomic exclusive OR on byte in memory atomically loads an 8-bit byte from memory, performs an exclusive OR of the
loaded value with the value held in a register, and stores the result back to memory. The value initially loaded from
memory is returned in the destination register.
• If the destination register is not WZR, LDEORAB and LDEORALB load from memory with acquire semantics.
• LDEORLB and LDEORALB store to memory with release semantics.
• LDEORB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STEORB and STEORLB.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 1 0 0 0 Rn Rt
size opc
LDEORAB (A == 1 && R == 0)
LDEORALB (A == 1 && R == 1)
LDEORB (A == 0 && R == 0)
LDEORLB (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
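The encoding diagram for the byte variants can be turned into a 32-bit instruction word by placing each field at the bit position shown. The sketch below does this in Python; the function name encode_ldeorb is hypothetical, and the fixed bit patterns are read directly from the diagram above (size 00, bits 29 to 24 equal to 111000, bit 21 set, bits 15 to 10 equal to 001000).

```python
def encode_ldeorb(rs, rn, rt, a=0, r=0):
    """Assemble the instruction word for LDEORB and its A/R variants."""
    for field in (rs, rn, rt):
        if not 0 <= field <= 31:
            raise ValueError("register fields are 5 bits wide")
    word = 0b00 << 30        # size field, 00 for the byte variants
    word |= 0b111000 << 24   # fixed bits 29..24
    word |= (a & 1) << 23    # A: acquire
    word |= (r & 1) << 22    # R: release
    word |= 1 << 21          # fixed bit 21
    word |= rs << 16         # Rs: register holding the operand value
    word |= 0b001000 << 10   # fixed bit 15, opc, and bits 11..10
    word |= rn << 5          # Rn: base register or SP
    word |= rt               # Rt: destination register
    return word
```

With all register fields zero and A == R == 0 this yields 0x38202000; setting both A and R gives 0x38E02000, the LDEORALB form.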
LDEORH, LDEORAH, LDEORALH, LDEORLH
Atomic exclusive OR on halfword in memory atomically loads a 16-bit halfword from memory, performs an exclusive
OR of the loaded value with the value held in a register, and stores the result back to memory. The value initially
loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDEORAH and LDEORALH load from memory with acquire semantics.
• LDEORLH and LDEORALH store to memory with release semantics.
• LDEORH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STEORH and STEORLH.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 1 0 0 0 Rn Rt
size opc
LDEORAH (A == 1 && R == 0)
LDEORALH (A == 1 && R == 1)
LDEORH (A == 0 && R == 0)
LDEORLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDG
Load Allocation Tag loads an Allocation Tag from a memory address, generates a Logical Address Tag from the
Allocation Tag and merges it into the destination register. The address used for the load is calculated from the base
register and an immediate signed offset scaled by the Tag granule.
Integer
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 0 0 Xn Xt
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.
Operation
bits(64) address;
bits(4) tag;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
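The <simm> operand of LDG is a multiple of 16 that is stored in the 9-bit "imm9" field. The Python sketch below shows one way to convert between the two and to merge a 4-bit tag into a 64-bit register value. The function names are hypothetical. Placing the tag in bits 59:56 of the register is an assumption that is not stated in this section, and the Allocation Tag is used as the Logical Address Tag unchanged.

```python
TAG_GRANULE = 16  # bytes covered by one Allocation Tag


def encode_ldg_imm9(simm=0):
    """Convert a byte offset into the 9-bit two's complement imm9 field."""
    if simm % TAG_GRANULE != 0 or not -4096 <= simm <= 4080:
        raise ValueError("offset must be a multiple of 16 in -4096..4080")
    return (simm // TAG_GRANULE) & 0x1FF


def decode_ldg_imm9(imm9):
    """Convert the imm9 field back into a byte offset."""
    signed = imm9 - 0x200 if imm9 & 0x100 else imm9   # sign-extend 9 bits
    return signed * TAG_GRANULE


def merge_tag(xt, tag):
    """Replace bits 59:56 of xt with the tag (assumed tag position)."""
    return (xt & ~(0xF << 56)) | ((tag & 0xF) << 56)
```

The extreme offsets -4096 and 4080 map to imm9 values 0x100 and 0x0FF, which are the most negative and most positive 9-bit two's complement values.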
LDGM
Load Tag Multiple reads a naturally aligned block of N Allocation Tags, where the size of N is identified in
GMID_EL1.BS, and writes the Allocation Tag read from address A to bits <4*A<7:4>+3:4*A<7:4>> of the destination
register. Bits of the destination register not written with an Allocation Tag are set to 0.
This instruction is UNDEFINED at EL0.
This instruction generates an Unchecked access.
Integer
(FEAT_MTE2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 Xn Xt
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
Operation
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
for i = 0 to count-1
    bits(4) tag = AArch64.MemTag[address, AccType_NORMAL];
    data<(index*4)+3:index*4> = tag;
    address = address + TAG_GRANULE;
    index = index + 1;
X[t] = data;
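The placement rule in the description, where the tag for address A lands at bits 4*A<7:4>+3 down to 4*A<7:4>, can be checked with a short Python model of the loop above. The function name ldgm and the dictionary tag_memory are hypothetical. The parameter block_bytes stands in for the block size that the architecture derives from GMID_EL1.BS; that derivation is not shown in this section, so the sketch simply assumes a power of two between 16 and 256.

```python
TAG_GRANULE = 16  # bytes covered by one Allocation Tag


def ldgm(tag_memory, address, block_bytes=256):
    """Pack the Allocation Tags of one naturally aligned block into an integer.

    tag_memory  -- dict mapping a granule-aligned address to a 4-bit tag
    block_bytes -- assumed block size in bytes (power of two, 16..256)
    """
    if block_bytes not in (16, 32, 64, 128, 256):
        raise ValueError("unsupported block size for this sketch")
    base = address - (address % block_bytes)    # start of the aligned block
    data = 0                                    # bits never written stay 0
    for a in range(base, base + block_bytes, TAG_GRANULE):
        index = (a >> 4) & 0xF                  # A<7:4>
        tag = tag_memory.get(a, 0) & 0xF
        data |= tag << (4 * index)
    return data
```

With a 64-byte block starting at 0x140, only bits 31:16 of the result can be non-zero, because A<7:4> ranges over 4 to 7 within that block.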
LDLAR
Load LOAcquire Register loads a 32-bit word or 64-bit doubleword from memory, and writes it to a register. The
instruction also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information
about memory accesses, see Load/Store addressing modes.
Note
For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
No offset
(FEAT_LOR)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDLARB
Load LOAcquire Register Byte loads a byte from memory, zero-extends it and writes it to a register. The instruction
also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
accesses, see Load/Store addressing modes.
Note
For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
No offset
(FEAT_LOR)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDLARH
Load LOAcquire Register Halfword loads a halfword from memory, zero-extends it, and writes it to a register. The
instruction also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information
about memory accesses, see Load/Store addressing modes.
Note
For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
No offset
(FEAT_LOR)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDNP
Load Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate
offset, loads two 32-bit words or two 64-bit doublewords from memory, and writes them to two registers.
For information about memory accesses, see Load/Store addressing modes. For information about Non-temporal pair
instructions, see Load/Store Non-temporal pair.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 0 1 imm7 Rt2 Rn Rt
opc L
// Empty.
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDNP.
Assembler Symbols
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the optional signed immediate byte offset, a multiple of 4 in the range -256 to
252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit variant: is the optional signed immediate byte offset, a multiple of 8 in the range -512 to
504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
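The <imm> ranges above follow from a 7-bit two's complement field scaled by the access size. The Python sketch below encodes and decodes the field for both variants; the function names encode_imm7 and decode_imm7 and the is_64bit flag are hypothetical, and the scale values 2 and 3 correspond to the divisors 4 and 8 given for the 32-bit and 64-bit variants.

```python
def encode_imm7(imm, is_64bit):
    """Convert a byte offset into the imm7 field (<imm>/4 or <imm>/8)."""
    scale = 3 if is_64bit else 2
    step = 1 << scale
    if imm % step != 0 or not -64 * step <= imm <= 63 * step:
        raise ValueError("offset is not encodable in imm7")
    return (imm >> scale) & 0x7F


def decode_imm7(imm7, is_64bit):
    """Sign-extend imm7 and scale it back to a byte offset."""
    scale = 3 if is_64bit else 2
    signed = imm7 - 0x80 if imm7 & 0x40 else imm7
    return signed << scale
```

The limits -64 * step and 63 * step reproduce the documented ranges: -256 to 252 for the 32-bit variant and -512 to 504 for the 64-bit variant.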
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc<0> == '1' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = n != 31;
if t == t2 then
    Constraint c = ConstrainUnpredictable(Unpredictable_LDPOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // result is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if HaveLSE2Ext() then
    bits(2*datasize) full_data;
    full_data = Mem[address, 2*dbytes, AccType_NORMAL, TRUE];
    if BigEndian(AccType_STREAM) then
        data2 = full_data<(datasize-1):0>;
        data1 = full_data<(2*datasize-1):datasize>;
    else
        data1 = full_data<(datasize-1):0>;
        data2 = full_data<(2*datasize-1):datasize>;
else
    data1 = Mem[address, dbytes, AccType_STREAM];
    data2 = Mem[address+dbytes, dbytes, AccType_STREAM];
if rt_unknown then
    data1 = bits(datasize) UNKNOWN;
    data2 = bits(datasize) UNKNOWN;
X[t] = data1;
X[t2] = data2;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
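The HaveLSE2Ext() branch of the pseudocode reads both elements with one access of 2*dbytes and then splits the result according to endianness. The Python sketch below mirrors that split and can be compared against two separate element reads. The function name ldnp_split and the use of a bytes object for the memory contents are assumptions made for this illustration.

```python
def ldnp_split(raw, datasize, big_endian=False):
    """Split one double-width access into (data1, data2).

    raw      -- the 2 * (datasize // 8) bytes starting at the access address
    datasize -- 32 or 64
    """
    dbytes = datasize // 8
    if len(raw) != 2 * dbytes:
        raise ValueError("raw must hold exactly two elements")
    full = int.from_bytes(raw, "big" if big_endian else "little")
    mask = (1 << datasize) - 1
    low, high = full & mask, full >> datasize
    # Big-endian: the first element in memory is the high half of full_data.
    return (high, low) if big_endian else (low, high)
```

In both byte orders data1 is the element at the lower address, so the result matches what two consecutive single-element reads would return.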
LDP
Load Pair of Registers calculates an address from a base register value and an immediate offset, loads two 32-bit
words or two 64-bit doublewords from memory, and writes them to two registers. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 1 1 imm7 Rt2 Rn Rt
opc L
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 1 1 imm7 Rt2 Rn Rt
opc L
Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 0 1 imm7 Rt2 Rn Rt
opc L
Assembler Symbols
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
8 in the range -512 to 504, encoded in the "imm7" field as <imm>/8.
For the 64-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 8 in the
range -512 to 504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
boolean signed = (opc<0> != '0');
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;
if t == t2 then
    Constraint c = ConstrainUnpredictable(Unpredictable_LDPOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // result is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
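The three encoding classes differ only in when the offset is applied and whether the base register is updated. The Python sketch below follows the postindex and wback tests in the pseudocode above. The function name ldp_addresses and the mode strings are hypothetical, and the mapping of each class to its (wback, postindex) pair is an assumption inferred from the class names, because the per-class decode is not shown in this section. The wb_unknown case is not modelled.

```python
MASK64 = (1 << 64) - 1


def ldp_addresses(base, offset, mode):
    """Return (access_address, new_base) for one LDP addressing mode.

    mode -- "post", "pre" or "offset" (assumed names for the three classes)
    """
    if mode not in ("post", "pre", "offset"):
        raise ValueError("unknown addressing mode")
    postindex = mode == "post"
    wback = mode != "offset"
    address = base
    if not postindex:
        address = (address + offset) & MASK64   # offset applied before the access
    access = address
    if wback and postindex:
        address = (address + offset) & MASK64   # offset applied after the access
    new_base = address if wback else base
    return access, new_base
```

For a base of 0x1000 and an offset of 16, post-index accesses 0x1000 and leaves 0x1010 in the base, pre-index accesses 0x1010 and leaves 0x1010, and signed offset accesses 0x1010 and leaves the base unchanged.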
LDPSW
Load Pair of Registers Signed Word calculates an address from a base register value and an immediate offset, loads
two 32-bit words from memory, sign-extends them, and writes them to two registers. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 0 1 1 imm7 Rt2 Rn Rt
opc L
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 1 1 imm7 Rt2 Rn Rt
opc L
Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 0 1 imm7 Rt2 Rn Rt
opc L
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDPSW.
Assembler Symbols
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the post-index and pre-index variant: is the signed immediate byte offset, a multiple of 4 in the
range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the range
-256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
bits(64) offset = LSL(SignExtend(imm7, 64), 2);
boolean tag_checked = wback || n != 31;
if t == t2 then
    Constraint c = ConstrainUnpredictable(Unpredictable_LDPOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // result is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
bits(64) address;
bits(32) data1;
bits(32) data2;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
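The sign extension that distinguishes LDPSW from a 32-bit LDP can be sketched as follows. The function names sign_extend and ldpsw are hypothetical, the memory is a plain bytes object, and little-endian data is assumed for simplicity.

```python
def sign_extend(value, from_bits, to_bits=64):
    """Sign-extend a from_bits-wide value and return it as an unsigned integer."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # top bit set means a negative value
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def ldpsw(memory_bytes, address):
    """Load two adjacent 32-bit words and sign-extend each to 64 bits."""
    w1 = int.from_bytes(memory_bytes[address:address + 4], "little")
    w2 = int.from_bytes(memory_bytes[address + 4:address + 8], "little")
    return sign_extend(w1, 32), sign_extend(w2, 32)
```

A word holding 0xFFFFFFFE (the value -2) becomes 0xFFFFFFFFFFFFFFFE in the 64-bit register, while a word holding 1 is unchanged.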
LDR (immediate)
Load Register (immediate) loads a word or doubleword from memory and writes it to a register. The address that is
used for the load is calculated from a base register and an immediate offset. For information about memory accesses,
see Load/Store addressing modes. The Unsigned offset variant scales the immediate offset value by the size of the
value accessed before adding it to the base register value.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDR (immediate).
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
16380, defaulting to 0 and encoded in the "imm12" field as <pimm>/4.
For the 64-bit variant: is the optional positive immediate byte offset, a multiple of 8 in the range 0 to
32760, defaulting to 0 and encoded in the "imm12" field as <pimm>/8.
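As an illustration of how <pimm> is scaled into "imm12", the following Python sketch assembles the Unsigned offset encoding from the bit layout shown above. The function name and its parameters are hypothetical; this is a minimal sketch of the field packing, not an assembler.

```python
def encode_ldr_unsigned_offset(rt: int, rn: int, pimm: int = 0, is64: bool = True) -> int:
    """Pack LDR (immediate), Unsigned offset, into a 32-bit instruction word."""
    scale = 8 if is64 else 4                   # size of the value accessed
    if pimm % scale != 0 or not 0 <= pimm <= 4095 * scale:
        raise ValueError("pimm must be a multiple of the access size and fit in imm12")
    if not (0 <= rt <= 31 and 0 <= rn <= 31):
        raise ValueError("register numbers must be in the range 0 to 31")
    size = 0b11 if is64 else 0b10              # bits 31:30, '1x' in the diagram
    word = (size << 30) | (0b111001 << 24) | (0b01 << 22)
    word |= (pimm // scale) << 10              # imm12 holds <pimm>/4 or <pimm>/8
    word |= rn << 5                            # Rn (31 selects SP)
    word |= rt                                 # Rt
    return word
```

Under these assumptions, LDR X0, [X1, #8] packs to 0xF9400420 and LDR W0, [X1, #4] packs to 0xB9400420.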
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer regsize;
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
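The interaction of postindex and wback in the pseudocode above can be sketched in Python as follows. The function name is hypothetical, addresses are modelled as 64-bit unsigned integers, and the wb_unknown case is deliberately not modelled.

```python
MASK64 = (1 << 64) - 1


def access_and_writeback(base: int, offset: int, postindex: bool, wback: bool):
    """Return (address used for the access, base register value afterwards)."""
    address = base & MASK64
    if not postindex:
        # Pre-index and offset forms add the offset before the access.
        address = (address + offset) & MASK64
    access = address
    if wback and postindex:
        # Post-index adds the offset only after the access.
        address = (address + offset) & MASK64
    new_base = address if wback else base & MASK64
    return access, new_base
```

With a base of 0x1000 and an offset of 16, the post-index form accesses 0x1000 and leaves 0x1010 in the base register, whereas the pre-index form accesses 0x1010 and writes the same value back.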
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register (literal) calculates an address from the PC value and an immediate offset, loads a word from memory,
and writes it to a register. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 x 0 1 1 0 0 0 imm19 Rt
opc
integer t = UInt(Rt);
MemOp memop = MemOp_LOAD;
boolean signed = FALSE;
integer size;
bits(64) offset;
case opc of
    when '00'
        size = 4;
    when '01'
        size = 8;
    when '10'
        size = 4;
        signed = TRUE;
    when '11'
        memop = MemOp_PREFETCH;
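The opc decode above can be restated as a small Python lookup. The LiteralDecode type and the function name are hypothetical, and a size of None is used here only to mark the prefetch case, where the pseudocode leaves size unset.

```python
from typing import NamedTuple, Optional


class LiteralDecode(NamedTuple):
    memop: str             # "LOAD" or "PREFETCH"
    size: Optional[int]    # access size in bytes, None when not set by the decode
    signed: bool           # True when the loaded value is sign-extended


def decode_literal_opc(opc: int) -> LiteralDecode:
    """Mirror the 'case opc of' decode for the load-literal encodings."""
    if opc == 0b00:
        return LiteralDecode("LOAD", 4, False)
    if opc == 0b01:
        return LiteralDecode("LOAD", 8, False)
    if opc == 0b10:
        return LiteralDecode("LOAD", 4, True)
    if opc == 0b11:
        return LiteralDecode("PREFETCH", None, False)
    raise ValueError("opc is a 2-bit field")
```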
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.
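The relationship between <label> and "imm19" can be sketched in Python as below. The helper names are hypothetical; the sketch assumes only what is stated above, namely a word-aligned offset from the address of the instruction in the range +/-1MB.

```python
MASK64 = (1 << 64) - 1


def encode_literal_imm19(pc: int, label: int) -> int:
    """Return the imm19 field for a label, given the instruction address."""
    delta = label - pc
    if delta % 4 != 0 or not -(1 << 20) <= delta < (1 << 20):
        raise ValueError("label must be word-aligned and within +/-1MB")
    return (delta >> 2) & 0x7FFFF       # offset/4 as 19-bit two's complement


def literal_address(pc: int, imm19: int) -> int:
    """Recover the load address: sign-extend imm19, multiply by 4, add to PC."""
    signed = imm19 - (1 << 19) if imm19 & (1 << 18) else imm19
    return (pc + signed * 4) & MASK64
```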
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
case memop of
    when MemOp_LOAD
        data = Mem[address, size, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, 64);
        else
            X[t] = data;
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register (register) calculates an address from a base register value and an offset register value, loads a word
from memory, and writes it to a register. The offset register value can optionally be shifted and extended. For
information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #2
For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #3
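The effect of <extend> and <amount> on the address can be sketched in Python as follows. The function names are hypothetical, register values are modelled as unsigned integers, and the sketch covers only the four option values listed above.

```python
MASK64 = (1 << 64) - 1


def extend_index(rm_value: int, option: int, shift: int) -> int:
    """Extend the index register value per 'option', then shift it left."""
    if option == 0b010:                      # UXTW: zero-extend Wm
        value = rm_value & 0xFFFFFFFF
    elif option == 0b110:                    # SXTW: sign-extend Wm
        value = rm_value & 0xFFFFFFFF
        if value & 0x80000000:
            value -= 1 << 32
    elif option in (0b011, 0b111):           # LSL, SXTX: use all of Xm
        value = rm_value & MASK64
    else:
        raise ValueError("option value not listed in the table")
    return (value << shift) & MASK64


def effective_address(base: int, rm_value: int, option: int, s_bit: int, is64: bool = True) -> int:
    """Base plus extended index; S selects #0 or the access-size shift."""
    shift = (3 if is64 else 2) if s_bit else 0
    return (base + extend_index(rm_value, option, shift)) & MASK64
```

For the 64-bit variant with LSL #3, an index of 2 addresses the doubleword 16 bytes past the base.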
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
integer regsize;
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register, with pointer authentication. This instruction authenticates an address from a base register using a
modifier of zero and the specified key, adds an immediate offset to the authenticated address, and loads a 64-bit
doubleword from memory at this resulting address into a register.
Key A is used for LDRAA, and key B is used for LDRAB.
If the authentication passes, the PE behaves the same as for an LDR instruction. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to the base register, unless the pre-indexed variant of the instruction is
used. In this case, the address that is written back to the base register does not include the pointer authentication
code.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 M S 1 imm9 W 1 Rn Rt
size
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, a multiple of 8 in the range -4096 to 4088, defaulting to 0
and encoded in the "S:imm9" field as <simm>/8.
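The split of <simm> across the "S" and "imm9" fields can be illustrated with a Python sketch. The helper names are hypothetical; the sketch assumes the scale of 8 and the range stated above, with S holding the sign bit of the 10-bit value S:imm9.

```python
def encode_s_imm9(simm: int = 0) -> tuple:
    """Return (S, imm9) for a byte offset that is a multiple of 8."""
    if simm % 8 != 0 or not -4096 <= simm <= 4088:
        raise ValueError("offset must be a multiple of 8 in the range -4096 to 4088")
    field = (simm // 8) & 0x3FF          # <simm>/8 as 10-bit two's complement
    return field >> 9, field & 0x1FF


def decode_s_imm9(s: int, imm9: int) -> int:
    """Recover the byte offset from the S and imm9 fields."""
    field = (s << 9) | imm9
    signed = field - 0x400 if s else field
    return signed * 8
```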
bits(64) address;
bits(64) data;
boolean wb_unknown = FALSE;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    address = SP[];
else
    address = X[n];
if use_key_a then
    address = AuthDA(address, X[31], TRUE);
else
    address = AuthDB(address, X[31], TRUE);
if n == 31 then
    CheckSPAlignment();
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Byte (immediate) loads a byte from memory, zero-extends it, and writes the result to a register. The
address that is used for the load is calculated from a base register and an immediate offset. For information about
memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRB (immediate).
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0 and encoded in the
"imm12" field.
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Byte (register) calculates an address from a base register value and an offset register value, loads a
byte from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend specifier, encoded in “option”:
option <extend>
010 UXTW
110 SXTW
111 SXTX
<amount> Is the index shift amount; it must be #0, encoded in "S" as 0 if omitted, or as 1 if present.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Halfword (immediate) loads a halfword from memory, zero-extends it, and writes the result to a register.
The address that is used for the load is calculated from a base register and an immediate offset. For information about
memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRH (immediate).
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0 and
encoded in the "imm12" field as <pimm>/2.
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Halfword (register) calculates an address from a base register value and an offset register value, loads a
halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/
Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #1
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Signed Byte (immediate) loads a byte from memory, sign-extends it to either 32 bits or 64 bits, and
writes the result to a register. The address that is used for the load is calculated from a base register and an
immediate offset. For information about memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 1 1 x imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRSB (immediate).
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0 and encoded in the
"imm12" field.
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
case memop of
    when MemOp_STORE
        if rt_unknown then
            data = bits(8) UNKNOWN;
        else
            data = X[t];
        Mem[address, 1, AccType_NORMAL] = data;
    when MemOp_LOAD
        data = Mem[address, 1, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
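The MemOp_LOAD path above, which chooses between SignExtend and ZeroExtend for the loaded byte, can be sketched in Python as follows. The function names are hypothetical and values are modelled as unsigned integers of the stated width.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the top bit of a from_bits-wide value up to to_bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Set every bit between from_bits and to_bits.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def zero_extend(value: int, from_bits: int) -> int:
    """Keep the low from_bits bits; all higher bits are zero."""
    return value & ((1 << from_bits) - 1)


def load_byte_result(byte: int, signed: bool, regsize: int) -> int:
    """Value written to the destination register for a one-byte load."""
    if signed:
        return sign_extend(byte, 8, regsize)
    return zero_extend(byte, 8)
```

A loaded byte of 0x80 therefore becomes 0xFFFFFF80 when sign-extended to 32 bits, and stays 0x80 when zero-extended.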
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Signed Byte (register) calculates an address from a base register value and an offset register value,
loads a byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend specifier, encoded in “option”:
option <extend>
010 UXTW
110 SXTW
111 SXTX
<amount> Is the index shift amount; it must be #0, encoded in "S" as 0 if omitted, or as 1 if present.
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
MemOp memop;
boolean signed;
integer regsize;
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 1, AccType_NORMAL] = data;
    when MemOp_LOAD
        data = Mem[address, 1, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Signed Halfword (immediate) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and
writes the result to a register. The address that is used for the load is calculated from a base register and an
immediate offset. For information about memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 1 1 x imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRSH (immediate).
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0 and
encoded in the "imm12" field as <pimm>/2.
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
case memop of
    when MemOp_STORE
        if rt_unknown then
            data = bits(16) UNKNOWN;
        else
            data = X[t];
        Mem[address, 2, AccType_NORMAL] = data;
    when MemOp_LOAD
        data = Mem[address, 2, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Signed Halfword (register) calculates an address from a base register value and an offset register value,
loads a halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #1
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
MemOp memop;
boolean signed;
integer regsize;
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 2, AccType_NORMAL] = data;
    when MemOp_LOAD
        data = Mem[address, 2, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Signed Word (immediate) loads a word from memory, sign-extends it to 64 bits, and writes the result to
a register. The address that is used for the load is calculated from a base register and an immediate offset. For
information about memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 1 1 0 imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRSW (immediate).
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0
and encoded in the "imm12" field as <pimm>/4.
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(32) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Signed Word (literal) calculates an address from the PC value and an immediate offset, loads a word
from memory, and writes it to a register. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 0 0 imm19 Rt
opc
integer t = UInt(Rt);
bits(64) offset;
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Register Signed Word (register) calculates an address from a base register value and an offset register value,
loads a word from memory, sign-extends it to form a 64-bit value, and writes it to a register. The offset register value
can be shifted left by 0 or 2 bits. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #2
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic bit set on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory,
performs a bitwise OR with the value held in a register on it, and stores the result back to memory. The value initially
loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSETA and LDSETAL load from memory with acquire
semantics.
• LDSETL and LDSETAL store to memory with release semantics.
• LDSET has neither acquire nor release semantics.
For more information about memory ordering semantics, see Load-Acquire, Store-Release.
For information about memory accesses, see Load/Store addressing modes.
This instruction is used by the aliases STSET and STSETL.
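The read-modify-write behavior described above (load the old value, OR in the register value, store the result, return the old value) can be modelled in Python. The Memory class and its ldset method are hypothetical; a lock stands in for the atomicity of the access, little-endian byte order is assumed, and acquire and release ordering are not modelled.

```python
import threading


class Memory:
    """Toy byte-addressed memory with an atomic bit-set operation."""

    def __init__(self, size: int) -> None:
        self._bytes = bytearray(size)
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def ldset(self, address: int, value: int, datasize: int = 64) -> int:
        """Atomically OR 'value' into memory and return the previous contents."""
        nbytes = datasize // 8
        mask = (1 << datasize) - 1
        with self._lock:
            old = int.from_bytes(self._bytes[address:address + nbytes], "little")
            new = (old | value) & mask
            self._bytes[address:address + nbytes] = new.to_bytes(nbytes, "little")
        return old
```

Calling ldset twice on the same location shows the returned value lagging one step behind memory: the first call returns the initial contents, and the second returns the result of the first OR.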
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 1 1 0 0 Rn Rt
size opc
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, regsize);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDSETB, LDSETAB, LDSETALB, LDSETLB
Atomic bit set on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise OR of that value with
the value held in a register, and stores the result back to memory. The value initially loaded from memory is returned
in the destination register.
• If the destination register is not WZR, LDSETAB and LDSETALB load from memory with acquire semantics.
• LDSETLB and LDSETALB store to memory with release semantics.
• LDSETB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSETB, STSETLB.
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  00     111    0   00     A   R   1   Rs     0   011    00     Rn   Rt
Field  size                                            opc
LDSETAB (A == 1 && R == 0)
LDSETALB (A == 1 && R == 1)
LDSETB (A == 0 && R == 0)
LDSETLB (A == 0 && R == 1)
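The four variants and their ordering behavior can be summarized in a small Python sketch. The function name `ldsetb_variant` is hypothetical; the sketch assumes that register number 31 in the Rt field names WZR for this instruction, which is the case the acquire rule above excludes.

```python
MNEMONIC_BY_AR = {
    (0, 0): "LDSETB",
    (1, 0): "LDSETAB",
    (0, 1): "LDSETLB",
    (1, 1): "LDSETALB",
}


def ldsetb_variant(a, r, rt):
    """Return (mnemonic, acquire, release) for the given A, R, and Rt fields."""
    mnemonic = MNEMONIC_BY_AR[(a, r)]
    acquire = a == 1 and rt != 31  # no acquire semantics when the destination is WZR
    release = r == 1
    return mnemonic, acquire, release
```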
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDSETH, LDSETAH, LDSETALH, LDSETLH
Atomic bit set on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise OR of that
value with the value held in a register, and stores the result back to memory. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not WZR, LDSETAH and LDSETALH load from memory with acquire semantics.
• LDSETLH and LDSETALH store to memory with release semantics.
• LDSETH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSETH, STSETLH.
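The alias condition table for this instruction did not survive in this extract. The Python sketch below assumes the usual rule for the store-form aliases of these atomic instructions, namely that the alias is preferred when A is 0 and Rt is 11111. Treat that rule, and the hypothetical function name `preferred_mnemonic`, as assumptions to check against the full manual.

```python
def preferred_mnemonic(a, r, rt):
    """Pick the mnemonic a disassembler might show for a halfword LDSET encoding."""
    if a == 0 and rt == 0b11111:
        # Assumed alias condition: result discarded, no acquire semantics requested.
        return "STSETLH" if r == 1 else "STSETH"
    names = {(0, 0): "LDSETH", (1, 0): "LDSETAH", (0, 1): "LDSETLH", (1, 1): "LDSETALH"}
    return names[(a, r)]
```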
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  01     111    0   00     A   R   1   Rs     0   011    00     Rn   Rt
Field  size                                            opc
LDSETAH (A == 1 && R == 0)
LDSETALH (A == 1 && R == 1)
LDSETH (A == 0 && R == 0)
LDSETLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL
Atomic signed maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, compares it against the value held in a register, and stores the larger value back to memory, treating the
values as signed numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSMAXA and LDSMAXAL load from memory with acquire
semantics.
• LDSMAXL and LDSMAXAL store to memory with release semantics.
• LDSMAX has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMAX, STSMAXL.
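The signed comparison described above can be sketched in Python. This is a simplified model: `memory` is a hypothetical dictionary standing in for memory, the helper `to_signed` is introduced only for illustration, and atomicity and ordering are not modeled.

```python
def to_signed(value, bits):
    """Interpret a bits-wide pattern as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def ldsmax(memory, address, rs_value, datasize):
    """Store the signed maximum of memory and rs_value, return the old memory value."""
    mask = (1 << datasize) - 1
    old = memory[address] & mask
    operand = rs_value & mask
    if to_signed(operand, datasize) > to_signed(old, datasize):
        memory[address] = operand
    return old


mem = {0x2000: 0xFFFFFFFF}        # -1 when read as a signed 32-bit word
old = ldsmax(mem, 0x2000, 3, 32)  # 3 is larger than -1, so memory becomes 3
```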
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  1x     111    0   00     A   R   1   Rs     0   100    00     Rn   Rt
Field  size                                            opc
32-bit LDSMAX (size == 10 && A == 0 && R == 0)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, regsize);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB
Atomic signed maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value
held in a register, and stores the larger value back to memory, treating the values as signed numbers. The value
initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMAXAB and LDSMAXALB load from memory with acquire semantics.
• LDSMAXLB and LDSMAXALB store to memory with release semantics.
• LDSMAXB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMAXB, STSMAXLB.
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  00     111    0   00     A   R   1   Rs     0   100    00     Rn   Rt
Field  size                                            opc
LDSMAXAB (A == 1 && R == 0)
LDSMAXALB (A == 1 && R == 1)
LDSMAXB (A == 0 && R == 0)
LDSMAXLB (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
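One detail of the byte form that is easy to miss: the comparison is signed, but the Operation pseudocode returns the loaded byte with ZeroExtend(data, 32). The Python sketch below illustrates this under simplifying assumptions: `memory` is a hypothetical `bytearray`, the function name `ldsmaxb` is only for illustration, and atomicity and ordering are not modeled.

```python
def ldsmaxb(memory, address, ws):
    """Byte-sized signed maximum; returns the old byte zero-extended."""
    def signed(byte):
        return byte - 256 if byte & 0x80 else byte

    old = memory[address]      # 8-bit value in the range 0..255
    operand = ws & 0xFF        # only the low byte of the source register is used
    if signed(operand) > signed(old):
        memory[address] = operand
    return old                 # upper 24 bits of the destination are zero


mem = bytearray([0x80])        # -128 as a signed byte
wt = ldsmaxb(mem, 0, 0x7F)     # +127 wins the signed comparison
```

Here `wt` is 0x80 (128), not a sign-extended -128, and the byte in memory becomes 0x7F.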
LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH
Atomic signed maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against
the value held in a register, and stores the larger value back to memory, treating the values as signed numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMAXAH and LDSMAXALH load from memory with acquire semantics.
• LDSMAXLH and LDSMAXALH store to memory with release semantics.
• LDSMAXH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMAXH, STSMAXLH.
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  01     111    0   00     A   R   1   Rs     0   100    00     Rn   Rt
Field  size                                            opc
LDSMAXAH (A == 1 && R == 0)
LDSMAXALH (A == 1 && R == 1)
LDSMAXH (A == 0 && R == 0)
LDSMAXLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDSMIN, LDSMINA, LDSMINAL, LDSMINL
Atomic signed minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the
values as signed numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSMINA and LDSMINAL load from memory with acquire
semantics.
• LDSMINL and LDSMINAL store to memory with release semantics.
• LDSMIN has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMIN, STSMINL.
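As a usage-style illustration, the Python sketch below applies the signed-minimum operation repeatedly to keep a low-water mark. It is a simplified model: `cell` is a hypothetical one-element list standing in for a 64-bit memory location, the function name `ldsmin64` is only for illustration, and atomicity and ordering are not modeled.

```python
MASK64 = (1 << 64) - 1


def ldsmin64(cell, xs):
    """Store the signed minimum of cell[0] and xs, return the old contents."""
    def signed(value):
        value &= MASK64
        return value - (1 << 64) if value >> 63 else value

    old = cell[0]
    if signed(xs) < signed(old):
        cell[0] = xs & MASK64
    return old


low_water = [10]
# Each call returns the value that was in memory before the update.
history = [ldsmin64(low_water, sample & MASK64) for sample in (7, 12, -3, 0)]
```

After the loop, `low_water[0]` holds the 64-bit pattern for -3, and `history` records the earlier values 10, 7, 7, and that same pattern for -3.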
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  1x     111    0   00     A   R   1   Rs     0   101    00     Rn   Rt
Field  size                                            opc
32-bit LDSMIN (size == 10 && A == 0 && R == 0)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, regsize);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB
Atomic signed minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value
held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The value
initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMINAB and LDSMINALB load from memory with acquire semantics.
• LDSMINLB and LDSMINALB store to memory with release semantics.
• LDSMINB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMINB, STSMINLB.
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  00     111    0   00     A   R   1   Rs     0   101    00     Rn   Rt
Field  size                                            opc
LDSMINAB (A == 1 && R == 0)
LDSMINALB (A == 1 && R == 1)
LDSMINB (A == 0 && R == 0)
LDSMINLB (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH
Atomic signed minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against
the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMINAH and LDSMINALH load from memory with acquire semantics.
• LDSMINLH and LDSMINALH store to memory with release semantics.
• LDSMINH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMINH, STSMINLH.
Integer
(FEAT_LSE)
Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  01     111    0   00     A   R   1   Rs     0   101    00     Rn   Rt
Field  size                                            opc
LDSMINAH (A == 1 && R == 0)
LDSMINALH (A == 1 && R == 1)
LDSMINH (A == 0 && R == 0)
LDSMINLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if t != 31 then
    X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDTR
Load Register (unprivileged) loads a word or doubleword from memory, and writes it to a register. The address that is
used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
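The rule above can be written as a small predicate. The Python sketch below is illustrative only: the function and parameter names are hypothetical, `e2h_tge_11` stands for "the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}", and `nv_nv1_11` collapses the nested-virtualization exclusion that appears in the decode pseudocode for this instruction (EL2 enabled, the feature implemented, and HCR_EL2.<NV,NV1> == '11') into a single flag.

```python
def access_is_unprivileged(el, uao, e2h_tge_11, nv_nv1_11=False):
    """Return True when the access behaves as if executed at EL0."""
    unpriv_at_el1 = el == 1 and not nv_nv1_11
    unpriv_at_el2 = el == 2 and e2h_tge_11
    return uao == 0 and (unpriv_at_el1 or unpriv_at_el2)
```

For instance, `access_is_unprivileged(1, 0, False)` is true, while setting `uao` to 1 makes the access use the restrictions of the current Exception level.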
Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  1x     111    0   00     01     0   imm9   10     Rn   Rt
Field  size                     opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
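The relationship between <simm> and the 9-bit "imm9" field is a two's-complement encoding. The following Python sketch shows the round trip and the resulting address calculation; the function names are hypothetical and the 64-bit wrap-around of the address is the only architectural detail modeled.

```python
def encode_imm9(simm):
    """Encode a byte offset in the range -256..255 as a 9-bit field."""
    if not -256 <= simm <= 255:
        raise ValueError("offset must be in the range -256 to 255")
    return simm & 0x1FF


def decode_imm9(imm9):
    """Sign-extend the 9-bit field back to a Python integer."""
    return imm9 - 0x200 if imm9 & 0x100 else imm9


def ldtr_address(base, imm9):
    """Base register value plus the sign-extended offset, modulo 2**64."""
    return (base + decode_imm9(imm9)) & ((1 << 64) - 1)
```

For example, an offset of -8 is encoded as 0x1F8, and with a base of 0x1000 the access address is 0xFF8.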
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
integer regsize;
Operation
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDTRB
Load Register Byte (unprivileged) loads a byte from memory, zero-extends it, and writes the result to a register. The
address that is used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  00     111    0   00     01     0   imm9   10     Rn   Rt
Field  size                     opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDTRH
Load Register Halfword (unprivileged) loads a halfword from memory, zero-extends it, and writes the result to a
register. The address that is used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  01     111    0   00     01     0   imm9   10     Rn   Rt
Field  size                     opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDTRSB
Load Register Signed Byte (unprivileged) loads a byte from memory, sign-extends it to 32 bits or 64 bits, and writes
the result to a register. The address that is used for the load is calculated from a base register and an immediate
offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
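The difference between sign extension to 32 bits and to 64 bits is easiest to see on a concrete byte. The Python sketch below uses a hypothetical helper `sign_extend` and models only the register result, not the memory access or its permission checks.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide pattern to a to_bits-wide pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # top bit set: fill the upper bits with ones
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


byte = 0x80                           # -128 as a signed byte
as_w = sign_extend(byte, 8, 32)       # result for a 32-bit destination
as_x = sign_extend(byte, 8, 64)       # result for a 64-bit destination
```

Here `as_w` is 0xFFFFFF80 and `as_x` is 0xFFFFFFFFFFFFFF80, whereas a zero-extending byte load would leave 0x80 in either case.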
Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  00     111    0   00     1x     0   imm9   10     Rn   Rt
Field  size                     opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
MemOp memop;
boolean signed;
integer regsize;
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 1, acctype] = data;
    when MemOp_LOAD
        data = Mem[address, 1, acctype];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDTRSH
Load Register Signed Halfword (unprivileged) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and
writes the result to a register. The address that is used for the load is calculated from a base register and an
immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  01     111    0   00     1x     0   imm9   10     Rn   Rt
Field  size                     opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
MemOp memop;
boolean signed;
integer regsize;
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 2, acctype] = data;
    when MemOp_LOAD
        data = Mem[address, 2, acctype];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);
    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDTRSW
Load Register Signed Word (unprivileged) loads a word from memory, sign-extends it to 64 bits, and writes the result
to a register. The address that is used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
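Putting the address calculation and the sign extension together, the load can be modeled as below. This Python sketch rests on several assumptions that are not part of the description: `memory` is a hypothetical `bytes` buffer indexed from 0, data accesses are little-endian, and the unprivileged permission check is not modeled at all.

```python
def ldtrsw(memory, base, simm=0):
    """Load a 32-bit word at base + simm and sign-extend it to 64 bits."""
    address = base + simm
    chunk = memory[address:address + 4]
    if len(chunk) != 4:
        raise IndexError("access falls outside the modeled memory")
    value = int.from_bytes(chunk, "little", signed=True)
    return value & ((1 << 64) - 1)  # 64-bit register image


mem = bytes([0x00, 0x00, 0x00, 0x00, 0xFE, 0xFF, 0xFF, 0xFF])
xt = ldtrsw(mem, 0, 4)              # the word at offset 4 is -2
```

In this example `xt` is 0xFFFFFFFFFFFFFFFE.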
Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  10     111    0   00     10     0   imm9   10     Rn   Rt
Field  size                     opc
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
Operation
bits(64) address;
bits(32) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL
Atomic unsigned maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword
from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the
values as unsigned numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDUMAXA and LDUMAXAL load from memory with acquire
semantics.
• LDUMAXL and LDUMAXAL store to memory with release semantics.
• LDUMAX has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMAX, STUMAXL.
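The data operation described above can be modelled in a few lines of Python. This is a single-threaded sketch only: it does not model atomicity or the acquire and release ordering, it assumes little-endian memory held in a `bytearray`, and the function name and parameters are hypothetical.

```python
def ldumax(memory: bytearray, address: int, rs_value: int, datasize: int = 32) -> int:
    """Store the unsigned maximum of memory and rs_value; return the old memory value."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    nbytes = datasize // 8
    mask = (1 << datasize) - 1
    old = int.from_bytes(memory[address:address + nbytes], "little")
    # Both operands are compared as unsigned numbers.
    new = max(old, rs_value & mask)
    memory[address:address + nbytes] = new.to_bytes(nbytes, "little")
    return old
```

Because the comparison is unsigned, a register value of 0xFFFFFFFF is the largest 32-bit operand rather than -1.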
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 1 1 0 0 0 Rn Rt
size opc
32-bit LDUMAX (size == 10 && A == 0 && R == 0)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, regsize);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB
Atomic unsigned maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the
value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMAXAB and LDUMAXALB load from memory with acquire semantics.
• LDUMAXLB and LDUMAXALB store to memory with release semantics.
• LDUMAXB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMAXB, STUMAXLB.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 1 1 0 0 0 Rn Rt
size opc
LDUMAXAB (A == 1 && R == 0)
LDUMAXALB (A == 1 && R == 1)
LDUMAXB (A == 0 && R == 0)
LDUMAXLB (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
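The encoding diagram and the A/R variant list can be read together as a small decoder. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the fixed-bit mask is derived from the diagram above, and the STUMAXB and STUMAXLB aliases are not distinguished.

```python
# Fixed bits from the diagram: [31:24] = 00111000, [21] = 1, [15:10] = 011000.
FIXED_MASK = 0xFF20FC00
FIXED_VALUE = 0x38206000

MNEMONICS = {
    (0, 0): "LDUMAXB",
    (1, 0): "LDUMAXAB",
    (0, 1): "LDUMAXLB",
    (1, 1): "LDUMAXALB",
}


def decode_ldumaxb(word: int):
    """Return (mnemonic, s, n, t) for a 32-bit instruction word, or None if it does not match."""
    if word & FIXED_MASK != FIXED_VALUE:
        return None
    a = (word >> 23) & 1          # acquire bit
    r = (word >> 22) & 1          # release bit
    s = (word >> 16) & 0x1F       # Rs
    n = (word >> 5) & 0x1F        # Rn
    t = word & 0x1F               # Rt
    return MNEMONICS[(a, r)], s, n, t
```

For example, the word 0x38216062 has A == 0, R == 0, Rs == 1, Rn == 3 and Rt == 2.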
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH
Atomic unsigned maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it
against the value held in a register, and stores the larger value back to memory, treating the values as unsigned
numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMAXAH and LDUMAXALH load from memory with acquire semantics.
• LDUMAXLH and LDUMAXALH store to memory with release semantics.
• LDUMAXH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMAXH, STUMAXLH.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 1 1 0 0 0 Rn Rt
size opc
LDUMAXAH (A == 1 && R == 0)
LDUMAXALH (A == 1 && R == 1)
LDUMAXH (A == 0 && R == 0)
LDUMAXLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDUMIN, LDUMINA, LDUMINAL, LDUMINL
Atomic unsigned minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword
from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating
the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDUMINA and LDUMINAL load from memory with acquire
semantics.
• LDUMINL and LDUMINAL store to memory with release semantics.
• LDUMIN has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMIN, STUMINL.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 1 1 1 0 0 Rn Rt
size opc
32-bit LDUMIN (size == 10 && A == 0 && R == 0)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(datasize) value;
bits(datasize) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, regsize);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB
Atomic unsigned minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the
value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMINAB and LDUMINALB load from memory with acquire semantics.
• LDUMINLB and LDUMINALB store to memory with release semantics.
• LDUMINB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMINB, STUMINLB.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 1 1 1 0 0 Rn Rt
size opc
LDUMINAB (A == 1 && R == 0)
LDUMINALB (A == 1 && R == 1)
LDUMINB (A == 0 && R == 0)
LDUMINLB (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(8) value;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
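The byte form of the operation, including the `t != 31` check in the pseudocode, can be sketched in Python as follows. This is a single-threaded model that ignores atomicity and memory ordering; the register file is a hypothetical list of 31 integers (X0 to X30), and the address is passed in directly rather than read from Xn or SP.

```python
def lduminb(regs: list, memory: bytearray, s: int, t: int, address: int) -> None:
    """Model LDUMINB: store the unsigned minimum byte, return the old byte in register t."""
    # Register number 31 reads as zero (WZR); only the low byte takes part.
    value = 0 if s == 31 else regs[s] & 0xFF
    old = memory[address]
    memory[address] = min(old, value)
    if t != 31:
        # The old byte is zero-extended to 32 bits, so it is stored unchanged.
        regs[t] = old
```

When `t` is 31 the loaded value is discarded, which is the form used by the STUMINB and STUMINLB aliases.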
LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH
Atomic unsigned minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned
numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMINAH and LDUMINALH load from memory with acquire semantics.
• LDUMINLH and LDUMINALH store to memory with release semantics.
• LDUMINH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMINH, STUMINLH.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 1 1 1 0 0 Rn Rt
size opc
LDUMINAH (A == 1 && R == 0)
LDUMINALH (A == 1 && R == 1)
LDUMINH (A == 0 && R == 0)
LDUMINLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Alias Conditions
Operation
bits(64) address;
bits(16) value;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
value = X[s];
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if t != 31 then
X[t] = ZeroExtend(data, 32);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDUR
Load Register (unscaled) calculates an address from a base register and an immediate offset, loads a 32-bit word or
64-bit doubleword from memory, zero-extends it, and writes it to a register. For information about memory accesses,
see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
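The unscaled addressing form can be sketched as below. This Python model is a simplification: memory is a little-endian `bytes` object indexed from address 0, alignment, tag checking and faults are ignored, and the function name and parameters are hypothetical.

```python
def ldur(memory: bytes, base: int, simm: int = 0, datasize: int = 64) -> int:
    """Load a 32-bit or 64-bit value from base + simm; the result is zero-extended."""
    if not -256 <= simm <= 255:
        raise ValueError("simm must be in the range -256 to 255")
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    # The byte offset is added as-is; it is not scaled by the access size.
    address = (base + simm) & 0xFFFFFFFFFFFFFFFF
    nbytes = datasize // 8
    if address + nbytes > len(memory):
        raise IndexError("access outside the modelled memory")
    return int.from_bytes(memory[address:address + nbytes], "little")
```

Because the offset is a plain byte count, a 32-bit load at base 8 with offset -3 reads the four bytes starting at address 5.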
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer regsize;
Operation
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDURB
Load Register Byte (unscaled) calculates an address from a base register and an immediate offset, loads a byte from
memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDURH
Load Register Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a
halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/
Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDURSB
Load Register Signed Byte (unscaled) calculates an address from a base register and an immediate offset, loads a
signed byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
if memop != MemOp_PREFETCH then CheckSPAlignment();
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
Mem[address, 1, AccType_NORMAL] = data;
when MemOp_LOAD
data = Mem[address, 1, AccType_NORMAL];
if signed then
X[t] = SignExtend(data, regsize);
else
X[t] = ZeroExtend(data, regsize);
when MemOp_PREFETCH
Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
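The SignExtend and ZeroExtend steps in the load branch of the pseudocode can be illustrated with plain integer arithmetic. The Python helpers below are a sketch with hypothetical names; register values are modelled as non-negative integers of `to_bits` width.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the top bit of a from_bits-wide value across a to_bits-wide result."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits          # interpret as a negative number
    return value & ((1 << to_bits) - 1)  # wrap back into to_bits


def zero_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Pad a from_bits-wide value with zeros; to_bits only documents the target width."""
    return value & ((1 << from_bits) - 1)
```

A loaded byte of 0x80 becomes 0xFFFFFF80 in a 32-bit register and 0xFFFFFFFFFFFFFF80 in a 64-bit register when sign-extended, and stays 0x80 when zero-extended.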
LDURSH
Load Register Signed Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a
signed halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
if memop != MemOp_PREFETCH then CheckSPAlignment();
address = SP[];
else
address = X[n];
case memop of
when MemOp_STORE
data = X[t];
Mem[address, 2, AccType_NORMAL] = data;
when MemOp_LOAD
data = Mem[address, 2, AccType_NORMAL];
if signed then
X[t] = SignExtend(data, regsize);
else
X[t] = ZeroExtend(data, regsize);
when MemOp_PREFETCH
Prefetch(address, t<4:0>);
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDURSW
Load Register Signed Word (unscaled) calculates an address from a base register and an immediate offset, loads a
signed word from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(32) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDXP
Load Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two 64-bit
doublewords from memory, and writes them to two registers. For information on single-copy atomicity and alignment
requirements, see Requirements for single-copy atomicity and Alignment of data accesses. The PE marks the physical
address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions.
See Synchronization and semaphores. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 1 1 (1) (1) (1) (1) (1) 0 Rt2 Rn Rt
L Rs o0
32-bit (sz == 0)
64-bit (sz == 1)
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDXP.
Assembler Symbols
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if rt_unknown then
// ConstrainedUNPREDICTABLE case
X[t] = bits(datasize) UNKNOWN; // In this case t = t2
elsif elsize == 32 then
// 32-bit load exclusive pair (atomic)
data = Mem[address, dbytes, AccType_ATOMIC];
if BigEndian(AccType_ATOMIC) then
X[t] = data<datasize-1:elsize>;
X[t2] = data<elsize-1:0>;
else
X[t] = data<elsize-1:0>;
X[t2] = data<datasize-1:elsize>;
else // elsize == 64
// 64-bit load exclusive pair (not atomic),
// but must be 128-bit aligned
if address != Align(address, dbytes) then
AArch64.Abort(address, AlignmentFault(AccType_ATOMIC, FALSE, FALSE));
X[t] = Mem[address, 8, AccType_ATOMIC];
X[t2] = Mem[address+8, 8, AccType_ATOMIC];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
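Two steps of the pseudocode above lend themselves to a short illustration: the endianness-dependent split of the 64-bit data in the 32-bit case, and the 128-bit alignment check in the 64-bit case. The Python sketch below uses hypothetical helper names and models values as plain integers.

```python
def split_pair32(data: int, big_endian: bool) -> tuple:
    """Split 64 bits of loaded data into the values written to (X[t], X[t2])."""
    low = data & 0xFFFFFFFF           # data<31:0>
    high = (data >> 32) & 0xFFFFFFFF  # data<63:32>
    return (high, low) if big_endian else (low, high)


def is_aligned(address: int, nbytes: int) -> bool:
    """True if address is a multiple of nbytes, as in address == Align(address, nbytes)."""
    return address % nbytes == 0
```

For the 64-bit pair, `nbytes` is 16, so an address such as 0x1008 would fail the check and take the alignment fault path.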
LDXR
Load Exclusive Register derives an address from a base register value, loads a 32-bit word or a 64-bit doubleword
from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address being
accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
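The interaction between the exclusive access mark and a later Store Exclusive can be pictured with a toy model. The Python class below is a heavily simplified sketch, not the architected monitor: it tracks a single exact address for one PE, ignores marking granules, other observers and every event that can clear the mark, and all names are hypothetical. It follows the Store Exclusive convention of returning 0 for success and 1 for failure.

```python
class ToyExclusiveMonitor:
    """Single-PE toy: one marked address, memory modelled as a dict of address -> value."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.marked = None

    def load_exclusive(self, address: int) -> int:
        # The load marks the address as an exclusive access.
        self.marked = address
        return self.memory[address]

    def store_exclusive(self, address: int, value: int) -> int:
        # The store checks the mark; the mark is cleared either way.
        matched = self.marked == address
        self.marked = None
        if not matched:
            return 1      # failure: memory is not updated
        self.memory[address] = value
        return 0          # success
```

A typical read-modify-write sequence loads exclusively, computes a new value, and retries the whole sequence if the store reports 1.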
LDXRB
Load Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it
and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an
exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LDXRH
Load Exclusive Register Halfword derives an address from a base register value, loads a halfword from memory,
zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed
as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
LSL (immediate)
Logical Shift Left (immediate) shifts a register value left by an immediate number of bits, shifting in zeros, and writes
the result to the destination register.
• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr != x11111 Rn Rd
opc imms
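32-bit (sf == 0 && N == 0 && imms != 011111)
LSL <Wd>, <Wn>, #<shift>
is equivalent to
UBFM <Wd>, <Wn>, #(-<shift> MOD 32), #(31-<shift>)
64-bit (sf == 1 && N == 1 && imms != 111111)
LSL <Xd>, <Xn>, #<shift>
is equivalent to
UBFM <Xd>, <Xn>, #(-<shift> MOD 64), #(63-<shift>)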
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<shift> For the 32-bit variant: is the shift amount, in the range 0 to 31.
For the 64-bit variant: is the shift amount, in the range 0 to 63.
Operation
The description of UBFM gives the operational pseudocode for this instruction.
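As an illustration of how the alias relates to UBFM, the Python sketch below computes the two UBFM immediates for a given shift and models the shift itself. It assumes the usual mapping immr = (-shift) MOD datasize and imms = datasize - 1 - shift; the function names are hypothetical.

```python
def lsl_to_ubfm(shift: int, datasize: int = 32) -> tuple:
    """Return the (immr, imms) pair of the UBFM that performs LSL #shift."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    if not 0 <= shift < datasize:
        raise ValueError("shift out of range for this datasize")
    immr = (-shift) % datasize
    imms = datasize - 1 - shift
    return immr, imms


def lsl_immediate(value: int, shift: int, datasize: int = 32) -> int:
    """Shift left, filling with zeros and discarding bits above datasize."""
    mask = (1 << datasize) - 1
    return ((value & mask) << shift) & mask
```

For a 32-bit shift by 4 the pair is (28, 27), which satisfies imms + 1 == immr.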
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Logical Shift Left (register) shifts a register value left by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is left-shifted.
• The encodings in this description are named to match the encodings of LSLV.
• The description of LSLV gives the operational pseudocode for this instruction.
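Bits   | 31 | 30 | 29 | 28-21    | 20-16 | 15-12 | 11-10 | 9-5 | 4-0
Field  | sf | -  | -  | -        | Rm    | -     | op2   | Rn  | Rd
Value  | -  | 0  | 0  | 11010110 | -     | 0010  | 00    | -   | -
32-bit (sf == 0)
LSL <Wd>, <Wn>, <Wm>
is equivalent to
LSLV <Wd>, <Wn>, <Wm>
and is always the preferred disassembly.
64-bit (sf == 1)
LSL <Xd>, <Xn>, <Xm>
is equivalent to
LSLV <Xd>, <Xn>, <Xm>
and is always the preferred disassembly.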
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
The description of LSLV gives the operational pseudocode for this instruction.
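The modulo behavior described above can be sketched in Python as follows. This is an informal model, not the architectural pseudocode: the function name is hypothetical, and registers are represented as plain non-negative integers masked to the data size.

```python
def lsl_register(operand1, operand2, datasize=64):
    """Shift operand1 left by (operand2 MOD datasize), shifting in zeros."""
    mask = (1 << datasize) - 1
    amount = (operand2 & mask) % datasize   # only the remainder is used
    return ((operand1 & mask) << amount) & mask


print(lsl_register(1, 65))                    # 2, because 65 MOD 64 == 1
print(hex(lsl_register(0x80000001, 1, 32)))   # 0x2, the top bit is shifted out
```

A shift amount equal to the data size therefore leaves the value unchanged rather than producing zero.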
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Logical Shift Left Variable shifts a register value left by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is left-shifted.
This instruction is used by the alias LSL (register).
Bits   | 31 | 30 | 29 | 28-21    | 20-16 | 15-12 | 11-10 | 9-5 | 4-0
Field  | sf | -  | -  | -        | Rm    | -     | op2   | Rn  | Rd
Value  | -  | 0  | 0  | 11010110 | -     | 0010  | 00    | -   | -
32-bit (sf == 0)
LSLV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
LSLV <Xd>, <Xn>, <Xm>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
bits(datasize) result;
bits(datasize) operand2 = X[m];
result = ShiftReg(n, shift_type, UInt(operand2) MOD datasize);
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Logical Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in zeros, and
writes the result to the destination register.
• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
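Bits   | 31 | 30-29 | 28-23  | 22 | 21-16 | 15-10  | 9-5 | 4-0
Field  | sf | opc   | -      | N  | immr  | imms   | Rn  | Rd
Value  | -  | 10    | 100110 | -  | -     | x11111 | -   | -
32-bit (sf == 0 && N == 0 && imms == 011111)
LSR <Wd>, <Wn>, #<shift>
is equivalent to
UBFM <Wd>, <Wn>, #<shift>, #31
and is always the preferred disassembly.
64-bit (sf == 1 && N == 1 && imms == 111111)
LSR <Xd>, <Xn>, #<shift>
is equivalent to
UBFM <Xd>, <Xn>, #<shift>, #63
and is always the preferred disassembly.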
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<shift> For the 32-bit variant: is the shift amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, encoded in the "immr" field.
Operation
The description of UBFM gives the operational pseudocode for this instruction.
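The following Python sketch illustrates why a UBFM with immr set to the shift amount and imms set to the top bit index behaves as a logical shift right. It models only the UBFM case where imms is greater than or equal to immr. The function names are hypothetical, and the model is a simplification rather than the UBFM pseudocode itself.

```python
def ubfm_extract(value, immr, imms, datasize=64):
    """Model UBFM for imms >= immr: copy bits immr..imms down to bit 0, zero the rest."""
    if not 0 <= immr <= imms < datasize:
        raise ValueError("this sketch only covers 0 <= immr <= imms < datasize")
    width = imms - immr + 1
    return ((value & ((1 << datasize) - 1)) >> immr) & ((1 << width) - 1)


def lsr_immediate(value, shift, datasize=64):
    """LSR #shift modeled as UBFM #shift, #(datasize-1)."""
    return ubfm_extract(value, shift, datasize - 1, datasize)


print(hex(lsr_immediate(0xF0, 4)))        # 0xf
print(lsr_immediate(0x80000000, 31, 32))  # 1
```

Because imms selects the most significant bit, every bit from position shift upward is kept and moved down, which is exactly a zero-filling right shift.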
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Logical Shift Right (register) shifts a register value right by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is right-shifted.
• The encodings in this description are named to match the encodings of LSRV.
• The description of LSRV gives the operational pseudocode for this instruction.
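Bits   | 31 | 30 | 29 | 28-21    | 20-16 | 15-12 | 11-10 | 9-5 | 4-0
Field  | sf | -  | -  | -        | Rm    | -     | op2   | Rn  | Rd
Value  | -  | 0  | 0  | 11010110 | -     | 0010  | 01    | -   | -
32-bit (sf == 0)
LSR <Wd>, <Wn>, <Wm>
is equivalent to
LSRV <Wd>, <Wn>, <Wm>
and is always the preferred disassembly.
64-bit (sf == 1)
LSR <Xd>, <Xn>, <Xm>
is equivalent to
LSRV <Xd>, <Xn>, <Xm>
and is always the preferred disassembly.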
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
The description of LSRV gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Logical Shift Right Variable shifts a register value right by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is right-shifted.
This instruction is used by the alias LSR (register).
Bits   | 31 | 30 | 29 | 28-21    | 20-16 | 15-12 | 11-10 | 9-5 | 4-0
Field  | sf | -  | -  | -        | Rm    | -     | op2   | Rn  | Rd
Value  | -  | 0  | 0  | 11010110 | -     | 0010  | 01    | -   | -
32-bit (sf == 0)
LSRV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
LSRV <Xd>, <Xn>, <Xm>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
bits(datasize) result;
bits(datasize) operand2 = X[m];
result = ShiftReg(n, shift_type, UInt(operand2) MOD datasize);
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Multiply-Add multiplies two register values, adds a third register value, and writes the result to the destination
register.
This instruction is used by the alias MUL.
Bits   | 31 | 30-29 | 28-24 | 23-21 | 20-16 | 15 | 14-10 | 9-5 | 4-0
Field  | sf | -     | -     | -     | Rm    | o0 | Ra    | Rn  | Rd
Value  | -  | 00    | 11011 | 000   | -     | 0  | -     | -   | -
32-bit (sf == 0)
MADD <Wd>, <Wn>, <Wm>, <Wa>
64-bit (sf == 1)
MADD <Xd>, <Xn>, <Xm>, <Xa>
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
integer destsize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Wa> Is the 32-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.
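Alias Conditions
Alias   Is preferred when
MUL     Ra == '11111'
Operation
bits(destsize) operand1 = X[n];
bits(destsize) operand2 = X[m];
bits(destsize) operand3 = X[a];
integer result;
result = UInt(operand3) + (UInt(operand1) * UInt(operand2));
X[d] = result<destsize-1:0>;
Operational information
If PSTATE.DIT is 1: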
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
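As an informal model of the Multiply-Add operation, the Python sketch below computes the addend plus the product and truncates the result to the destination size. The function names are hypothetical, and treating MUL as MADD with a zero addend is an assumption based on the MUL alias using the zero register in the Ra field.

```python
def madd(xn, xm, xa, destsize=64):
    """Return (xa + xn * xm) truncated to destsize bits."""
    mask = (1 << destsize) - 1
    result = (xa & mask) + (xn & mask) * (xm & mask)
    return result & mask   # corresponds to result<destsize-1:0>


def mul(xn, xm, destsize=64):
    """MUL modeled as MADD with a zero addend (assumption: Ra names the zero register)."""
    return madd(xn, xm, 0, destsize)


print(madd(3, 4, 5))                     # 17
print(hex(madd(0xFFFFFFFF, 2, 1, 32)))   # 0xffffffff, the carry out of bit 31 is discarded
```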
Multiply-Negate multiplies two register values, negates the product, and writes the result to the destination register.
• The encodings in this description are named to match the encodings of MSUB.
• The description of MSUB gives the operational pseudocode for this instruction.
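Bits   | 31 | 30-29 | 28-24 | 23-21 | 20-16 | 15 | 14-10 | 9-5 | 4-0
Field  | sf | -     | -     | -     | Rm    | o0 | Ra    | Rn  | Rd
Value  | -  | 00    | 11011 | 000   | -     | 1  | 11111 | -   | -
32-bit (sf == 0)
MNEG <Wd>, <Wn>, <Wm>
is equivalent to
MSUB <Wd>, <Wn>, <Wm>, WZR
and is always the preferred disassembly.
64-bit (sf == 1)
MNEG <Xd>, <Xn>, <Xm>
is equivalent to
MSUB <Xd>, <Xn>, <Xm>, XZR
and is always the preferred disassembly.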
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
The description of MSUB gives the operational pseudocode for this instruction.
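The relationship between MNEG and MSUB can be sketched in Python as follows. This is an illustrative model under the assumption that the Ra field value 11111 selects the zero register, so the minuend is zero. The function names are hypothetical.

```python
def msub(xn, xm, xa, destsize=64):
    """Return (xa - xn * xm) truncated to destsize bits."""
    mask = (1 << destsize) - 1
    return ((xa & mask) - (xn & mask) * (xm & mask)) & mask


def mneg(xn, xm, destsize=64):
    """MNEG modeled as MSUB with a zero minuend: the two's complement of the product."""
    return msub(xn, xm, 0, destsize)


print(msub(3, 4, 20))           # 8
print(hex(mneg(3, 4, 32)))      # 0xfffffff4, which is -12 in 32-bit two's complement
```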
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
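Move (bitmask immediate) writes a bitmask immediate value to a register.
This instruction is an alias of the ORR (immediate) instruction. This means that:
• The encodings in this description are named to match the encodings of ORR (immediate).
• The description of ORR (immediate) gives the operational pseudocode for this instruction.
Bits   | 31 | 30-29 | 28-23  | 22 | 21-16 | 15-10 | 9-5   | 4-0
Field  | sf | opc   | -      | N  | immr  | imms  | Rn    | Rd
Value  | -  | 01    | 100100 | -  | -     | -     | 11111 | -
32-bit (sf == 0 && N == 0)
MOV <Wd|WSP>, #<imm>
is equivalent to
ORR <Wd|WSP>, WZR, #<imm>
and is the preferred disassembly when ! MoveWidePreferred(sf, N, imms, immr).
64-bit (sf == 1)
MOV <Xd|SP>, #<imm>
is equivalent to
ORR <Xd|SP>, XZR, #<imm>
and is the preferred disassembly when ! MoveWidePreferred(sf, N, imms, immr).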
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr", but excluding values which
could be encoded by MOVZ or MOVN.
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr", but excluding values which
could be encoded by MOVZ or MOVN.
Operation
The description of ORR (immediate) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move (inverted wide immediate) moves an inverted 16-bit immediate value to a register.
• The encodings in this description are named to match the encodings of MOVN.
• The description of MOVN gives the operational pseudocode for this instruction.
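Bits   | 31 | 30-29 | 28-23  | 22-21 | 20-5  | 4-0
Field  | sf | opc   | -      | hw    | imm16 | Rd
Value  | -  | 00    | 100101 | -     | -     | -
32-bit (sf == 0 && hw == 0x)
MOV <Wd>, #<imm>
is equivalent to
MOVN <Wd>, #<imm16>, LSL #<shift>
and is the preferred disassembly when ! (IsZero(imm16) && hw != '00') && ! IsOnes(imm16).
64-bit (sf == 1)
MOV <Xd>, #<imm>
is equivalent to
MOVN <Xd>, #<imm16>, LSL #<shift>
and is the preferred disassembly when ! (IsZero(imm16) && hw != '00').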
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> For the 32-bit variant: is a 32-bit immediate, the bitwise inverse of which can be encoded in
"imm16:hw", but excluding 0xffff0000 and 0x0000ffff.
For the 64-bit variant: is a 64-bit immediate, the bitwise inverse of which can be encoded in
"imm16:hw".
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.
Operation
The description of MOVN gives the operational pseudocode for this instruction.
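The following Python sketch shows one way an assembler might decide whether an immediate can be produced by a single MOVN and, if so, which imm16 and shift to use. It is a simplified illustration with a hypothetical function name, not the assembler algorithm defined by Arm. It applies the 32-bit exclusion of 0xffff0000 and 0x0000ffff noted in the Assembler Symbols.

```python
def movn_operands_for(imm, datasize=64):
    """Return (imm16, shift) such that NOT(imm16 << shift) == imm, or None."""
    mask = (1 << datasize) - 1
    inverted = ~imm & mask
    for shift in range(0, datasize, 16):
        imm16 = (inverted >> shift) & 0xFFFF
        if (imm16 << shift) != inverted:
            continue   # the inverted value does not fit in this halfword alone
        if datasize == 32 and imm16 == 0xFFFF:
            return None   # 0xffff0000 and 0x0000ffff are excluded for the 32-bit variant
        return imm16, shift
    return None


print(movn_operands_for(0xFFFFFFFFFFFF1234))   # (60875, 0), that is imm16 = 0xedcb
print(movn_operands_for(0x1234))               # None
```

Searching from the lowest shift upward means an all-ones immediate resolves to imm16 = 0 with shift 0, which is consistent with the preferred-disassembly condition that rejects a zero imm16 with a nonzero hw.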
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move (register) copies the value in a source register to the destination register.
• The encodings in this description are named to match the encodings of ORR (shifted register).
• The description of ORR (shifted register) gives the operational pseudocode for this instruction.
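Bits   | 31 | 30-29 | 28-24 | 23-22 | 21 | 20-16 | 15-10  | 9-5   | 4-0
Field  | sf | opc   | -     | shift | N  | Rm    | imm6   | Rn    | Rd
Value  | -  | 01    | 01010 | 00    | 0  | -     | 000000 | 11111 | -
32-bit (sf == 0)
MOV <Wd>, <Wm>
is equivalent to
ORR <Wd>, WZR, <Wm>
and is always the preferred disassembly.
64-bit (sf == 1)
MOV <Xd>, <Xm>
is equivalent to
ORR <Xd>, XZR, <Xm>
and is always the preferred disassembly.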
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.
Operation
The description of ORR (shifted register) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
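Move between register and stack pointer: Rd = Rn.
This instruction is an alias of the ADD (immediate) instruction. This means that:
• The encodings in this description are named to match the encodings of ADD (immediate).
• The description of ADD (immediate) gives the operational pseudocode for this instruction.
Bits   | 31 | 30 | 29 | 28-23  | 22 | 21-10        | 9-5 | 4-0
Field  | sf | op | S  | -      | sh | imm12        | Rn  | Rd
Value  | -  | 0  | 0  | 100010 | 0  | 000000000000 | -   | -
32-bit (sf == 0)
MOV <Wd|WSP>, <Wn|WSP>
is equivalent to
ADD <Wd|WSP>, <Wn|WSP>, #0
and is the preferred disassembly when (Rd == '11111' || Rn == '11111').
64-bit (sf == 1)
MOV <Xd|SP>, <Xn|SP>
is equivalent to
ADD <Xd|SP>, <Xn|SP>, #0
and is the preferred disassembly when (Rd == '11111' || Rn == '11111').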
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
Operation
The description of ADD (immediate) gives the operational pseudocode for this instruction.
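Move (wide immediate) moves a 16-bit immediate value to a register.
This instruction is an alias of the MOVZ instruction. This means that:
• The encodings in this description are named to match the encodings of MOVZ.
• The description of MOVZ gives the operational pseudocode for this instruction.
Bits   | 31 | 30-29 | 28-23  | 22-21 | 20-5  | 4-0
Field  | sf | opc   | -      | hw    | imm16 | Rd
Value  | -  | 10    | 100101 | -     | -     | -
32-bit (sf == 0 && hw == 0x)
MOV <Wd>, #<imm>
is equivalent to
MOVZ <Wd>, #<imm16>, LSL #<shift>
and is the preferred disassembly when ! (IsZero(imm16) && hw != '00').
64-bit (sf == 1)
MOV <Xd>, #<imm>
is equivalent to
MOVZ <Xd>, #<imm16>, LSL #<shift>
and is the preferred disassembly when ! (IsZero(imm16) && hw != '00').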
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> For the 32-bit variant: is a 32-bit immediate which can be encoded in "imm16:hw".
For the 64-bit variant: is a 64-bit immediate which can be encoded in "imm16:hw".
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.
Operation
The description of MOVZ gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move wide with keep moves an optionally-shifted 16-bit immediate value into a register, keeping other bits unchanged.
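Bits   | 31 | 30-29 | 28-23  | 22-21 | 20-5  | 4-0
Field  | sf | opc   | -      | hw    | imm16 | Rd
Value  | -  | 11    | 100101 | -     | -     | -
32-bit (sf == 0 && hw == 0x)
MOVK <Wd>, #<imm>{, LSL #<shift>}
64-bit (sf == 1)
MOVK <Xd>, #<imm>{, LSL #<shift>}
integer d = UInt(Rd);
integer datasize = if sf == '1' then 64 else 32;
integer pos;
if sf == '0' && hw<1> == '1' then UNDEFINED;
pos = UInt(hw:'0000');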
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> Is the 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.
Operation
bits(datasize) result;
result = X[d];
result<pos+15:pos> = imm16;
X[d] = result;
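A common use of MOVK is to build a wide constant 16 bits at a time after an initial move that zeroes the other bits. The Python sketch below models that pattern. The function names are hypothetical, and the movz helper is a simplified stand-in for a zeroing wide move that this description does not itself define.

```python
def _check_shift(shift, datasize):
    # Valid shifts are 0 or 16 for 32-bit, and 0, 16, 32 or 48 for 64-bit.
    if shift not in range(0, datasize, 16):
        raise ValueError("invalid shift for this data size")


def movz(imm16, shift=0, datasize=64):
    """Place imm16 at bit position shift and zero every other bit."""
    _check_shift(shift, datasize)
    return (imm16 & 0xFFFF) << shift


def movk(old, imm16, shift=0, datasize=64):
    """Replace bits <shift+15:shift> of old with imm16, keeping the other bits."""
    _check_shift(shift, datasize)
    mask = (1 << datasize) - 1
    field = 0xFFFF << shift
    return ((old & ~field) | ((imm16 & 0xFFFF) << shift)) & mask


x = movz(0x7788)
x = movk(x, 0x5566, 16)
x = movk(x, 0x3344, 32)
x = movk(x, 0x1122, 48)
print(hex(x))   # 0x1122334455667788
```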
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move wide with NOT moves the inverse of an optionally-shifted 16-bit immediate value to a register.
This instruction is used by the alias MOV (inverted wide immediate).
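Bits   | 31 | 30-29 | 28-23  | 22-21 | 20-5  | 4-0
Field  | sf | opc   | -      | hw    | imm16 | Rd
Value  | -  | 00    | 100101 | -     | -     | -
32-bit (sf == 0 && hw == 0x)
MOVN <Wd>, #<imm>{, LSL #<shift>}
64-bit (sf == 1)
MOVN <Xd>, #<imm>{, LSL #<shift>}
integer d = UInt(Rd);
integer datasize = if sf == '1' then 64 else 32;
integer pos;
if sf == '0' && hw<1> == '1' then UNDEFINED;
pos = UInt(hw:'0000');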
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> Is the 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.
Alias Conditions
Alias                           Of variant   Is preferred when
MOV (inverted wide immediate)   64-bit       ! (IsZero(imm16) && hw != '00')
MOV (inverted wide immediate)   32-bit       ! (IsZero(imm16) && hw != '00') && ! IsOnes(imm16)
Operation
bits(datasize) result;
result = Zeros();
result<pos+15:pos> = imm16;
result = NOT(result);
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move wide with zero moves an optionally-shifted 16-bit immediate value to a register.
This instruction is used by the alias MOV (wide immediate).
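Bits   | 31 | 30-29 | 28-23  | 22-21 | 20-5  | 4-0
Field  | sf | opc   | -      | hw    | imm16 | Rd
Value  | -  | 10    | 100101 | -     | -     | -
32-bit (sf == 0 && hw == 0x)
MOVZ <Wd>, #<imm>{, LSL #<shift>}
64-bit (sf == 1)
MOVZ <Xd>, #<imm>{, LSL #<shift>}
integer d = UInt(Rd);
integer datasize = if sf == '1' then 64 else 32;
integer pos;
if sf == '0' && hw<1> == '1' then UNDEFINED;
pos = UInt(hw:'0000');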
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> Is the 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.
Alias Conditions
Alias                  Is preferred when
MOV (wide immediate)   ! (IsZero(imm16) && hw != '00')
Operation
bits(datasize) result;
result = Zeros();
result<pos+15:pos> = imm16;
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move System Register allows the PE to read an AArch64 System register into a general-purpose register.
Bits   | 31-22      | 21 | 20 | 19 | 18-16 | 15-12 | 11-8 | 7-5 | 4-0
Field  | -          | L  | -  | o0 | op1   | CRn   | CRm  | op2 | Rt
Value  | 1101010100 | 1  | 1  | -  | -     | -     | -    | -   | -
MRS <Xt>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)
integer t = UInt(Rt);
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose destination register, encoded in the "Rt" field.
<systemreg> Is a System register name, encoded in "o0:op1:CRn:CRm:op2".
The System register names are defined in 'AArch64 System Registers' in the System Register XML.
<op0> Is an unsigned immediate, which can be 2 or 3, encoded in "o0":
    o0   <op0>
    0    2
    1    3
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cn> Is a name 'Cn', with 'n' in the range 0 to 15, encoded in the "CRn" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
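To illustrate how the symbols above pack into the instruction word, the Python sketch below encodes an MRS from its op0, op1, CRn, CRm and op2 operands and formats the generic register name. The function names are hypothetical, the generic name pattern S<op0>_<op1>_<Cn>_<Cm>_<op2> is assumed from the symbol definitions, and the operand values are sample inputs only.

```python
def encode_mrs(op0, op1, crn, crm, op2, rt):
    """Assemble the 32-bit instruction word for MRS <Xt>, S<op0>_<op1>_<Cn>_<Cm>_<op2>."""
    if op0 not in (2, 3):
        raise ValueError("op0 must be 2 or 3")
    if not (0 <= op1 <= 7 and 0 <= crn <= 15 and 0 <= crm <= 15
            and 0 <= op2 <= 7 and 0 <= rt <= 31):
        raise ValueError("operand out of range")
    o0 = op0 - 2   # the o0 bit holds op0 minus 2, per the table above
    # Fixed bits 1101010100 in [31:22], L = 1 in bit 21, and a fixed 1 in bit 20.
    return ((0b1101010100 << 22) | (1 << 21) | (1 << 20) | (o0 << 19)
            | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt)


def generic_sysreg_name(op0, op1, crn, crm, op2):
    """Format the generic System register name for the given operands."""
    return f"S{op0}_{op1}_C{crn}_C{crm}_{op2}"


print(hex(encode_mrs(3, 0, 4, 2, 2, 1)))      # 0xd5384241
print(generic_sysreg_name(3, 0, 4, 2, 2))     # S3_0_C4_C2_2
```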
Operation
Move immediate value to Special Register moves an immediate value to selected bits of the PSTATE. For more
information, see Process state, PSTATE.
The bits that can be written by this instruction are:
• PSTATE.D, PSTATE.A, PSTATE.I, PSTATE.F, and PSTATE.SP.
• If FEAT_SSBS is implemented, PSTATE.SSBS.
• If FEAT_PAN is implemented, PSTATE.PAN.
• If FEAT_UAO is implemented, PSTATE.UAO.
• If FEAT_DIT is implemented, PSTATE.DIT.
• If FEAT_MTE is implemented, PSTATE.TCO.
• If FEAT_NMI is implemented, PSTATE.ALLINT.
Bits   | 31-22      | 21 | 20-19 | 18-16 | 15-12 | 11-8 | 7-5 | 4-0
Field  | -          | -  | -     | op1   | -     | CRm  | op2 | -
Value  | 1101010100 | 0  | 00    | -     | 0100  | -    | -   | 11111
MSR <pstatefield>, #<imm>
bits(2) min_EL;
boolean need_secure = FALSE;
case op1 of
    when '00x'
        min_EL = EL1;
    when '010'
        min_EL = EL1;
    when '011'
        min_EL = EL0;
    when '100'
        min_EL = EL2;
    when '101'
        if !HaveVirtHostExt() then
            UNDEFINED;
        min_EL = EL2;
    when '110'
        min_EL = EL3;
    when '111'
        min_EL = EL1;
        need_secure = TRUE;
PSTATEField field;
case op1:op2 of
    when '000 011'
        if !HaveUAOExt() then UNDEFINED;
        field = PSTATEField_UAO;
    when '000 100'
        if !HavePANExt() then UNDEFINED;
        field = PSTATEField_PAN;
    when '000 101' field = PSTATEField_SP;
    when '001 000'
        if !HaveFeatNMI() then UNDEFINED;
        if CRm<3:1> != '000' then UNDEFINED;
        field = PSTATEField_ALLINT;
    when '011 010'
        if !HaveDITExt() then UNDEFINED;
        field = PSTATEField_DIT;
    when '011 100'
        if !HaveMTEExt() then UNDEFINED;
        field = PSTATEField_TCO;
    when '011 110' field = PSTATEField_DAIFSet;
    when '011 111' field = PSTATEField_DAIFClr;
    when '011 001'
        if !HaveSSBSExt() then UNDEFINED;
        field = PSTATEField_SSBS;
    otherwise UNDEFINED;
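Assembler Symbols
<pstatefield> Is a PSTATE field name, encoded in "op1:op2:CRm":
    op1  op2  CRm   <pstatefield>
    000  011  xxxx  UAO
    000  100  xxxx  PAN
    000  101  xxxx  SPSel
    001  000  000x  ALLINT
    011  001  xxxx  SSBS
    011  010  xxxx  DIT
    011  100  xxxx  TCO
    011  110  xxxx  DAIFSet
    011  111  xxxx  DAIFClr
<imm> Is a 4-bit unsigned immediate, in the range 0 to 15, encoded in the "CRm" field. Restricted to the range
0 to 1, encoded in "CRm<0>", when <pstatefield> is ALLINT.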
Operation
case field of
    when PSTATEField_SSBS
        PSTATE.SSBS = CRm<0>;
    when PSTATEField_SP
        PSTATE.SP = CRm<0>;
    when PSTATEField_DAIFSet
        PSTATE.D = PSTATE.D OR CRm<3>;
        PSTATE.A = PSTATE.A OR CRm<2>;
        PSTATE.I = PSTATE.I OR CRm<1>;
        PSTATE.F = PSTATE.F OR CRm<0>;
    when PSTATEField_DAIFClr
        PSTATE.D = PSTATE.D AND NOT(CRm<3>);
        PSTATE.A = PSTATE.A AND NOT(CRm<2>);
        PSTATE.I = PSTATE.I AND NOT(CRm<1>);
        PSTATE.F = PSTATE.F AND NOT(CRm<0>);
    when PSTATEField_PAN
        PSTATE.PAN = CRm<0>;
    when PSTATEField_UAO
        PSTATE.UAO = CRm<0>;
    when PSTATEField_DIT
        PSTATE.DIT = CRm<0>;
    when PSTATEField_TCO
        PSTATE.TCO = CRm<0>;
    when PSTATEField_ALLINT
        if (PSTATE.EL == EL1 && IsHCRXEL2Enabled() && HCRX_EL2.TALLINT == '1' && CRm<0> == '1') then
            AArch64.SystemAccessTrap(EL2, 0x18);
        PSTATE.ALLINT = CRm<0>;
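The DAIFSet and DAIFClr cases above amount to a bitwise OR and a bitwise AND-NOT on the four mask bits. The Python sketch below models this with the D, A, I and F bits packed into one 4-bit value in the same order as CRm<3:0>. The packing and the function names are illustrative assumptions, and the sketch ignores the Exception level and trap checks.

```python
def daif_set(daif, crm):
    """Model DAIFSet: set each of D, A, I, F whose CRm bit is 1."""
    return (daif | crm) & 0xF


def daif_clr(daif, crm):
    """Model DAIFClr: clear each of D, A, I, F whose CRm bit is 1."""
    return daif & ~crm & 0xF


# Bit 3 = D, bit 2 = A, bit 1 = I, bit 0 = F, matching CRm<3>..CRm<0>.
masked = daif_set(0b0000, 0b0011)     # sets I and F
print(format(masked, "04b"))          # 0011
print(format(daif_clr(masked, 0b0010), "04b"))   # 0001, I is cleared again
```

Bits whose CRm position is 0 are left unchanged by both operations, so a single immediate can update any subset of the four masks.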
Move general-purpose register to System Register allows the PE to write an AArch64 System register from a
general-purpose register.
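Bits   | 31-22      | 21 | 20 | 19 | 18-16 | 15-12 | 11-8 | 7-5 | 4-0
Field  | -          | L  | -  | o0 | op1   | CRn   | CRm  | op2 | Rt
Value  | 1101010100 | 0  | 1  | -  | -     | -     | -    | -   | -
MSR (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>), <Xt>
integer t = UInt(Rt);
Assembler Symbols
<systemreg> Is a System register name, encoded in "o0:op1:CRn:CRm:op2".
The System register names are defined in 'AArch64 System Registers' in the System Register XML.
<op0> Is an unsigned immediate, which can be 2 or 3, encoded in "o0":
    o0   <op0>
    0    2
    1    3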
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cn> Is a name 'Cn', with 'n' in the range 0 to 15, encoded in the "CRn" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.
Operation
Multiply-Subtract multiplies two register values, subtracts the product from a third register value, and writes the
result to the destination register.
This instruction is used by the alias MNEG.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 1 Ra Rn Rd
o0
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
integer destsize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Wa> Is the 32-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.
Alias Conditions
Operation
integer result;
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
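The multiply-subtract described above can be modelled with modular integer arithmetic. This is a minimal sketch and not the architecture pseudocode; the function name and the unsigned-integer representation of register values are assumptions made for illustration.

```python
def msub(rn: int, rm: int, ra: int, destsize: int = 64) -> int:
    """Return (Ra - Rn * Rm) truncated to destsize bits."""
    mask = (1 << destsize) - 1
    return (ra - rn * rm) & mask


if __name__ == "__main__":
    print(msub(3, 4, 20))              # minuend 20, product 12
    print(hex(msub(2, 3, 0, 32)))      # zero minuend: result is the negated product
```

Passing a zero minuend gives the negated product, which is the behaviour of the MNEG alias, where Ra is the zero register.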
Multiply: Rd = Rn * Rm.
• The encodings in this description are named to match the encodings of MADD.
• The description of MADD gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 0 1 1 1 1 1 Rn Rd
o0 Ra
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
The description of MADD gives the operational pseudocode for this instruction.
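Because MUL is the MADD encoding with Ra fixed to 0b11111, its instruction word can be built directly from the diagram above. The following Python sketch uses a hypothetical helper name; field positions are taken from the encoding diagram.

```python
def encode_mul(rd: int, rn: int, rm: int, sf: int = 1) -> int:
    """Build the MUL instruction word (MADD with Ra = 31, o0 = 0)."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers are 5-bit fields")
    fixed = 0b0011011000 << 21   # bits 30:21 of the diagram
    ra = 31 << 10                # Ra = 11111 selects the zero register
    return (sf << 31) | fixed | (rm << 16) | ra | (rn << 5) | rd


if __name__ == "__main__":
    print(hex(encode_mul(0, 1, 2)))         # MUL X0, X1, X2
    print(hex(encode_mul(0, 1, 2, sf=0)))   # MUL W0, W1, W2
```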
Bitwise NOT writes the bitwise inverse of a register value to the destination register.
• The encodings in this description are named to match the encodings of ORN (shifted register).
• The description of ORN (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 shift 1 Rm imm6 1 1 1 1 1 Rd
opc N Rn
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
The description of ORN (shifted register) gives the operational pseudocode for this instruction.
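As a rough model of the behaviour described above, MVN shifts the source register and then inverts every bit. The Python sketch below is illustrative only: the function names and the use of unsigned integers for register values are assumptions, and the shift names follow the <shift> table.

```python
def shift_reg(value: int, shift_type: str, amount: int, datasize: int) -> int:
    """Apply LSL, LSR, ASR or ROR to a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        # Reinterpret as signed so that the right shift replicates the sign bit.
        signed = value - (1 << datasize) if value >> (datasize - 1) else value
        return (signed >> amount) & mask
    if shift_type == "ROR":
        amount %= datasize
        return ((value >> amount) | (value << (datasize - amount))) & mask
    raise ValueError(f"unknown shift type {shift_type!r}")


def mvn(rm: int, shift_type: str = "LSL", amount: int = 0, datasize: int = 64) -> int:
    """Return the bitwise inverse of the optionally shifted source value."""
    mask = (1 << datasize) - 1
    return ~shift_reg(rm, shift_type, amount, datasize) & mask


if __name__ == "__main__":
    print(hex(mvn(0xF0, "LSR", 4, 32)))
```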
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
Negate (shifted register) negates an optionally-shifted register value, and writes the result to the destination register.
• The encodings in this description are named to match the encodings of SUB (shifted register).
• The description of SUB (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 1 shift 0 Rm imm6 1 1 1 1 1 Rd
op S Rn
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
The description of SUB (shifted register) gives the operational pseudocode for this instruction.
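NEG can be read as a subtraction of the shifted operand from zero. The sketch below models that in Python under stated assumptions: register values are unsigned integers, the function name is hypothetical, and only the three shift types permitted by the <shift> table are accepted.

```python
def neg(rm: int, shift_type: str = "LSL", amount: int = 0, datasize: int = 64) -> int:
    """Return 0 minus the optionally shifted source, truncated to datasize bits."""
    mask = (1 << datasize) - 1
    rm &= mask
    if shift_type == "LSL":
        operand2 = (rm << amount) & mask
    elif shift_type == "LSR":
        operand2 = rm >> amount
    elif shift_type == "ASR":
        signed = rm - (1 << datasize) if rm >> (datasize - 1) else rm
        operand2 = (signed >> amount) & mask
    else:
        # Shift encoding 11 is RESERVED for this instruction.
        raise ValueError("unsupported shift type")
    return (-operand2) & mask


if __name__ == "__main__":
    print(hex(neg(1, "LSL", 4, 32)))
```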
Operational information
If PSTATE.DIT is 1:
Negate, setting flags, negates an optionally-shifted register value, and writes the result to the destination register. It
updates the condition flags based on the result.
• The encodings in this description are named to match the encodings of SUBS (shifted register).
• The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 shift 0 Rm imm6 1 1 1 1 1 != 11111
op S Rn Rd
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
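To show how the condition flags follow from the negation, the sketch below computes the result as 0 + NOT(operand) + 1, the usual add-with-carry formulation of a subtraction. It is a simplified model that omits the optional shift; the function name and the tuple layout (N, Z, C, V) are assumptions made for illustration.

```python
def negs(rm: int, datasize: int = 64):
    """Return (result, (N, Z, C, V)) for a flag-setting negate without shift."""
    mask = (1 << datasize) - 1
    sign_bit = 1 << (datasize - 1)
    operand2 = ~rm & mask
    unsigned_sum = operand2 + 1          # 0 + NOT(rm) + carry-in of 1
    result = unsigned_sum & mask

    def to_signed(value: int) -> int:
        return value - (1 << datasize) if value & sign_bit else value

    signed_sum = to_signed(operand2) + 1
    n = int(bool(result & sign_bit))
    z = int(result == 0)
    c = int(unsigned_sum > mask)                 # carry out of the top bit
    v = int(to_signed(result) != signed_sum)     # signed overflow
    return result, (n, z, c, v)


if __name__ == "__main__":
    print(negs(0, 32))
    print(negs(0x80000000, 32))
```

In this model the carry flag is set only when the operand is zero, and the overflow flag is set only when the operand is the most negative representable value.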
Negate with Carry negates the sum of a register value and the value of NOT (Carry flag), and writes the result to the
destination register.
• The encodings in this description are named to match the encodings of SBC.
• The description of SBC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
op S Rn
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.
Operation
The description of SBC gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
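The description above, negating the sum of the register value and NOT(Carry flag), is equivalent to computing NOT(Rm) + C. A minimal Python sketch of that identity follows; the function name is hypothetical and the carry flag is passed as 0 or 1.

```python
def ngc(rm: int, carry: int, datasize: int = 64) -> int:
    """Return -(Rm + NOT(C)) truncated to datasize bits, computed as NOT(Rm) + C."""
    mask = (1 << datasize) - 1
    return ((~rm & mask) + carry) & mask


if __name__ == "__main__":
    print(hex(ngc(5, 1, 32)))   # carry set: plain negation
    print(hex(ngc(5, 0, 32)))   # carry clear: one less
```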
Negate with Carry, setting flags, negates the sum of a register value and the value of NOT (Carry flag), and writes the
result to the destination register. It updates the condition flags based on the result.
• The encodings in this description are named to match the encodings of SBCS.
• The description of SBCS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
op S Rn
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.
Operation
The description of SBCS gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
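One plausible use of a carry-consuming negate is multi-word negation: a flag-setting negate of the low word produces a carry, and a negate with carry of the high word consumes it. The Python sketch below models a 128-bit negation split across two 64-bit words. The helper names are hypothetical and only the carry flag is tracked.

```python
MASK64 = (1 << 64) - 1


def add_with_carry(x: int, y: int, carry_in: int):
    """Return (64-bit result, carry out) of x + y + carry_in."""
    total = x + y + carry_in
    return total & MASK64, int(total > MASK64)


def negate_128(lo: int, hi: int):
    """Negate the 128-bit value hi:lo and return (new_lo, new_hi)."""
    # Low word: flag-setting negate, computed as 0 + NOT(lo) + 1.
    new_lo, carry = add_with_carry(0, ~lo & MASK64, 1)
    # High word: negate with carry, computed as 0 + NOT(hi) + C.
    new_hi, _ = add_with_carry(0, ~hi & MASK64, carry)
    return new_lo, new_hi


if __name__ == "__main__":
    lo, hi = negate_128(1, 0)
    print(hex(hi), hex(lo))
```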
No Operation does nothing, other than advance the value of the program counter by 4. This instruction can be used for
instruction alignment purposes.
Note
The timing effects of including a NOP instruction in a program are not guaranteed. It can increase execution time,
leave it unchanged, or even reduce it. Therefore, NOP instructions are not suitable for timing loops.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1
CRm op2
NOP
// Empty.
Operation
// do nothing
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise OR NOT (shifted register) performs a bitwise (inclusive) OR of a register value and the complement of an
optionally-shifted register value, and writes the result to the destination register.
This instruction is used by the alias MVN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Alias Conditions
operand2 = NOT(operand2);
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
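The core of ORN is the inversion of the second operand before the OR. The following Python sketch models only that step and assumes the second operand has already been shifted; the function name and the unsigned-integer register representation are illustrative.

```python
def orn(rn: int, operand2: int, datasize: int = 64) -> int:
    """Return Rn OR NOT(operand2), truncated to datasize bits."""
    mask = (1 << datasize) - 1
    return (rn | ~operand2) & mask


if __name__ == "__main__":
    print(hex(orn(0x0F, 0xFF, 32)))
    print(hex(orn(0, 0xFF, 32)))   # zero first operand: plain bitwise NOT
```

With a zero first operand the result is the bitwise inverse of the second operand, which is how the MVN alias is obtained.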
Bitwise OR (immediate) performs a bitwise (inclusive) OR of a register value and an immediate value, and
writes the result to the destination register.
This instruction is used by the alias MOV (bitmask immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 0 0 N immr imms Rn Rd
opc
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
if sf == '0' && N != '0' then UNDEFINED;
(imm, -) = DecodeBitMasks(N, imms, immr, TRUE);
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".
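The bitmask immediate is not stored literally: N, imms and immr select an element size, a run of ones and a rotation, and the element is then replicated across the register. The Python sketch below is an informal reading of the DecodeBitMasks call in the decode pseudocode above, for the immediate case only. The function name and error messages are assumptions.

```python
def decode_bit_masks(n: int, imms: int, immr: int, datasize: int) -> int:
    """Return the bitmask immediate for N:imms:immr, or raise ValueError if reserved."""
    # The element size is given by the highest set bit of N:NOT(imms).
    length = ((n << 6) | (~imms & 0x3F)).bit_length() - 1
    if length < 1:
        raise ValueError("reserved encoding")
    esize = 1 << length
    if esize > datasize:
        raise ValueError("N must be 0 for the 32-bit variant")
    levels = esize - 1
    s = imms & levels   # the element holds a run of s + 1 ones
    r = immr & levels   # rotate-right amount within the element
    if s == levels:
        raise ValueError("an all-ones element is reserved")
    welem = (1 << (s + 1)) - 1
    emask = (1 << esize) - 1
    elem = ((welem >> r) | (welem << (esize - r))) & emask
    # Replicate the element to fill the register width.
    result = 0
    for i in range(datasize // esize):
        result |= elem << (i * esize)
    return result


if __name__ == "__main__":
    print(hex(decode_bit_masks(0, 0b100111, 0, 32)))
```

Values such as 0 and all ones cannot be expressed in this scheme, which is why not every constant is a valid <imm>.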
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
Bitwise OR (shifted register) performs a bitwise (inclusive) OR of a register value and an optionally-shifted register
value, and writes the result to the destination register.
This instruction is used by the alias MOV (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 shift 0 Rm imm6 Rn Rd
opc N
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Alias Conditions
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
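The relationship between ORR (shifted register) and its MOV (register) alias can be seen at the encoding level: the alias is the same instruction word with Rn set to 0b11111 and no shift. The Python sketch below builds both from the diagram above; the helper names are hypothetical.

```python
SHIFT_CODES = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_orr_shifted(rd: int, rn: int, rm: int,
                       shift: str = "LSL", amount: int = 0, sf: int = 1) -> int:
    """Build the instruction word for ORR (shifted register)."""
    if not 0 <= amount <= (63 if sf else 31):
        # Mirrors the decode check: imm6<5> == '1' is UNDEFINED when sf == '0'.
        raise ValueError("shift amount out of range for this variant")
    fixed = 0b0101010 << 24   # opc and fixed bits 30:24; N (bit 21) stays 0
    return ((sf << 31) | fixed | (SHIFT_CODES[shift] << 22)
            | (rm << 16) | (amount << 10) | (rn << 5) | rd)


def encode_mov_register(rd: int, rm: int, sf: int = 1) -> int:
    """MOV (register) is ORR with the zero register as the first source."""
    return encode_orr_shifted(rd, 31, rm, sf=sf)


if __name__ == "__main__":
    print(hex(encode_mov_register(0, 1)))   # MOV X0, X1
```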
Pointer Authentication Code for Data address, using key A. This instruction computes and inserts a pointer
authentication code for a data address, using a modifier and key A.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACDA.
• The value zero, for PACDZA.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 1 0 Rn Rd
PACDA (Z == 0)
PACDZA <Xd>
if !HavePACExt() then
UNDEFINED;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if source_is_sp then
X[d] = AddPACDA(X[d], SP[]);
else
X[d] = AddPACDA(X[d], X[n]);
Pointer Authentication Code for Data address, using key B. This instruction computes and inserts a pointer
authentication code for a data address, using a modifier and key B.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACDB.
• The value zero, for PACDZB.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 1 1 Rn Rd
PACDB (Z == 0)
PACDZB <Xd>
if !HavePACExt() then
UNDEFINED;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if source_is_sp then
X[d] = AddPACDB(X[d], SP[]);
else
X[d] = AddPACDB(X[d], X[n]);
Pointer Authentication Code, using Generic key. This instruction computes the pointer authentication code for an
address in the first source register, using a modifier in the second source register, and the Generic key. The computed
pointer authentication code is returned in the upper 32 bits of the destination register.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 1 0 0 Rn Rd
if !HavePACExt() then
UNDEFINED;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Rm"
field.
Operation
if source_is_sp then
X[d] = AddPACGA(X[n], SP[]);
else
X[d] = AddPACGA(X[n], X[m]);
Pointer Authentication Code for Instruction address, using key A. This instruction computes and inserts a pointer
authentication code for an instruction address, using a modifier and key A.
The address is:
• In the general-purpose register that is specified by <Xd> for PACIA and PACIZA.
• In X17, for PACIA1716.
• In X30, for PACIASP and PACIAZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACIA.
• The value zero, for PACIZA and PACIAZ.
• In X16, for PACIA1716.
• In SP, for PACIASP.
It has encodings from 2 classes: Integer and System
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 0 0 Rn Rd
PACIA (Z == 0)
PACIZA <Xd>
if !HavePACExt() then
UNDEFINED;
System
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 0 0 x 1 1 1 1 1
CRm op2
PACIA1716
PACIASP
PACIAZ
integer d;
integer n;
boolean source_is_sp = FALSE;
case CRm:op2 of
    when '0011 000' // PACIAZ
        d = 30;
        n = 31;
    when '0011 001' // PACIASP
        d = 30;
        source_is_sp = TRUE;
        if HaveBTIExt() then
            // Check for branch target compatibility between PSTATE.BTYPE
            // and implicit branch target of PACIASP instruction.
            SetBTypeCompatible(BTypeCompatible_PACIXSP());
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if HavePACExt() then
    if source_is_sp then
        X[d] = AddPACIA(X[d], SP[]);
    else
        X[d] = AddPACIA(X[d], X[n]);
Pointer Authentication Code for Instruction address, using key B. This instruction computes and inserts a pointer
authentication code for an instruction address, using a modifier and key B.
The address is:
• In the general-purpose register that is specified by <Xd> for PACIB and PACIZB.
• In X17, for PACIB1716.
• In X30, for PACIBSP and PACIBZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACIB.
• The value zero, for PACIZB and PACIBZ.
• In X16, for PACIB1716.
• In SP, for PACIBSP.
It has encodings from 2 classes: Integer and System
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 0 1 Rn Rd
PACIB (Z == 0)
PACIZB <Xd>
if !HavePACExt() then
UNDEFINED;
System
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 0 1 x 1 1 1 1 1
CRm op2
PACIB1716
PACIBSP
PACIBZ
integer d;
integer n;
boolean source_is_sp = FALSE;
case CRm:op2 of
    when '0011 010' // PACIBZ
        d = 30;
        n = 31;
    when '0011 011' // PACIBSP
        d = 30;
        source_is_sp = TRUE;
        if HaveBTIExt() then
            // Check for branch target compatibility between PSTATE.BTYPE
            // and implicit branch target of PACIBSP instruction.
            SetBTypeCompatible(BTypeCompatible_PACIXSP());
    when '0001 010' // PACIB1716
        d = 17;
        n = 16;
    when '0001 000' SEE "PACIA";
    when '0001 100' SEE "AUTIA";
    when '0001 110' SEE "AUTIB";
    when '0011 00x' SEE "PACIA";
    when '0011 10x' SEE "AUTIA";
    when '0011 11x' SEE "AUTIB";
    when '0000 111' SEE "XPACLRI";
    otherwise SEE "HINT";
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.
Operation
if HavePACExt() then
    if source_is_sp then
        X[d] = AddPACIB(X[d], SP[]);
    else
        X[d] = AddPACIB(X[d], X[n]);
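The System-class decode for PACIB is a pattern match on CRm:op2, where 'x' is a don't-care bit and the first matching pattern wins. The Python sketch below transcribes the case table from the decode pseudocode above; the function name and the returned strings are illustrative.

```python
# (pattern, outcome) pairs in the order of the decode pseudocode.
PACIB_SYSTEM_TABLE = [
    ("0011 010", "PACIBZ"),
    ("0011 011", "PACIBSP"),
    ("0001 010", "PACIB1716"),
    ("0001 000", "SEE PACIA"),
    ("0001 100", "SEE AUTIA"),
    ("0001 110", "SEE AUTIB"),
    ("0011 00x", "SEE PACIA"),
    ("0011 10x", "SEE AUTIA"),
    ("0011 11x", "SEE AUTIB"),
    ("0000 111", "SEE XPACLRI"),
]


def classify_pacib_system(crm: int, op2: int) -> str:
    """Return the decode outcome for a CRm:op2 pair in the PACIB System class."""
    bits = f"{crm:04b} {op2:03b}"
    for pattern, outcome in PACIB_SYSTEM_TABLE:
        # 'x' matches either bit value; every other character must match exactly.
        if all(p in ("x", b) for p, b in zip(pattern, bits)):
            return outcome
    return "SEE HINT"


if __name__ == "__main__":
    print(classify_pacib_system(0b0011, 0b011))
```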
Prefetch Memory (immediate) signals the memory system that data memory accesses from a specified address are
likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
memory accesses when they do occur, such as preloading the cache line containing the specified address into one or
more caches.
The effect of a PRFM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 1 1 0 imm12 Rn Rt
size opc
Assembler Symbols
PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.
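The prefetch operation is packed into the 5-bit Rt field as type, target and policy. The Python sketch below combines the field values listed above into that 5-bit number. The zero-valued encodings (PLD for type 0b00, L1 for target 0b00, KEEP for policy 0) do not appear in the list above and are included here as assumptions taken from the wider architecture; the function name is hypothetical.

```python
PRF_TYPES = {"PLD": 0b00, "PLI": 0b01, "PST": 0b10}   # Rt<4:3>
PRF_TARGETS = {"L1": 0b00, "L2": 0b01, "L3": 0b10}    # Rt<2:1>
PRF_POLICIES = {"KEEP": 0, "STRM": 1}                 # Rt<0>


def encode_prfop(name: str) -> int:
    """Return the 5-bit Rt value for a name such as 'PSTL3STRM'."""
    prf_type, target, policy = name[:3], name[3:5], name[5:]
    try:
        return (PRF_TYPES[prf_type] << 3) | (PRF_TARGETS[target] << 1) | PRF_POLICIES[policy]
    except KeyError:
        raise ValueError(f"not a named prefetch operation: {name!r}") from None


if __name__ == "__main__":
    print(f"{encode_prfop('PSTL3STRM'):05b}")
```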
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
bits(64) address;
if HaveMTE2Ext() then
SetTagCheckedInstruction(FALSE);
if n == 31 then
address = SP[];
else
address = X[n];
Prefetch(address, t<4:0>);
Prefetch Memory (literal) signals the memory system that data memory accesses from a specified address are likely to
occur in the near future. The memory system can respond by taking actions that are expected to speed up the memory
accesses when they do occur, such as preloading the cache line containing the specified address into one or more
caches.
The effect of a PRFM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 0 imm19 Rt
opc
integer t = UInt(Rt);
bits(64) offset;
Assembler Symbols
PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.
if HaveMTE2Ext() then
SetTagCheckedInstruction(FALSE);
Prefetch(address, t<4:0>);
Prefetch Memory (register) signals the memory system that data memory accesses from a specified address are likely
to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
memory accesses when they do occur, such as preloading the cache line containing the specified address into one or
more caches.
The effect of a PRFM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 1 0 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:
<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #3
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
Operation
if HaveMTE2Ext() then
SetTagCheckedInstruction(FALSE);
if n == 31 then
address = SP[];
else
address = X[n];
Prefetch(address, t<4:0>);
Prefetch Memory (unscaled offset) signals the memory system that data memory accesses from a specified address are
likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
memory accesses when they do occur, such as preloading the cache line containing the specified address into one or
more caches.
The effect of a PRFUM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 1 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
bits(64) address;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
if n == 31 then
    address = SP[];
else
    address = X[n];
Prefetch(address, t<4:0>);
Profiling Synchronization Barrier. This instruction is a barrier that ensures that all existing profiling data for the
current PE has been formatted, and profiling buffer addresses have been translated such that all writes to the profiling
buffer have been initiated. A following DSB instruction completes when the writes to the profiling buffer have
completed.
If the Statistical Profiling Extension is not implemented, this instruction executes as a NOP.
System
(FEAT_SPE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 1 1 1 1 1 1
CRm op2
PSB CSYNC
Operation
ProfilingSynchronizationBarrier();
Physical Speculative Store Bypass Barrier is a memory barrier which prevents speculative loads from bypassing
earlier stores to the same physical address.
The semantics of the Physical Speculative Store Bypass Barrier are:
• When a load to a location appears in program order after the PSSBB, then the load does not speculatively
read an entry earlier in the coherence order for that location than the entry generated by the latest store
satisfying all of the following conditions:
◦ The store is to the same location as the load.
◦ The store appears in program order before the PSSBB.
• When a load to a location appears in program order before the PSSBB, then the load does not speculatively
read data from any store satisfying all of the following conditions:
◦ The store is to the same location as the load.
◦ The store appears in program order after the PSSBB.
• The encodings in this description are named to match the encodings of DSB.
• The description of DSB gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1
CRm opc
PSSBB
is equivalent to
DSB #4
Operation
The description of DSB gives the operational pseudocode for this instruction.
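As an illustration of the equivalence with DSB #4, the following Python sketch rebuilds the 32-bit instruction word from the encoding row above. The helper names bits_to_word and encode_dsb are hypothetical, and placing CRm in bits [11:8] is an assumption read from the field labels under the diagram.

```python
# Fixed bits of the encoding row with the 4-bit CRm field cleared.
# CRm at bits [11:8] is an assumption taken from the diagram labels.
DSB_BASE = 0xD503309F


def bits_to_word(row: str) -> int:
    """Convert a space-separated row of 32 bits (bit 31 first) to an integer."""
    bits = row.split()
    if len(bits) != 32 or any(b not in ("0", "1") for b in bits):
        raise ValueError("expected 32 binary digits")
    return int("".join(bits), 2)


def encode_dsb(crm: int) -> int:
    """Insert a 4-bit CRm value into the fixed DSB bit pattern."""
    if not 0 <= crm <= 15:
        raise ValueError("CRm is a 4-bit field")
    return DSB_BASE | (crm << 8)


PSSBB_ROW = "1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1"
```

Under these assumptions, encode_dsb(4) and bits_to_word(PSSBB_ROW) give the same word, which is what the alias relationship implies.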
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
for i = 0 to datasize-1
    result<(datasize-1)-i> = operand<i>;
X[d] = result;
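The loop above moves bit i of the operand to bit (datasize-1)-i of the result. A minimal Python sketch of the same loop follows. The function name rbit is a hypothetical stand-in, and register values are modelled as unsigned Python integers.

```python
def rbit(operand: int, datasize: int = 64) -> int:
    """Reverse the bit order of a datasize-bit value."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    operand &= (1 << datasize) - 1
    result = 0
    for i in range(datasize):
        if (operand >> i) & 1:
            # Bit i of the operand lands at bit (datasize-1)-i of the result.
            result |= 1 << ((datasize - 1) - i)
    return result
```

Applying rbit twice with the same datasize returns the original value, which is a convenient sanity check.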
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Return from subroutine branches unconditionally to an address in a register, with a hint that this is a subroutine
return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm
RET {<Xn>}
integer n = UInt(Rn);
Assembler Symbols
<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field. Defaults to X30 if absent.
Operation
Return from subroutine, with pointer authentication. This instruction authenticates the address that is held in LR,
using SP as the modifier and the specified key, and then branches to the authenticated address, with a hint that this
instruction is a subroutine return.
Key A is used for RETAA, and key B is used for RETAB.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to LR.
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 1 M 1 1 1 1 1 1 1 1 1 1
Z op A Rn Rm
RETAA (M == 0)
RETAA
RETAB (M == 1)
RETAB
if !HavePACExt() then
    UNDEFINED;
Operation
if use_key_a then
    target = AuthIA(target, modifier, TRUE);
else
    target = AuthIB(target, modifier, TRUE);
integer d = UInt(Rd);
integer n = UInt(Rn);
integer container_size;
case opc of
    when '00'
        Unreachable();
    when '01'
        container_size = 16;
    when '10'
        container_size = 32;
    when '11'
        if sf == '0' then UNDEFINED;
        container_size = 64;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Reverse bytes in 16-bit halfwords reverses the byte order in each 16-bit halfword of a register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 Rn Rd
opc
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer container_size;
case opc of
    when '00'
        Unreachable();
    when '01'
        container_size = 16;
    when '10'
        container_size = 32;
    when '11'
        if sf == '0' then UNDEFINED;
        container_size = 64;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
X[d] = result;
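The decode above selects a container size of 16, 32, or 64 bits, and the description states that the byte order is reversed within each container. A minimal Python sketch of that behavior follows. The function name rev and its parameters are hypothetical, and the loop is an illustration rather than the architecture pseudocode.

```python
def rev(operand: int, datasize: int, container_size: int) -> int:
    """Reverse the byte order inside each container of a datasize-bit value."""
    if datasize not in (32, 64) or container_size not in (16, 32, 64):
        raise ValueError("unsupported size")
    if container_size > datasize:
        raise ValueError("container does not fit in the register")
    operand &= (1 << datasize) - 1
    nbytes = container_size // 8
    result = 0
    for base in range(0, datasize, container_size):
        container = (operand >> base) & ((1 << container_size) - 1)
        # Write the container little-endian and read it back big-endian.
        swapped = int.from_bytes(container.to_bytes(nbytes, "little"), "big")
        result |= swapped << base
    return result
```

With container_size equal to 16 this models the halfword form, and with container_size equal to datasize it models a full-register byte reversal.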
Operational information
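If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.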
Reverse bytes in 32-bit words reverses the byte order in each 32-bit word of a register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
sf opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer container_size;
case opc of
    when '00'
        Unreachable();
    when '01'
        container_size = 16;
    when '10'
        container_size = 32;
    when '11'
        if sf == '0' then UNDEFINED;
        container_size = 64;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
X[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The encodings in this description are named to match the encodings of REV.
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of REV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 Rn Rd
sf opc
64-bit
REV64 <Xd>, <Xn>
is equivalent to
REV <Xd>, <Xn>
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
The description of REV gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Performs a rotation right of a value held in a general purpose register by an immediate value, and then inserts a
selection of the bottom four bits of the result of the rotation into the PSTATE flags, under the control of a second
immediate mask.
Integer
(FEAT_FlagM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 1 0 0 0 0 imm6 0 0 0 0 1 Rn 0 mask
sf
Assembler Symbols
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<shift> Is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the "imm6" field.
<mask> Is the flag bit mask, an immediate in the range 0 to 15, which selects the bits that are inserted into the
NZCV condition flags, encoded in the "mask" field.
Operation
bits(4) tmp;
bits(64) tmpreg = X[n];
tmp = (tmpreg:tmpreg)<lsb+3:lsb>;
if mask<3> == '1' then PSTATE.N = tmp<3>;
if mask<2> == '1' then PSTATE.Z = tmp<2>;
if mask<1> == '1' then PSTATE.C = tmp<1>;
if mask<0> == '1' then PSTATE.V = tmp<0>;
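The rotate-and-insert step above can be sketched in Python as follows. The function name rmif is hypothetical, the NZCV flags are modelled as a 4-bit integer with N in bit 3 and V in bit 0, and lsb is assumed to be the <shift> value from the "imm6" field, because the decode line that defines it is not shown here.

```python
def rmif(xn: int, shift: int, mask: int, nzcv: int) -> int:
    """Return the new 4-bit NZCV value (N is bit 3, V is bit 0)."""
    if not (0 <= shift <= 63 and 0 <= mask <= 15 and 0 <= nzcv <= 15):
        raise ValueError("operand out of range")
    xn &= (1 << 64) - 1
    doubled = (xn << 64) | xn        # models tmpreg:tmpreg
    tmp = (doubled >> shift) & 0xF   # models <lsb+3:lsb>
    # Flags selected by the mask come from tmp, the others keep their value.
    return (nzcv & ~mask & 0xF) | (tmp & mask)
```

Concatenating the register with itself means that a 4-bit window starting near bit 63 wraps around to bit 0, which is the rotation the description refers to.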
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Rotate right (immediate) provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left.
• The encodings in this description are named to match the encodings of EXTR.
• The description of EXTR gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 1 N 0 Rm imms Rn Rd
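32-bit
ROR <Wd>, <Ws>, #<shift>
is equivalent to
EXTR <Wd>, <Ws>, <Ws>, #<shift>
64-bit
ROR <Xd>, <Xs>, #<shift>
is equivalent to
EXTR <Xd>, <Xs>, <Xs>, #<shift>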
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Ws> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xs> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<shift> For the 32-bit variant: is the amount by which to rotate, in the range 0 to 31, encoded in the "imms"
field.
For the 64-bit variant: is the amount by which to rotate, in the range 0 to 63, encoded in the "imms"
field.
Operation
The description of EXTR gives the operational pseudocode for this instruction.
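Because the source register is encoded in both the "Rn" and "Rm" fields, the rotation can be viewed as an extract from a register concatenated with itself. The Python sketch below illustrates that view. The function names extr and ror_imm are hypothetical, and the extract behavior (take datasize bits starting at bit lsb of the concatenated pair) is an assumption about EXTR, whose pseudocode is not reproduced in this description.

```python
def extr(operand1: int, operand2: int, lsb: int, datasize: int) -> int:
    """Extract datasize bits starting at bit lsb of operand1:operand2."""
    if datasize not in (32, 64) or not 0 <= lsb < datasize:
        raise ValueError("operand out of range")
    mask = (1 << datasize) - 1
    concat = ((operand1 & mask) << datasize) | (operand2 & mask)
    return (concat >> lsb) & mask


def ror_imm(value: int, shift: int, datasize: int = 64) -> int:
    """Rotate right by an immediate, modelled as an extract of value:value."""
    return extr(value, value, shift, datasize)
```

A shift of 0 returns the value unchanged, and the bits shifted out of the bottom reappear at the top.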
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Rotate Right (register) provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by
dividing the second source register by the data size defines the number of bits by which the first source register is
right-shifted.
• The encodings in this description are named to match the encodings of RORV.
• The description of RORV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
op2
32-bit (sf == 0)
ROR <Wd>, <Wn>, <Wm>
is equivalent to
RORV <Wd>, <Wn>, <Wm>
64-bit (sf == 1)
ROR <Xd>, <Xn>, <Xm>
is equivalent to
RORV <Xd>, <Xn>, <Xm>
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
The description of RORV gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Rotate Right Variable provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by
dividing the second source register by the data size defines the number of bits by which the first source register is
right-shifted.
This instruction is used by the alias ROR (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
op2
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.
Operation
bits(datasize) result;
bits(datasize) operand2 = X[m];
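The description states that the rotate amount is the remainder of the second source register divided by the data size. A minimal Python sketch of that rule follows. The function name rorv is hypothetical, and the body is an illustration rather than the architecture pseudocode.

```python
def rorv(operand1: int, operand2: int, datasize: int = 64) -> int:
    """Rotate operand1 right by (operand2 MOD datasize) bits."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    mask = (1 << datasize) - 1
    value = operand1 & mask
    amount = (operand2 & mask) % datasize
    if amount == 0:
        return value
    # Bits shifted out on the right re-enter on the left.
    return ((value >> amount) | (value << (datasize - amount))) & mask
```

Only the bottom 5 or 6 bits of the second register affect the result, so a rotate amount of 40 on a 32-bit value behaves like a rotate amount of 8.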
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SB
Operation
SpeculationBarrier();
SBC
Subtract with Carry subtracts a register value and the value of NOT (Carry flag) from a register value, and writes the
result to the destination register.
This instruction is used by the alias NGC.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
operand2 = NOT(operand2);
X[d] = result;
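The inversion of operand2 shown above lets a subtraction be computed as an addition with carry. The Python sketch below illustrates this, together with a 128-bit subtraction built from two 64-bit steps. The names add_with_carry, sbc and sub128 are hypothetical, and the add-with-carry step is assumed, because that line of the pseudocode is not shown above.

```python
def add_with_carry(x: int, y: int, carry_in: int, datasize: int = 64):
    """Return (result, carry_out) of x + y + carry_in at datasize bits."""
    mask = (1 << datasize) - 1
    unsigned_sum = (x & mask) + (y & mask) + carry_in
    return unsigned_sum & mask, unsigned_sum >> datasize


def sbc(operand1: int, operand2: int, carry: int, datasize: int = 64) -> int:
    """Model operand1 - operand2 - NOT(carry) as operand1 + NOT(operand2) + carry."""
    mask = (1 << datasize) - 1
    result, _ = add_with_carry(operand1, ~operand2 & mask, carry, datasize)
    return result


def sub128(a_hi: int, a_lo: int, b_hi: int, b_lo: int):
    """Subtract two 128-bit values held as (high, low) 64-bit halves."""
    mask = (1 << 64) - 1
    # Low half: a flag-setting subtract, which starts with carry-in 1.
    lo, carry = add_with_carry(a_lo, ~b_lo & mask, 1)
    # High half: SBC consumes the carry produced by the low half.
    hi = sbc(a_hi, b_hi, carry)
    return hi, lo
```

A carry value of 1 means no borrow is pending, so sbc(5, 3, 1) is a plain subtraction, while a carry value of 0 subtracts one more.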
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Subtract with Carry, setting flags, subtracts a register value and the value of NOT (Carry flag) from a register value,
and writes the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias NGCS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(4) nzcv;
operand2 = NOT(operand2);
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
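The flag update above can be illustrated with a Python sketch that derives N, Z, C and V from the addition of operand1, the inverted operand2, and the incoming carry. The function name sbcs is hypothetical, and the way the flags are derived is an assumption based on the usual add-with-carry definition, because the line that computes nzcv is not shown above.

```python
def sbcs(operand1: int, operand2: int, carry_in: int, datasize: int = 64):
    """Return (result, nzcv) with N in bit 3 and V in bit 0 of nzcv."""
    mask = (1 << datasize) - 1
    sign = 1 << (datasize - 1)

    def signed(value: int) -> int:
        # Interpret a datasize-bit pattern as a two's complement integer.
        return value - (1 << datasize) if value & sign else value

    x = operand1 & mask
    y = ~operand2 & mask                       # operand2 = NOT(operand2)
    unsigned_sum = x + y + carry_in
    signed_sum = signed(x) + signed(y) + carry_in
    result = unsigned_sum & mask
    n = 1 if result & sign else 0
    z = 1 if result == 0 else 0
    c = 1 if unsigned_sum != result else 0     # carry out of the top bit
    v = 1 if signed(result) != signed_sum else 0
    return result, (n << 3) | (z << 2) | (c << 1) | v
```

For a subtraction, C set means that no borrow occurred, which is why equal operands with carry_in set produce Z and C together.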
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Bitfield Insert in Zeros copies a bitfield of <width> bits from the least significant bits of the source register to
bit position <lsb> of the destination register, setting the destination bits below the bitfield to zero, and the bits above
the bitfield to a copy of the most significant bit of the bitfield.
• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N immr imms Rn Rd
opc
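32-bit
SBFIZ <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
SBFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)
64-bit
SBFIZ <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
SBFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)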
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of SBFM gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
If <imms> is greater than or equal to <immr>, this copies a bitfield of (<imms>-<immr>+1) bits starting from bit
position <immr> in the source register to the least significant bits of the destination register.
If <imms> is less than <immr>, this copies a bitfield of (<imms>+1) bits from the least significant bits of the source
register to bit position (regsize-<immr>) of the destination register, where regsize is the destination register size of 32
or 64 bits.
In both cases the destination bits below the bitfield are set to zero, and the bits above the bitfield are set to a copy of
the most significant bit of the bitfield.
This instruction is used by the aliases ASR (immediate), SBFIZ, SBFX, SXTB, SXTH, and SXTW.
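The two cases described above can be sketched in Python as follows. The function name sbfm is hypothetical, and the body follows the prose description rather than the mask-based architecture pseudocode.

```python
def sbfm(src: int, immr: int, imms: int, regsize: int = 64) -> int:
    """Model Signed Bitfield Move from its prose description."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not (0 <= immr < regsize and 0 <= imms < regsize):
        raise ValueError("immr and imms must be below regsize")
    mask = (1 << regsize) - 1
    src &= mask
    if imms >= immr:
        # Extract (imms-immr+1) bits starting at bit immr into the low bits.
        width = imms - immr + 1
        field = (src >> immr) & ((1 << width) - 1)
        lsb = 0
    else:
        # Insert the low (imms+1) bits at bit position (regsize-immr).
        width = imms + 1
        field = src & ((1 << width) - 1)
        lsb = regsize - immr
    if (field >> (width - 1)) & 1:
        field -= 1 << width            # sign-extend the bitfield
    # Bits below lsb are zero, bits above the field copy its top bit.
    return (field << lsb) & mask
```

As a usage example, sbfm(value, 0, 7) sign-extends the bottom byte, which is assumed here to correspond to the SXTB alias named above.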
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N immr imms Rn Rd
opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
integer R;
integer S;
bits(datasize) wmask;
bits(datasize) tmask;
R = UInt(immr);
S = UInt(imms);
(wmask, tmask) = DecodeBitMasks(N, imms, immr, FALSE);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<immr> For the 32-bit variant: is the right rotate amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the right rotate amount, in the range 0 to 63, encoded in the "immr" field.
<imms> For the 32-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 63,
encoded in the "imms" field.
Alias Conditions
Operation
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Bitfield Extract copies a bitfield of <width> bits starting from bit position <lsb> in the source register to the
least significant bits of the destination register, and sets destination bits above the bitfield to a copy of the most
significant bit of the bitfield.
• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N immr imms Rn Rd
opc
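32-bit
SBFX <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
SBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)
64-bit
SBFX <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
SBFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)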
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of SBFM gives the operational pseudocode for this instruction.
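The extract-and-sign-extend behavior described for this alias can be sketched in Python as follows. The function names sbfx and sbfx_to_sbfm are hypothetical, and the mapping to the SBFM operands (immr equal to <lsb>, imms equal to <lsb>+<width>-1) is an assumption derived from the SBFM description.

```python
def sbfx(src: int, lsb: int, width: int, regsize: int = 64) -> int:
    """Extract width bits starting at bit lsb and sign-extend them."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not (0 <= lsb < regsize and 1 <= width <= regsize - lsb):
        raise ValueError("bitfield does not fit in the register")
    field = (src >> lsb) & ((1 << width) - 1)
    if field & (1 << (width - 1)):
        field -= 1 << width              # copy the top bit of the field upward
    return field & ((1 << regsize) - 1)


def sbfx_to_sbfm(lsb: int, width: int):
    """Return the assumed (immr, imms) pair for the equivalent SBFM."""
    return lsb, lsb + width - 1
```

A field whose top bit is clear is returned unchanged in the low bits, and a field whose top bit is set fills the upper bits with ones.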
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Divide divides a signed integer register value by another signed integer register value, and writes the result to
the destination register. The condition flags are not affected.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 0 0 1 1 Rn Rd
o1
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
Operation
if IsZero(operand2) then
    result = 0;
else
    result = RoundTowardsZero(Real(Int(operand1, FALSE)) / Real(Int(operand2, FALSE)));
X[d] = result<datasize-1:0>;
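The operation above has two notable properties: division by zero yields zero, and the quotient is rounded towards zero before being truncated to the register width. A minimal Python sketch follows. The function names to_signed and sdiv are hypothetical, and integer arithmetic is used in place of real-number division so that 64-bit results stay exact.

```python
def to_signed(value: int, datasize: int) -> int:
    """Interpret a datasize-bit pattern as a two's complement integer."""
    value &= (1 << datasize) - 1
    return value - (1 << datasize) if value >> (datasize - 1) else value


def sdiv(operand1: int, operand2: int, datasize: int = 64) -> int:
    """Signed divide, rounding towards zero, with x / 0 defined as 0."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    mask = (1 << datasize) - 1
    a = to_signed(operand1, datasize)
    b = to_signed(operand2, datasize)
    if b == 0:
        return 0
    quotient = abs(a) // abs(b)          # magnitude, rounded towards zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & mask               # models result<datasize-1:0>
```

Dividing the most negative value by -1 overflows the signed range, and the final truncation wraps the quotient back to the most negative value.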
Set the PSTATE.NZV flags based on the value in the specified general-purpose register. SETF8 treats the value as an
8-bit value, and SETF16 treats the value as a 16-bit value.
The PSTATE.C flag is not affected by these instructions.
Integer
(FEAT_FlagM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 sz 0 0 1 0 Rn 0 1 1 0 1
sf
SETF8 (sz == 0)
SETF8 <Wn>
SETF16 (sz == 1)
SETF16 <Wn>
Assembler Symbols
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
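The flag-setting behaviour of SETF8 and SETF16 can be sketched as follows. The Operation pseudocode is not reproduced in this section, so the rule used here for V (the bit just above the sign bit, exclusive-ORed with the sign bit) is an assumption based on the published definition of these instructions, and the function name `setf` is illustrative. PSTATE.C is not modelled because it is unaffected.

```python
def setf(wn: int, size: int) -> dict:
    """Return the N, Z and V flags for SETF8 (size=8) or SETF16 (size=16)."""
    assert size in (8, 16)
    msb = size - 1
    n = (wn >> msb) & 1                            # sign bit of the narrow value
    z = 1 if (wn & ((1 << size) - 1)) == 0 else 0  # narrow value is zero
    v = ((wn >> (msb + 1)) & 1) ^ n                # assumed rule: bit msb+1 XOR bit msb
    return {"N": n, "Z": z, "V": v}
```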
Memory Set with tag setting. These instructions perform a memory set using the value in the bottom byte of the
source register and store an Allocation Tag to memory for each Tag Granule written. The Allocation Tag is calculated
from the Logical Address Tag in the register which holds the first address that the set is made to. The prologue, main,
and epilogue instructions are expected to be run in succession and to appear consecutively in memory: SETGP, then
SETGM, and then SETGE.
SETGP performs some preconditioning of the arguments suitable for using the SETGM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETGM performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGE performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETGP, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGP, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
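The two post-prologue register states listed above can be written out as a small Python model. This is a sketch under stated assumptions: `setgp_result` is a hypothetical helper, and `done` stands for the IMPLEMENTATION DEFINED number of bytes that the prologue happened to set.

```python
MASK64 = (1 << 64) - 1
SETG_MAX = 0x7FFFFFFFFFFFFFF0   # saturation value for the tag-setting variants

def setgp_result(xd: int, xn: int, done: int, option_a: bool) -> dict:
    """Register and flag state after SETGP; `done` is the bytes already set."""
    size = SETG_MAX if (xn >> 63) & 1 else xn    # saturate when Xn<63> == 1
    assert 0 <= done <= size
    if option_a:
        new_xd = (xd + size) & MASK64            # Xd moves to the end of the region
        new_xn = (-size + done) & MASK64         # Xn becomes a negative count
        c = 0
    else:
        new_xd = (xd + done) & MASK64            # Xd advances past the bytes set
        new_xn = (size - done) & MASK64          # Xn counts down towards zero
        c = 1
    return {"Xd": new_xd, "Xn": new_xn, "N": 0, "Z": 0, "C": c, "V": 0}
```

For example, with Xd = 0x1000, Xn = 0x100 and 0x40 bytes set by the prologue, option A leaves Xd = 0x1100 and Xn = 0xFFFFFFFFFFFFFF40, while option B leaves Xd = 0x1040 and Xn = 0xC0.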
For SETGM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGE, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGE, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
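Because the meaning of Xd and Xn depends on PSTATE.C, code that inspects a partially completed memory set (for example a debugger or an exception handler) has to decode them per option. The following sketch shows one way to do that; `remaining_and_next` is a hypothetical helper, not an architectural function.

```python
MASK64 = (1 << 64) - 1

def remaining_and_next(xd: int, xn: int, c: int):
    """Return (bytes remaining, lowest address not yet set) from Xd, Xn, PSTATE.C."""
    if c == 0:
        # Option A: Xn is a negative signed count and Xd holds the end address.
        signed_xn = xn - (1 << 64) if xn >> 63 else xn
        remaining = -signed_xn
        next_addr = (xd + signed_xn) & MASK64
    else:
        # Option B: Xn is the remaining count and Xd is the next address.
        remaining = xn
        next_addr = xd
    return remaining, next_addr
```

Both encodings describe the same progress: `(0x1100, 0xFFFFFFFFFFFFFF40, C=0)` and `(0x1040, 0xC0, C=1)` each decode to 0xC0 bytes remaining from address 0x1040.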
Integer
(FEAT_MOPS)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address (an integer multiple of 16) and for option B is updated by the
instruction, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the destination address (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be set (an integer multiple of 16) and is set to zero at the end of the instruction,
encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the "Rn" field.
<Xs> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the source data,
encoded in the "Rs" field.
For the main and prologue variant: is the 64-bit name of the general-purpose register that holds the
source data in bits<7:0>, encoded in the "Rs" field.
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
if supports_option_a then
    PSTATE.C = '0';
    toaddress = toaddress + setsize;
    setsize = Zeros(64) - setsize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
integer tagstep;
bits(4) tag;
bits(64) tagaddr;
if supports_option_a then
    while SInt(stagesetsize) < 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= -1 * SInt(stagesetsize);
        assert B<3:0> == '0000';
        setsize = setsize + B;
        stagesetsize = stagesetsize + B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
else
    while UInt(stagesetsize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= UInt(stagesetsize);
        assert B<3:0> == '0000';
        toaddress = toaddress + B;
        setsize = setsize - B;
        stagesetsize = stagesetsize - B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
            X[d] = toaddress;
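The bookkeeping in the two loops above can be followed with a small Python model. This is a sketch only: `choose_block` is a hypothetical stand-in for the IMPLEMENTATION DEFINED SETSizeChoice, the memory and tag writes are replaced by a list recording where each block would land, and signed Python integers are used for the option A counts.

```python
MASK64 = (1 << 64) - 1

def run_stage(toaddress, setsize, stagesetsize, option_a, choose_block):
    """Advance one stage; returns (toaddress, setsize, [(address, length), ...])."""
    blocks = []
    if option_a:
        # setsize and stagesetsize are negative; toaddress stays at the end address.
        while stagesetsize < 0:
            b = choose_block(-stagesetsize)
            assert 0 < b <= -stagesetsize and b % 16 == 0
            blocks.append(((toaddress + setsize) & MASK64, b))
            setsize += b
            stagesetsize += b
    else:
        # setsize and stagesetsize count down; toaddress advances with each block.
        while stagesetsize > 0:
            b = choose_block(stagesetsize)
            assert 0 < b <= stagesetsize and b % 16 == 0
            blocks.append((toaddress, b))
            toaddress = (toaddress + b) & MASK64
            setsize -= b
            stagesetsize -= b
    return toaddress, setsize, blocks
```

With a chooser such as `lambda limit: min(limit, 32)`, both options touch the same blocks for the same request; only the register encoding of the progress differs.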
Memory Set with tag setting, non-temporal. These instructions perform a memory set using the value in the bottom
byte of the source register and store an Allocation Tag to memory for each Tag Granule written. The Allocation Tag is
calculated from the Logical Address Tag in the register which holds the first address that the set is made to. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: SETGPN, then SETGMN, and then SETGEN.
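As a rough illustration of the tag handling described above, the sketch below derives the tag and the number of Tag Granules for a request. It assumes that the Logical Address Tag occupies address bits [59:56], that a Tag Granule is 16 bytes, and that the Allocation Tag is taken directly from the Logical Address Tag; the helper names are hypothetical.

```python
TAG_GRANULE = 16   # bytes covered by one Allocation Tag (assumed)

def logical_address_tag(address: int) -> int:
    """Extract bits [59:56] of a 64-bit address."""
    return (address >> 56) & 0xF

def tag_plan(first_address: int, size: int):
    """Return (tag, number of Tag Granules) for a tag-setting memory set."""
    assert first_address % TAG_GRANULE == 0 and size % TAG_GRANULE == 0
    return logical_address_tag(first_address), size // TAG_GRANULE
```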
SETGPN performs some preconditioning of the arguments suitable for using the SETGMN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory set. SETGMN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGEN performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETGPN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGPN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETGMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGEN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGEN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
Integer
(FEAT_MOPS)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address (an integer multiple of 16) and for option B is updated by the
instruction, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the destination address (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be set (an integer multiple of 16) and is set to zero at the end of the instruction,
encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the "Rn" field.
<Xs> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the source data,
encoded in the "Rs" field.
For the main and prologue variant: is the 64-bit name of the general-purpose register that holds the
source data in bits<7:0>, encoded in the "Rs" field.
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
if supports_option_a then
    PSTATE.C = '0';
    toaddress = toaddress + setsize;
    setsize = Zeros(64) - setsize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
integer tagstep;
bits(4) tag;
bits(64) tagaddr;
if supports_option_a then
    while SInt(stagesetsize) < 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= -1 * SInt(stagesetsize);
        assert B<3:0> == '0000';
        setsize = setsize + B;
        stagesetsize = stagesetsize + B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
else
    while UInt(stagesetsize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= UInt(stagesetsize);
        assert B<3:0> == '0000';
        toaddress = toaddress + B;
        setsize = setsize - B;
        stagesetsize = stagesetsize - B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
            X[d] = toaddress;
Memory Set with tag setting, unprivileged. These instructions perform a memory set using the value in the bottom
byte of the source register and store an Allocation Tag to memory for each Tag Granule written. The Allocation Tag is
calculated from the Logical Address Tag in the register which holds the first address that the set is made to. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: SETGPT, then SETGMT, and then SETGET.
SETGPT performs some preconditioning of the arguments suitable for using the SETGMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETGMT performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGET performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETGPT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGPT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETGMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGET, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGET, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
Integer
(FEAT_MOPS)
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
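The decode checks above translate directly into Python. In this sketch `decode_set` is a hypothetical helper that returns the stage name and the raw two-bit options field, and raises ValueError wherever the pseudocode is UNDEFINED.

```python
def decode_set(rd: int, rs: int, rn: int, op2: int):
    """Return (stage, options), or raise ValueError for an UNDEFINED encoding."""
    stages = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}
    stage_bits = (op2 >> 2) & 0b11
    if stage_bits not in stages:
        raise ValueError("UNDEFINED: op2<3:2> == '11'")
    if rs == rn or rs == rd or rn == rd:
        raise ValueError("UNDEFINED: Rd, Rs and Rn must all differ")
    if rd == 31 or rn == 31:
        raise ValueError("UNDEFINED: Rd and Rn must not be 31")
    return stages[stage_bits], op2 & 0b11
```

Note that only Rd and Rn are checked against 31; the checks shown place no such restriction on Rs.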
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address (an integer multiple of 16) and for option B is updated by the
instruction, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the destination address (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be set (an integer multiple of 16) and is set to zero at the end of the instruction,
encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the "Rn" field.
<Xs> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the source data,
encoded in the "Rs" field.
For the main and prologue variant: is the 64-bit name of the general-purpose register that holds the
source data in bits<7:0>, encoded in the "Rs" field.
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
if supports_option_a then
    PSTATE.C = '0';
    toaddress = toaddress + setsize;
    setsize = Zeros(64) - setsize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
integer tagstep;
bits(4) tag;
bits(64) tagaddr;
if supports_option_a then
    while SInt(stagesetsize) < 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= -1 * SInt(stagesetsize);
        assert B<3:0> == '0000';
        setsize = setsize + B;
        stagesetsize = stagesetsize + B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
else
    while UInt(stagesetsize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= UInt(stagesetsize);
        assert B<3:0> == '0000';
        toaddress = toaddress + B;
        setsize = setsize - B;
        stagesetsize = stagesetsize - B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
            X[d] = toaddress;
Memory Set with tag setting, unprivileged and non-temporal. These instructions perform a memory set using the value
in the bottom byte of the source register and store an Allocation Tag to memory for each Tag Granule written. The
Allocation Tag is calculated from the Logical Address Tag in the register which holds the first address that the set is
made to. The prologue, main, and epilogue instructions are expected to be run in succession and to appear
consecutively in memory: SETGPTN, then SETGMTN, and then SETGETN.
SETGPTN performs some preconditioning of the arguments suitable for using the SETGMTN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory set. SETGMTN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGETN performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETGPTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGPTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETGMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
Integer
(FEAT_MOPS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 1 0 1 1 1 0 Rs x x 1 1 0 1 Rn Rd
op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address (an integer multiple of 16) and for option B is updated by the
instruction, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the destination address (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be set (an integer multiple of 16) and is set to zero at the end of the instruction,
encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the "Rn" field.
<Xs> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the source data,
encoded in the "Rs" field.
For the main and prologue variant: is the 64-bit name of the general-purpose register that holds the
source data in bits<7:0>, encoded in the "Rs" field.
Operation
CheckMOPSEnabled();
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
if supports_option_a then
    PSTATE.C = '0';
    toaddress = toaddress + setsize;
    setsize = Zeros(64) - setsize;
else
    PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
boolean from_epilogue = FALSE;
MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i
else
stagesetsize = postsize;
if (setsize != postsize || MemSetParametersIllformedE(toaddress, setsize, is_setg)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i
integer tagstep;
bits(4) tag;
bits(64) tagaddr;
if supports_option_a then
    while SInt(stagesetsize) < 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= -1 * SInt(stagesetsize);
        assert B<3:0> == '0000';
        setsize = setsize + B;
        stagesetsize = stagesetsize + B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
else
    while UInt(stagesetsize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= UInt(stagesetsize);
        assert B<3:0> == '0000';
        toaddress = toaddress + B;
        setsize = setsize - B;
        stagesetsize = stagesetsize - B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
            X[d] = toaddress;
X[n] = setsize;
X[d] = toaddress;
SETP, SETM, SETE
Memory Set. These instructions perform a memory set using the value in the bottom byte of the source register. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: SETP, then SETM, and then SETE.
SETP performs some preconditioning of the arguments suitable for using the SETM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETM performs an IMPLEMENTATION DEFINED amount of the
memory set. SETE performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETP, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETP, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be set
in the memory set in total.
For SETM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETE, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETE, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
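To see how the three stages cooperate, the following sketch runs a complete memory set in the option B register format against a Python bytearray. It is an illustration, not the architectural algorithm: `prologue_bytes` and `main_bytes` stand in for the IMPLEMENTATION DEFINED amounts set by SETP and SETM, and all function names are hypothetical.

```python
def set_stage(memory: bytearray, xd: int, xn: int, xs: int, amount: int):
    """Option B step: set `amount` bytes at Xd and return the written-back (Xd, Xn)."""
    assert 0 <= amount <= xn
    memory[xd:xd + amount] = bytes([xs & 0xFF]) * amount   # bottom byte of Xs
    return xd + amount, xn - amount

def memset_option_b(memory, xd, xn, xs, prologue_bytes, main_bytes):
    """Run the SETP, SETM, SETE sequence and return the final (Xd, Xn)."""
    xd, xn = set_stage(memory, xd, xn, xs, min(prologue_bytes, xn))  # SETP
    xd, xn = set_stage(memory, xd, xn, xs, min(main_bytes, xn))      # SETM
    xd, xn = set_stage(memory, xd, xn, xs, xn)                       # SETE sets the rest
    return xd, xn
```

Whatever amounts the first two stages choose, the epilogue leaves Xn at 0 and Xd at the lowest address that has not been set.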
Integer
(FEAT_MOPS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 0 0 1 1 1 0 Rs x x 0 0 0 1 Rn Rd
op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
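The encoding diagram and decode above can be exercised with a short encoder. This sketch reads the diagram as sz in bits [31:30], nine fixed bits in [29:21], Rs in [20:16], op2 in [15:12], fixed bits '01' in [11:10], Rn in [9:5] and Rd in [4:0]. The default `sz = 0` is an assumption, since the section does not state which sz values are valid, and the helper names are hypothetical.

```python
SET_FIXED = (0b011001110 << 21) | (0b01 << 10)       # fixed bits from the diagram
STAGE_OP2 = {"SETP": 0b0000, "SETM": 0b0100, "SETE": 0b1000}

def encode_set(mnemonic: str, rd: int, rn: int, rs: int, sz: int = 0) -> int:
    """Assemble a 32-bit instruction word for SETP, SETM or SETE."""
    for reg in (rd, rn, rs):
        assert 0 <= reg <= 31
    op2 = STAGE_OP2[mnemonic]
    return (sz << 30) | SET_FIXED | (rs << 16) | (op2 << 12) | (rn << 5) | rd

def fields(word: int) -> dict:
    """Split an instruction word back into the variable fields."""
    return {"sz": word >> 30, "Rs": (word >> 16) & 31, "op2": (word >> 12) & 15,
            "Rn": (word >> 5) & 31, "Rd": word & 31}
```

The three stages of one memory set differ only in op2<3:2>, so their encodings for the same registers differ only in bits [15:14].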
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address and for option B is updated by the instruction, encoded in the "Rd"
field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is set to zero at the end of the instruction, encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set and is updated by the instruction, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is updated by the instruction, encoded in the "Rn" field.
<Xs> Is the 64-bit name of the general-purpose register that holds the source data, encoded in the "Rs" field.
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
toaddress = toaddress + setsize;
setsize = Zeros(64) - setsize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
Memory Set, non-temporal. These instructions perform a memory set using the value in the bottom byte of the source
register. The prologue, main, and epilogue instructions are expected to be run in succession and to appear
consecutively in memory: SETPN, then SETMN, and then SETEN.
SETPN performs some preconditioning of the arguments suitable for using the SETMN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETMN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETEN performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETPN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * (saturated Xn) + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETPN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
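The register state after the prologue, as listed above, can be modeled with a short Python sketch. This is an illustration only, not Arm pseudocode: the function name `setp_postconditions` and the parameter `bytes_done` (standing in for the IMPLEMENTATION DEFINED number of bytes set by the prologue) are assumptions introduced here.

```python
MASK64 = (1 << 64) - 1
SAT_MAX = 0x7FFFFFFFFFFFFFFF


def setp_postconditions(xd, xn, option_a, bytes_done=0):
    """Return (Xd, Xn, PSTATE.C) after the prologue, as 64-bit register values.

    bytes_done is a hypothetical stand-in for the IMPLEMENTATION DEFINED
    number of bytes that the prologue itself sets.
    """
    # If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
    size = SAT_MAX if (xn >> 63) & 1 else xn
    if not 0 <= bytes_done <= size:
        raise ValueError("bytes_done must lie between 0 and the saturated size")
    if option_a:
        # Option A: Xd moves to the end of the region, Xn becomes a negative
        # count that increases towards zero as bytes are set.
        new_xd = (xd + size) & MASK64
        new_xn = (-size + bytes_done) & MASK64
        carry = 0
    else:
        # Option B: Xd advances past the bytes already set, Xn counts down.
        new_xd = (xd + bytes_done) & MASK64
        new_xn = size - bytes_done
        carry = 1
    return new_xd, new_xn, carry
```

PSTATE.{N,Z,V} are not modeled because both options clear them.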
For SETMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1 * the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be set
in the memory set in total.
For SETMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETEN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1 * the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETEN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
Integer
(FEAT_MOPS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 0 0 1 1 1 0 Rs x x 1 0 0 1 Rn Rd
op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
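The decode checks above can be mirrored in a small Python sketch. The names `decode_set` and `Undefined` are hypothetical and exist only for this illustration; the field split (stage in op2<3:2>, options in op2<1:0>) and the register constraints follow the decode pseudocode.

```python
STAGES = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}


class Undefined(Exception):
    """Stands in for the UNDEFINED outcome of the decode pseudocode."""


def decode_set(op2, rs, rn, rd):
    """Return (stage, options) for a 4-bit op2 and 5-bit register fields."""
    options = op2 & 0b11            # op2<1:0>
    stage_bits = (op2 >> 2) & 0b11  # op2<3:2>
    if stage_bits not in STAGES:
        raise Undefined("op2<3:2> == '11' selects no stage")
    # The three registers must be distinct.
    if rs == rn or rs == rd or rn == rd:
        raise Undefined("Rs, Rn and Rd must all differ")
    # Register number 31 is not permitted for Rd or Rn.
    if rd == 31 or rn == 31:
        raise Undefined("Rd and Rn must not be 31")
    return STAGES[stage_bits], options
```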
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address and for option B is updated by the instruction, encoded in the "Rd"
field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is set to zero at the end of the instruction, encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set and is updated by the instruction, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is updated by the instruction, encoded in the "Rn" field.
<Xs> Is the 64-bit name of the general-purpose register that holds the source data, encoded in the "Rs" field.
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
toaddress = toaddress + setsize;
setsize = Zeros(64) - setsize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
Memory Set, unprivileged. These instructions perform a memory set using the value in the bottom byte of the source
register. The prologue, main, and epilogue instructions are expected to be run in succession and to appear
consecutively in memory: SETPT, then SETMT, and then SETET.
SETPT performs some preconditioning of the arguments suitable for using the SETMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETMT performs an IMPLEMENTATION DEFINED amount of the
memory set. SETET performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETPT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * (saturated Xn) + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETPT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1 * the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be set
in the memory set in total.
For SETMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETET, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1 * the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETET, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
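Because the meaning of Xd and Xn depends on PSTATE.C, it can help to normalize both argument formats into "lowest address still to be set" and "bytes remaining". The following Python sketch does that under the formats listed above; `remaining_region` and `to_signed64` are names chosen here for illustration and are not part of the architecture.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def remaining_region(xd, xn, pstate_c):
    """Return (lowest address still to be set, bytes remaining)."""
    if pstate_c == 0:
        # Option A: Xn is signed and holds -1 * bytes remaining.
        # Xd holds (lowest address) - Xn, so the lowest address is Xd + Xn.
        signed_xn = to_signed64(xn)
        return (xd + signed_xn) & MASK64, -signed_xn
    # Option B: Xd is the lowest address and Xn is the remaining byte count.
    return xd, xn
```

Both formats describe the same region: with 84 bytes left starting at 0x1010, option A holds Xd = 0x1064 and Xn = -84, while option B holds Xd = 0x1010 and Xn = 84.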
Integer
(FEAT_MOPS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 0 0 1 1 1 0 Rs x x 0 1 0 1 Rn Rd
op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address and for option B is updated by the instruction, encoded in the "Rd"
field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is set to zero at the end of the instruction, encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set and is updated by the instruction, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is updated by the instruction, encoded in the "Rn" field.
<Xs> Is the 64-bit name of the general-purpose register that holds the source data, encoded in the "Rs" field.
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
toaddress = toaddress + setsize;
setsize = Zeros(64) - setsize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
Memory Set, unprivileged and non-temporal. These instructions perform a memory set using the value in the bottom
byte of the source register. The prologue, main, and epilogue instructions are expected to be run in succession and to
appear consecutively in memory: SETPTN, then SETMTN, and then SETETN.
SETPTN performs some preconditioning of the arguments suitable for using the SETMTN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETMTN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETETN performs the last part of the memory set.
Note
The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.
The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.
Note
Portable software should not assume that the choice of algorithm is constant.
After execution of SETPTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * (saturated Xn) + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETPTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1 * the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be set
in the memory set in total.
For SETMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1 * the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
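As a rough end-to-end illustration, the prologue, main, and epilogue stages can be simulated on a Python `bytearray`. This is a simplified sketch and not the architectural pseudocode: `memset_in_stages` and its `chunks` parameter (stand-ins for the IMPLEMENTATION DEFINED amounts handled by the prologue and main stages) are assumptions, Xn is kept as a plain signed integer, and size saturation, unprivileged access, and non-temporal hints are not modeled.

```python
def memset_in_stages(memory, start, size, value, option_a, chunks=(3, 5)):
    """Set memory[start:start+size] in three stages; return final (Xd, Xn)."""
    byte = value & 0xFF  # only the bottom byte of the source register is used
    if option_a:
        xd, xn = start + size, -size  # end address and negative count
    else:
        xd, xn = start, size          # lowest address and positive count
    # Prologue and main each handle a limited chunk; None marks the epilogue,
    # which sets whatever remains.
    for stage_limit in (*chunks, None):
        remaining = -xn if option_a else xn
        step = remaining if stage_limit is None else min(stage_limit, remaining)
        lowest = xd + xn if option_a else xd
        memory[lowest:lowest + step] = bytes([byte]) * step
        if option_a:
            xn += step   # option A: only Xn changes, moving towards zero
        else:
            xd += step   # option B: Xd advances and Xn counts down
            xn -= step
    return xd, xn
```

Under both options the same bytes are written and Xn ends at 0, which is the property that lets software run the same three-instruction sequence regardless of which algorithm an implementation chooses.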
Integer
(FEAT_MOPS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 0 0 1 1 1 0 Rs x x 1 1 0 1 Rn Rd
op2
integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;
MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;
if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
Assembler Symbols
<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address and for option B is updated by the instruction, encoded in the "Rd"
field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is set to zero at the end of the instruction, encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set and is updated by the instruction, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set and is updated by the instruction, encoded in the "Rn" field.
<Xs> Is the 64-bit name of the general-purpose register that holds the source data, encoded in the "Rs" field.
if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);
if supports_option_a then
PSTATE.C = '0';
toaddress = toaddress + setsize;
setsize = Zeros(64) - setsize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';
if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
Send Event is a hint instruction. It causes an event to be signaled to all PEs in the multiprocessor system. For more
information, see Wait for Event mechanism and Send event.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1 1 1
CRm op2
SEV
// Empty.
Operation
SendEvent();
Send Event Local is a hint instruction that causes an event to be signaled locally without requiring the event to be
signaled to other PEs in the multiprocessor system. It can prime a wait-loop which starts with a WFE instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 1 1 1 1 1
CRm op2
SEVL
// Empty.
Operation
SendEventLocal();
Signed Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to
the 64-bit destination register.
This instruction is used by the alias SMULL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 0 Ra Rn Rd
U o0
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.
Alias Conditions
Operation
integer result;
X[d] = result<63:0>;
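The arithmetic described for Signed Multiply-Add Long can be checked with a small Python model. The helper names `sext`, `smaddl`, and `smull` are chosen here for illustration; the model treats register values as unsigned Python integers and assumes that the SMULL alias corresponds to a zero addend (Ra encoding the zero register).

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smaddl(wn, wm, xa):
    """Xd = Xa + SInt(Wn) * SInt(Wm), truncated to 64 bits."""
    result = sext(xa, 64) + sext(wn, 32) * sext(wm, 32)
    return result & MASK64  # corresponds to result<63:0>


def smull(wn, wm):
    """SMULL modeled as SMADDL with a zero addend."""
    return smaddl(wn, wm, 0)
```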
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMC #<imm>
// Empty.
Assembler Symbols
<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
Operation
AArch64.CheckForSMCUndefOrTrap(imm16);
AArch64.CallSecureMonitor(imm16);
Signed Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the
64-bit destination register.
• The encodings in this description are named to match the encodings of SMSUBL.
• The description of SMSUBL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 1 1 1 1 1 1 Rn Rd
U o0 Ra
SMNEGL <Xd>, <Wn>, <Wm>
is equivalent to
SMSUBL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
The description of SMSUBL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register value,
and writes the result to the 64-bit destination register.
This instruction is used by the alias SMNEGL.
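A minimal Python model of the subtraction described above, with SMNEGL expressed as the case where the minuend is zero. The names `sext`, `smsubl`, and `smnegl` are illustrative helpers, not architectural functions, and the zero minuend for SMNEGL is an assumption that Ra encodes the zero register.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smsubl(wn, wm, xa):
    """Xd = Xa - SInt(Wn) * SInt(Wm), truncated to 64 bits."""
    return (sext(xa, 64) - sext(wn, 32) * sext(wm, 32)) & MASK64


def smnegl(wn, wm):
    """SMNEGL modeled as SMSUBL with a zero minuend: Xd = -(Wn * Wm)."""
    return smsubl(wn, wm, 0)
```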
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 1 Ra Rn Rd
U o0
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.
Alias Conditions
Operation
integer result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit
destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 1 0 Rm 0 (1) (1) (1) (1) (1) Rn Rd
U Ra
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
integer result;
X[d] = result<127:64>;
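The high-half selection can be modeled in Python, whose integers are unbounded, so the full 128-bit signed product is available directly. `sext` and `smulh` are illustrative names introduced for this sketch.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smulh(xn, xm):
    """Return bits [127:64] of the signed 128-bit product of Xn and Xm."""
    product = sext(xn, 64) * sext(xm, 64)
    # Python's >> is an arithmetic shift on negative integers, so masking
    # afterwards yields the two's complement upper half.
    return (product >> 64) & MASK64
```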
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.
• The encodings in this description are named to match the encodings of SMADDL.
• The description of SMADDL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 0 1 1 1 1 1 Rn Rd
U o0 Ra
SMULL <Xd>, <Wn>, <Wm>
is equivalent to
SMADDL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
The description of SMADDL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Speculative Store Bypass Barrier is a memory barrier which prevents speculative loads from bypassing earlier stores
to the same virtual address under certain conditions.
The semantics of the Speculative Store Bypass Barrier are:
• When a load to a location appears in program order after the SSBB, then the load does not speculatively read
an entry earlier in the coherence order for that location than the entry generated by the latest store satisfying
all of the following conditions:
◦ The store is to the same location as the load.
◦ The store uses the same virtual address as the load.
◦ The store appears in program order before the SSBB.
• When a load to a location appears in program order before the SSBB, then the load does not speculatively
read data from any store satisfying all of the following conditions:
◦ The store is to the same location as the load.
◦ The store uses the same virtual address as the load.
◦ The store appears in program order after the SSBB.
• The encodings in this description are named to match the encodings of DSB.
• The description of DSB gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 1 1
CRm opc
SSBB
is equivalent to
DSB #0
Operation
The description of DSB gives the operational pseudocode for this instruction.
Store Allocation Tags stores an Allocation Tag to two Tag granules of memory. The address used for the store is
calculated from the base register and an immediate signed offset scaled by the Tag granule. The Allocation Tag is
calculated from the Logical Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset
Post-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 0 1 Xn Xt
Pre-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 1 1 Xn Xt
Signed offset
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 1 0 Xn Xt
Assembler Symbols
<Xt|SP> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.
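The stated range of <simm> (multiples of 16 from -4096 to 4080) is what a 9-bit signed field scaled by 16 can express. The sketch below shows that mapping in Python; `encode_simm` and `decode_imm9` are hypothetical helper names, and the scale factor of 16 is inferred from the stated range rather than taken from decode pseudocode in this section.

```python
def encode_simm(simm):
    """Map a byte offset to the 9-bit imm9 field value."""
    if simm % 16 != 0 or not -4096 <= simm <= 4080:
        raise ValueError("offset must be a multiple of 16 in [-4096, 4080]")
    return (simm // 16) & 0x1FF  # two's complement in 9 bits


def decode_imm9(imm9):
    """Map a 9-bit imm9 field value back to the byte offset."""
    signed = imm9 - 0x200 if imm9 & 0x100 else imm9
    return signed * 16
```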
bits(64) address;
bits(64) data = if t == 31 then SP[] else X[t];
bits(4) tag = AArch64.AllocationTagFromAddress(data);
SetTagCheckedInstruction(FALSE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if writeback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
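The address and writeback logic above can be traced with a small Python model. The mapping from encoding class to the `postindex` and `writeback` flags is an assumption based on the usual A64 meaning of those class names (it is not shown in this section), and `st2g_addresses` is a name invented for this sketch.

```python
MASK64 = (1 << 64) - 1


def st2g_addresses(base, offset, mode):
    """Return (address used for the store, base register value afterwards).

    mode is 'post-index', 'pre-index' or 'signed-offset'.
    """
    if mode not in ("post-index", "pre-index", "signed-offset"):
        raise ValueError("unknown addressing mode")
    postindex = mode == "post-index"       # assumed flag values per class
    writeback = mode != "signed-offset"
    address = base
    if not postindex:
        address = (address + offset) & MASK64
    store_address = address
    new_base = base
    if writeback:
        if postindex:
            address = (address + offset) & MASK64
        new_base = address
    return store_address, new_base
```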
Single-copy Atomic 64-byte Store without Return stores eight 64-bit doublewords from consecutive registers, Xt to
X(t+7), to a memory location. The data that is stored is atomic and is required to be 64-byte-aligned.
Integer
(FEAT_LS64)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 Rn Rt
integer n = UInt(Rn);
integer t = UInt(Rt);
boolean tag_checked = n != 31;
Assembler Symbols
<Xt> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
CheckLDST64BEnabled();
bits(512) data;
bits(64) address;
bits(64) value;
acctype = AccType_ATOMICLS64;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
for i = 0 to 7
    value = X[t+i];
    if BigEndian(acctype) then value = BigEndianReverse(value);
    data<63+64*i:64*i> = value;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
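The loop that assembles the 512-bit data value can be sketched in Python as follows. `pack_st64b` is an illustrative name, and the byte reversal via `int.to_bytes` / `int.from_bytes` is a stand-in for BigEndianReverse on a 64-bit value.

```python
MASK64 = (1 << 64) - 1


def pack_st64b(regs, big_endian=False):
    """Pack eight 64-bit register values, Xt..X(t+7), into a 512-bit integer."""
    if len(regs) != 8:
        raise ValueError("exactly eight registers are required")
    data = 0
    for i, value in enumerate(regs):
        value &= MASK64
        if big_endian:
            # Reverse the byte order of the doubleword.
            value = int.from_bytes(value.to_bytes(8, "little"), "big")
        # Doubleword i occupies data<63+64*i:64*i>.
        data |= value << (64 * i)
    return data
```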
Single-copy Atomic 64-byte Store with Return stores eight 64-bit doublewords from consecutive registers, Xt to
X(t+7), to a memory location, and writes the status result of the store to a register. The data that is stored is atomic
and is required to be 64-byte aligned.
Integer
(FEAT_LS64_V)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 Rs 1 0 1 1 0 0 Rn Rt
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
boolean tag_checked = n != 31;
Assembler Symbols
<Xs> Is the 64-bit name of the general-purpose register into which the status result of this instruction is
written, encoded in the "Rs" field.
The value returned is:
0xFFFFFFFF_FFFFFFFF
If the memory location accessed does not support this instruction. In this case, the value at the
memory location is UNKNOWN.
!= 0xFFFFFFFF_FFFFFFFF
If the memory location accessed does support this instruction. In this case, the peripheral that
provides the response defines the returned value and provides information on the state of the
memory update at the memory location.
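Software that consumes the status register might first separate the "not supported" value from a peripheral-defined response. The Python sketch below shows only that first test; `location_supports_store` is a hypothetical helper, and interpreting any other status value is left to the peripheral's own documentation.

```python
NOT_SUPPORTED = 0xFFFFFFFF_FFFFFFFF


def location_supports_store(status):
    """Return False only for the all-ones status value."""
    return (status & NOT_SUPPORTED) != NOT_SUPPORTED
```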
CheckST64BVEnabled();
bits(512) data;
bits(64) address;
bits(64) value;
bits(64) status;
acctype = AccType_ATOMICLS64;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
for i = 0 to 7
    value = X[t+i];
    if BigEndian(acctype) then value = BigEndianReverse(value);
    data<63+64*i:64*i> = value;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Single-copy Atomic 64-byte EL0 Store with Return stores eight 64-bit doublewords from consecutive registers, Xt to
X(t+7), to a memory location, with the bottom 32 bits taken from ACCDATA_EL1, and writes the status result of the
store to a register. The data that is stored is atomic and is required to be 64-byte aligned.
Integer
(FEAT_LS64_ACCDATA)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 Rs 1 0 1 0 0 0 Rn Rt
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
boolean tag_checked = n != 31;
Assembler Symbols
<Xs> Is the 64-bit name of the general-purpose register into which the status result of this instruction is
written, encoded in the "Rs" field.
The value returned is:
0xFFFFFFFF_FFFFFFFF
If the memory location accessed does not support this instruction. In this case, the value at the
memory location is UNKNOWN.
!= 0xFFFFFFFF_FFFFFFFF
If the memory location accessed does support this instruction. In this case, the peripheral that
provides the response defines the returned value and provides information on the state of the
memory update at the memory location.
CheckST64BV0Enabled();
bits(512) data;
bits(64) address;
bits(64) value;
bits(64) status;
acctype = AccType_ATOMICLS64;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
bits(64) Xt = X[t];
value<31:0> = ACCDATA_EL1<31:0>;
value<63:32> = Xt<63:32>;
if BigEndian(acctype) then value = BigEndianReverse(value);
data<63:0> = value;
for i = 1 to 7
value = X[t+i];
if BigEndian(acctype) then value = BigEndianReverse(value);
data<63+64*i:64*i> = value;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
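The construction of the first doubleword above, where the bottom 32 bits come from ACCDATA_EL1 rather than from Xt, can be sketched as follows. The function name is hypothetical, and the endianness reversal is omitted for brevity.

```python
def st64bv0_first_doubleword(xt: int, accdata_el1: int) -> int:
    """Combine Xt<63:32> with ACCDATA_EL1<31:0> into one 64-bit value."""
    low = accdata_el1 & 0xFFFF_FFFF
    high = xt & 0xFFFF_FFFF_0000_0000
    return high | low
```

The remaining seven doublewords are taken unchanged from X(t+1) to X(t+7).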
Atomic add on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword
from memory, adds the value held in a register to it, and stores the result back to memory.
• STADD does not have release semantics.
• STADDL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDADD, LDADDA, LDADDAL,
LDADDL.
• The description of LDADD, LDADDA, LDADDAL, LDADDL gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
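32-bit, no memory ordering (size == 10 && R == 0)
STADD <Ws>, [<Xn|SP>] is equivalent to LDADD <Ws>, WZR, [<Xn|SP>]
32-bit, release (size == 10 && R == 1)
STADDL <Ws>, [<Xn|SP>] is equivalent to LDADDL <Ws>, WZR, [<Xn|SP>]
64-bit, no memory ordering (size == 11 && R == 0)
STADD <Xs>, [<Xn|SP>] is equivalent to LDADD <Xs>, XZR, [<Xn|SP>]
64-bit, release (size == 11 && R == 1)
STADDL <Xs>, [<Xn|SP>] is equivalent to LDADDL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols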
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDADD, LDADDA, LDADDAL, LDADDL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
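The load, add, and store-back behavior described for STADD can be modeled with a toy Python class. This is a sketch under stated assumptions: `AtomicCell` is a hypothetical name, a lock stands in for hardware atomicity, and the release ordering of STADDL is not modeled.

```python
import threading


class AtomicCell:
    """Toy model of one naturally aligned 32-bit or 64-bit memory location."""

    def __init__(self, value: int, bits: int):
        if bits not in (32, 64):
            raise ValueError("STADD operates on 32-bit or 64-bit values")
        self._mask = (1 << bits) - 1
        self._value = value & self._mask
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def stadd(self, operand: int) -> None:
        """Load, add, and store back as one indivisible step; returns nothing."""
        with self._lock:
            self._value = (self._value + operand) & self._mask

    def load(self) -> int:
        with self._lock:
            return self._value
```

The addition wraps at the access size, and unlike LDADD no old value is returned to the caller.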
Atomic add on byte in memory, without return, atomically loads an 8-bit byte from memory, adds the value held in a
register to it, and stores the result back to memory.
• STADDB does not have release semantics.
• STADDLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDADDB, LDADDAB, LDADDALB,
LDADDLB.
• The description of LDADDB, LDADDAB, LDADDALB, LDADDLB gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
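No memory ordering (R == 0)
STADDB <Ws>, [<Xn|SP>] is equivalent to LDADDB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STADDLB <Ws>, [<Xn|SP>] is equivalent to LDADDLB <Ws>, WZR, [<Xn|SP>]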
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDADDB, LDADDAB, LDADDALB, LDADDLB gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
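For the byte form, a short sketch shows that only one byte of memory changes and that the sum wraps modulo 256. The `staddb` helper and the `bytearray` memory are illustrative assumptions, and atomicity and ordering are not modeled.

```python
def staddb(memory: bytearray, address: int, ws: int) -> None:
    """Add the low byte of ws to the byte at address, wrapping at 8 bits."""
    memory[address] = (memory[address] + (ws & 0xFF)) & 0xFF
```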
Atomic add on halfword in memory, without return, atomically loads a 16-bit halfword from memory, adds the value
held in a register to it, and stores the result back to memory.
• STADDH does not have release semantics.
• STADDLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDADDH, LDADDAH, LDADDALH,
LDADDLH.
• The description of LDADDH, LDADDAH, LDADDALH, LDADDLH gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
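No memory ordering (R == 0)
STADDH <Ws>, [<Xn|SP>] is equivalent to LDADDH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STADDLH <Ws>, [<Xn|SP>] is equivalent to LDADDLH <Ws>, WZR, [<Xn|SP>]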
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDADDH, LDADDAH, LDADDALH, LDADDLH gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic bit clear on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, performs a bitwise AND with the complement of the value held in a register on it, and
stores the result back to memory.
• STCLR does not have release semantics.
• STCLRL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDCLR, LDCLRA, LDCLRAL, LDCLRL.
• The description of LDCLR, LDCLRA, LDCLRAL, LDCLRL gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
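32-bit, no memory ordering (size == 10 && R == 0)
STCLR <Ws>, [<Xn|SP>] is equivalent to LDCLR <Ws>, WZR, [<Xn|SP>]
32-bit, release (size == 10 && R == 1)
STCLRL <Ws>, [<Xn|SP>] is equivalent to LDCLRL <Ws>, WZR, [<Xn|SP>]
64-bit, no memory ordering (size == 11 && R == 0)
STCLR <Xs>, [<Xn|SP>] is equivalent to LDCLR <Xs>, XZR, [<Xn|SP>]
64-bit, release (size == 11 && R == 1)
STCLRL <Xs>, [<Xn|SP>] is equivalent to LDCLRL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols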
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDCLR, LDCLRA, LDCLRAL, LDCLRL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
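The value that STCLR writes back, a bitwise AND of the memory contents with the complement of the register operand, can be sketched as a pure function. The name `stclr_result` is hypothetical, and the sketch computes only the stored value, not the atomic access itself.

```python
def stclr_result(memory_value: int, operand: int, bits: int = 64) -> int:
    """Return memory_value with every bit that is set in operand cleared."""
    mask = (1 << bits) - 1
    return memory_value & ~operand & mask
```

A set bit in the operand clears the corresponding bit in memory, and a clear bit leaves it unchanged.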
Atomic bit clear on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise
AND with the complement of the value held in a register on it, and stores the result back to memory.
• STCLRB does not have release semantics.
• STCLRLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDCLRB, LDCLRAB, LDCLRALB,
LDCLRLB.
• The description of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
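No memory ordering (R == 0)
STCLRB <Ws>, [<Xn|SP>] is equivalent to LDCLRB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STCLRLB <Ws>, [<Xn|SP>] is equivalent to LDCLRLB <Ws>, WZR, [<Xn|SP>]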
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic bit clear on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a
bitwise AND with the complement of the value held in a register on it, and stores the result back to memory.
• STCLRH does not have release semantics.
• STCLRLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDCLRH, LDCLRAH, LDCLRALH,
LDCLRLH.
• The description of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
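No memory ordering (R == 0)
STCLRH <Ws>, [<Xn|SP>] is equivalent to LDCLRH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STCLRLH <Ws>, [<Xn|SP>] is equivalent to LDCLRLH <Ws>, WZR, [<Xn|SP>]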
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic exclusive OR on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, performs an exclusive OR with the value held in a register on it, and stores the result back
to memory.
• STEOR does not have release semantics.
• STEORL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDEOR, LDEORA, LDEORAL,
LDEORL.
• The description of LDEOR, LDEORA, LDEORAL, LDEORL gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
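32-bit, no memory ordering (size == 10 && R == 0)
STEOR <Ws>, [<Xn|SP>] is equivalent to LDEOR <Ws>, WZR, [<Xn|SP>]
32-bit, release (size == 10 && R == 1)
STEORL <Ws>, [<Xn|SP>] is equivalent to LDEORL <Ws>, WZR, [<Xn|SP>]
64-bit, no memory ordering (size == 11 && R == 0)
STEOR <Xs>, [<Xn|SP>] is equivalent to LDEOR <Xs>, XZR, [<Xn|SP>]
64-bit, release (size == 11 && R == 1)
STEORL <Xs>, [<Xn|SP>] is equivalent to LDEORL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols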
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDEOR, LDEORA, LDEORAL, LDEORL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
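The value that STEOR writes back can be sketched in the same way as a pure function. The name `steor_result` is hypothetical, and only the stored value is modeled, not the atomic access or its ordering.

```python
def steor_result(memory_value: int, operand: int, bits: int = 64) -> int:
    """Return the exclusive OR of memory_value and operand at the access size."""
    mask = (1 << bits) - 1
    return (memory_value ^ operand) & mask
```

Applying the same operand twice restores the original memory value, which is why this operation is often used to toggle flag bits.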
Atomic exclusive OR on byte in memory, without return, atomically loads an 8-bit byte from memory, performs an
exclusive OR with the value held in a register on it, and stores the result back to memory.
• STEORB does not have release semantics.
• STEORLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDEORB, LDEORAB, LDEORALB,
LDEORLB.
• The description of LDEORB, LDEORAB, LDEORALB, LDEORLB gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
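No memory ordering (R == 0)
STEORB <Ws>, [<Xn|SP>] is equivalent to LDEORB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STEORLB <Ws>, [<Xn|SP>] is equivalent to LDEORLB <Ws>, WZR, [<Xn|SP>]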
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDEORB, LDEORAB, LDEORALB, LDEORLB gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic exclusive OR on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs
an exclusive OR with the value held in a register on it, and stores the result back to memory.
• STEORH does not have release semantics.
• STEORLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDEORH, LDEORAH, LDEORALH,
LDEORLH.
• The description of LDEORH, LDEORAH, LDEORALH, LDEORLH gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
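No memory ordering (R == 0)
STEORH <Ws>, [<Xn|SP>] is equivalent to LDEORH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STEORLH <Ws>, [<Xn|SP>] is equivalent to LDEORLH <Ws>, WZR, [<Xn|SP>]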
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDEORH, LDEORAH, LDEORALH, LDEORLH gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Allocation Tag stores an Allocation Tag to memory. The address used for the store is calculated from the base
register and an immediate signed offset scaled by the Tag granule. The Allocation Tag is calculated from the Logical
Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset
Post-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 imm9 0 1 Xn Xt
Pre-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 imm9 1 1 Xn Xt
Signed offset
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 imm9 1 0 Xn Xt
Assembler Symbols
<Xt|SP> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.
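The relationship between the assembler offset and the "imm9" field can be sketched as follows, assuming the field holds the offset divided by the 16-byte Tag granule as a signed 9-bit value, which is consistent with the stated range of -4096 to 4080. Both function names are hypothetical.

```python
TAG_GRANULE = 16  # bytes covered by one Allocation Tag


def encode_stg_imm9(simm: int) -> int:
    """Convert a byte offset to the 9-bit field value."""
    if simm % TAG_GRANULE != 0 or not -4096 <= simm <= 4080:
        raise ValueError("offset must be a multiple of 16 in [-4096, 4080]")
    return (simm // TAG_GRANULE) & 0x1FF


def decode_stg_imm9(imm9: int) -> int:
    """Sign-extend the 9-bit field and scale it by the Tag granule."""
    value = imm9 & 0x1FF
    if value & 0x100:
        value -= 0x200
    return value * TAG_GRANULE
```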
bits(64) address;
SetTagCheckedInstruction(FALSE);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
if !postindex then
address = address + offset;
if writeback then
if postindex then
address = address + offset;
if n == 31 then
SP[] = address;
else
X[n] = address;
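The address and writeback logic above can be sketched in Python. The mapping of encoding class to flags is assumed to follow the usual A64 convention (Post-index: postindex and writeback both true; Pre-index: writeback only; Signed offset: neither), because the decode lines are not shown in this section. `stg_addresses` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def stg_addresses(base: int, offset: int, postindex: bool, writeback: bool):
    """Return (address used for the tag store, final base register value)."""
    address = base
    if not postindex:
        address = (address + offset) & MASK64
    store_address = address
    if writeback:
        if postindex:
            address = (address + offset) & MASK64
        return store_address, address
    return store_address, base
```

For example, a post-index form stores at the original base and then advances it, whereas a pre-index form advances the base first and stores at the updated address.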
Store Tag Multiple writes a naturally aligned block of N Allocation Tags, where the size of N is identified in
GMID_EL1.BS, and the Allocation Tag written to address A is taken from the source register at
4*A<7:4>+3:4*A<7:4>.
This instruction is UNDEFINED at EL0.
This instruction generates an Unchecked access.
Integer
(FEAT_MTE2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 Xn Xt
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
Operation
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
for i = 0 to count-1
bits(4) tag = data<(index*4)+3:index*4>;
AArch64.MemTag[address, AccType_NORMAL] = tag;
address = address + TAG_GRANULE;
index = index + 1;
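The rule that the tag for address A comes from source register bits 4*A<7:4>+3:4*A<7:4> can be sketched as follows. Only the tag selection is modeled; the block size from GMID_EL1.BS and the alignment of the block are not. `stgm_tag_for_address` is a hypothetical name.

```python
LOG2_TAG_GRANULE = 4  # one Allocation Tag per 16-byte granule


def stgm_tag_for_address(xt: int, address: int) -> int:
    """Select the 4-bit tag in xt that corresponds to the granule at address."""
    index = (address >> LOG2_TAG_GRANULE) & 0xF  # A<7:4>
    return (xt >> (4 * index)) & 0xF
```

With this layout a single 64-bit register holds up to sixteen tags, one per granule of a 256-byte window.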
Store Allocation Tag and Pair of registers stores an Allocation Tag and two 64-bit doublewords to memory, from two
registers. The address used for the store is calculated from the base register and an immediate signed offset scaled by
the Tag granule. The Allocation Tag is calculated from the Logical Address Tag in the base register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset
Post-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 0 1 0 simm7 Xt2 Xn Xt
Pre-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 1 0 simm7 Xt2 Xn Xt
Signed offset
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 0 0 simm7 Xt2 Xn Xt
Assembler Symbols
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Xt" field.
Operation
bits(64) address;
bits(64) data1;
bits(64) data2;
SetTagCheckedInstruction(FALSE);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data1 = X[t];
data2 = X[t2];
if !postindex then
address = address + offset;
if writeback then
if postindex then
address = address + offset;
if n == 31 then
SP[] = address;
else
X[n] = address;
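The description states that the Allocation Tag is calculated from the Logical Address Tag in the base register. A minimal sketch of that extraction follows, assuming the Logical Address Tag occupies address bits <59:56> as in the Memory Tagging Extension; `logical_address_tag` is a hypothetical name.

```python
def logical_address_tag(address: int) -> int:
    """Return the 4-bit Logical Address Tag held in address bits <59:56>."""
    return (address >> 56) & 0xF
```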
Store LORelease Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The
instruction also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information
about memory accesses, see Load/Store addressing modes.
No offset
(FEAT_LOR)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, dbytes, AccType_LIMITEDORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
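The data path of the store above, which writes the low elsize bits of the register to memory, can be sketched as follows. The sketch assumes a little-endian data access and a `bytearray` as memory; it does not model the LORelease ordering semantics. `store_register` is a hypothetical name.

```python
def store_register(memory: bytearray, address: int, xt: int, elsize: int) -> None:
    """Write the low elsize bits of xt to memory at address, little-endian."""
    dbytes = elsize // 8
    data = xt & ((1 << elsize) - 1)
    memory[address:address + dbytes] = data.to_bytes(dbytes, "little")
```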
Store LORelease Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has
memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
accesses, see Load/Store addressing modes.
No offset
(FEAT_LOR)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, 1, AccType_LIMITEDORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store LORelease Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also
has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
accesses, see Load/Store addressing modes.
No offset
(FEAT_LOR)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, 2, AccType_LIMITEDORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The
instruction also has memory ordering semantics as described in Load-Acquire, Store-Release. For information about
memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, dbytes, AccType_ORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has memory
ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses, see Load/
Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, 1, AccType_ORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also
has memory ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses,
see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 0 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, 2, AccType_ORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Register (unscaled) calculates an address from a base register value and an immediate offset, and
stores a 32-bit word or a 64-bit doubleword to the calculated address, from a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 1 1 0 0 1 0 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
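The address calculation from a base register and the unscaled "imm9" byte offset can be sketched as follows, assuming the field is a signed 9-bit value, which is consistent with the stated range of -256 to 255. Both function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def stlur_address(base: int, imm9: int) -> int:
    """Add the sign-extended byte offset to the base, wrapping at 64 bits."""
    return (base + sign_extend(imm9, 9)) & MASK64
```

Unlike the tag store instructions, the offset here is not scaled by the access size.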
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
data = X[t];
Mem[address, datasize DIV 8, AccType_ORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Register Byte (unscaled) calculates an address from a base register value and an immediate offset, and
stores a byte to the calculated address, from a 32-bit register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 0 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
data = X[t];
Mem[address, 1, AccType_ORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Register Halfword (unscaled) calculates an address from a base register value and an immediate offset,
and stores a halfword to the calculated address, from a 32-bit register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release.
For information about memory accesses, see Load/Store addressing modes.
Unscaled offset
(FEAT_LRCPC2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 0 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
data = X[t];
Mem[address, 2, AccType_ORDERED] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords to a memory location if the
PE has exclusive access to the memory address, from two registers, and returns a status value of 0 if the store was
successful, or of 1 if no store was performed. See Synchronization and semaphores. For information on single-copy
atomicity and alignment requirements, see Requirements for single-copy atomicity and Alignment of data accesses. If
a 64-bit pair Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory location being
updated. The instruction also has memory ordering semantics, as described in Load-Acquire, Store-Release. For
information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 0 1 Rs 1 Rt2 Rn Rt
L o0
32-bit (sz == 0)
64-bit (sz == 1)
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2); // ignored by load/store single register
integer s = UInt(Rs); // ignored by all loads and store-release
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXP.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
Operation
bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(datasize) UNKNOWN;
else
    bits(datasize DIV 2) el1 = X[t];
    bits(datasize DIV 2) el2 = X[t2];
    data = if BigEndian(AccType_ORDEREDATOMIC) then el1:el2 else el2:el1;
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
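The concatenation `el1:el2` or `el2:el1` in the pseudocode above can be sketched in Python as follows. This is an illustrative model only: `pack_pair` and its parameters are hypothetical names, and the sketch assumes the pseudocode convention that the left operand of `:` occupies the more significant bits.

```python
def pack_pair(el1: int, el2: int, half_bits: int, big_endian: bool) -> int:
    """Combine two register values into one value of 2 * half_bits bits.

    Mirrors: data = if BigEndian(...) then el1:el2 else el2:el1
    """
    mask = (1 << half_bits) - 1
    el1 &= mask
    el2 &= mask
    if big_endian:
        return (el1 << half_bits) | el2   # el1:el2
    return (el2 << half_bits) | el1       # el2:el1


def to_memory_bytes(value: int, total_bits: int, big_endian: bool) -> bytes:
    """Lay the combined value out in memory order for the given endianness."""
    return value.to_bytes(total_bits // 8, "big" if big_endian else "little")
```

In this model the first register always lands at the lower address, whichever endianness is selected, which is why the concatenation order depends on `BigEndian()`.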
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Exclusive Register stores a 32-bit word or a 64-bit doubleword to memory if the PE has exclusive access
to the memory address, from a register, and returns a status value of 0 if the store was successful, or of 1 if no
store was performed. See Synchronization and semaphores. The memory access is atomic. The instruction also has
memory ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 0 0 Rs 1 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXR.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
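The status value written to <Ws> can be illustrated with a toy single-PE model in Python. This is a sketch under strong simplifying assumptions, not the architectural exclusive monitor: the class `ToyExclusiveMonitor`, its methods, and the little-endian `bytearray` memory are all hypothetical, and memory ordering is not modeled.

```python
class ToyExclusiveMonitor:
    """Toy model: one monitor that tracks a single (address, size) reservation."""

    def __init__(self) -> None:
        self.marked = None  # (address, size) while a reservation is held

    def load_exclusive(self, memory: bytearray, address: int, size: int) -> int:
        # A load-exclusive marks the location for exclusive access.
        self.marked = (address, size)
        return int.from_bytes(memory[address:address + size], "little")

    def store_release_exclusive(self, memory: bytearray, address: int,
                                size: int, value: int) -> int:
        """Return the status for <Ws>: 0 if memory was updated, 1 otherwise."""
        status = 1
        if self.marked == (address, size):
            mask = (1 << (8 * size)) - 1
            memory[address:address + size] = (value & mask).to_bytes(size, "little")
            status = 0
        self.marked = None  # the reservation is consumed either way
        return status


def atomic_increment(monitor: ToyExclusiveMonitor, memory: bytearray,
                     address: int, size: int) -> int:
    """Retry loop in the style of a load-exclusive / store-exclusive pair."""
    while True:
        old = monitor.load_exclusive(memory, address, size)
        if monitor.store_release_exclusive(memory, address, size, old + 1) == 0:
            return old + 1
```

A store attempted without a preceding load-exclusive returns 1 in this model and leaves memory unchanged.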
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
Operation
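bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(elsize) UNKNOWN;
else
    data = X[t];
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);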
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Exclusive Register Byte stores a byte from a 32-bit register to memory if the PE has exclusive access to
the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic. The instruction also has memory ordering semantics
as described in Load-Acquire, Store-Release. For information about memory accesses see Load/Store addressing
modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 0 0 Rs 1 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXRB.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.
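Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(8) UNKNOWN;
else
    data = X[t];
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory location corresponding to the virtual address.
if AArch64.ExclusiveMonitorsPass(address, 1) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, 1, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);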
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store-Release Exclusive Register Halfword stores a halfword from a 32-bit register to memory if the PE has exclusive
access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was
performed. See Synchronization and semaphores. The memory access is atomic. The instruction also has memory
ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses see Load/
Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 0 0 Rs 1 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXRH.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to
the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.
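The alignment rules above reduce to a small decision function. The Python sketch below is illustrative only: `stlxrh_alignment_fault` is a hypothetical name, and the `impdef_fault_when_monitors_fail` parameter stands in for the IMPLEMENTATION DEFINED choice, which this document does not fix.

```python
def stlxrh_alignment_fault(address: int, monitors_pass: bool,
                           impdef_fault_when_monitors_fail: bool) -> bool:
    """Return True if an Alignment fault Data Abort is generated (sketch)."""
    if address % 2 == 0:
        return False  # halfword-aligned: no Alignment fault from these rules
    if monitors_pass:
        return True   # ExclusiveMonitorsPass() returned TRUE: always generated
    # Otherwise the behavior is IMPLEMENTATION DEFINED.
    return impdef_fault_when_monitors_fail
```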
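Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(16) UNKNOWN;
else
    data = X[t];
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+1].
if AArch64.ExclusiveMonitorsPass(address, 2) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, 2, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);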
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate
offset, and stores two 32-bit words or two 64-bit doublewords to the calculated address, from two registers. For
information about memory accesses, see Load/Store addressing modes. For information about Non-temporal pair
instructions, see Load/Store Non-temporal pair.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 0 0 imm7 Rt2 Rn Rt
opc L
// Empty.
Assembler Symbols
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the optional signed immediate byte offset, a multiple of 4 in the range -256 to
252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit variant: is the optional signed immediate byte offset, a multiple of 8 in the range -512 to
504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
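The scaling of <imm> into the "imm7" field can be sketched in Python as follows. The helper names `encode_imm7` and `decode_offset` are hypothetical; the decode direction follows the Shared Decode expression `LSL(SignExtend(imm7, 64), scale)` with `scale` equal to 2 for the 32-bit variant and 3 for the 64-bit variant.

```python
def encode_imm7(imm: int, is_64bit: bool) -> int:
    """Encode a byte offset into the 7-bit field as <imm>/4 or <imm>/8."""
    scale = 3 if is_64bit else 2
    step = 1 << scale
    lowest, highest = -64 * step, 63 * step   # -256..252 or -512..504
    if imm % step != 0 or not lowest <= imm <= highest:
        raise ValueError(f"imm must be a multiple of {step} in {lowest}..{highest}")
    return (imm >> scale) & 0x7F              # 7-bit two's complement


def decode_offset(imm7: int, is_64bit: bool) -> int:
    """Recover the byte offset: sign-extend imm7, then shift left by scale."""
    scale = 3 if is_64bit else 2
    signed = imm7 - 0x80 if imm7 & 0x40 else imm7
    return signed << scale
```

Encoding followed by decoding returns the original offset for every permitted value of <imm>.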
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc<0> == '1' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = n != 31;
Operation
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
data1 = X[t];
data2 = X[t2];
Mem[address, dbytes, AccType_STREAM] = data1;
Mem[address+dbytes, dbytes, AccType_STREAM] = data2;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Pair of Registers calculates an address from a base register value and an immediate offset, and stores two 32-bit
words or two 64-bit doublewords to the calculated address, from two registers. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 1 0 imm7 Rt2 Rn Rt
opc L
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 1 0 imm7 Rt2 Rn Rt
opc L
Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 0 0 imm7 Rt2 Rn Rt
opc L
Assembler Symbols
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
8 in the range -512 to 504, encoded in the "imm7" field as <imm>/8.
For the 64-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 8 in the
range -512 to 504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;
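Operation
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if rt_unknown && t == n then
    data1 = bits(datasize) UNKNOWN;
else
    data1 = X[t];
if rt_unknown && t2 == n then
    data2 = bits(datasize) UNKNOWN;
else
    data2 = X[t2];
Mem[address, dbytes, AccType_NORMAL] = data1;
Mem[address+dbytes, dbytes, AccType_NORMAL] = data2;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;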
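The interaction of `postindex` and `wback` in the pseudocode above can be sketched in Python. The function `stp_addresses` is a hypothetical helper, and the mapping of encoding classes to flag values (Post-index: `wback` and `postindex` both TRUE; Pre-index: `wback` TRUE, `postindex` FALSE; Signed offset: both FALSE) is an assumption, because the per-class decode is not reproduced in this section.

```python
MASK64 = (1 << 64) - 1


def stp_addresses(base: int, offset: int, wback: bool, postindex: bool):
    """Return (access_address, new_base) for one store pair (sketch)."""
    address = base
    if not postindex:
        address = (address + offset) & MASK64   # offset applied before the access
    access_address = address

    new_base = base
    if wback:
        if postindex:
            address = (address + offset) & MASK64  # offset applied after the access
        new_base = address                          # written back to Xn or SP
    return access_address, new_base
```

With a base of `0x1000` and an offset of 16, the post-index form accesses `0x1000` and writes back `0x1010`, the pre-index form accesses and writes back `0x1010`, and the signed offset form accesses `0x1010` and leaves the base unchanged.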
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register (immediate) stores a word or a doubleword from a register to memory. The address that is used for the
store is calculated from a base register and an immediate offset. For information about memory accesses, see Load/
Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 1 0 0 imm12 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
16380, defaulting to 0 and encoded in the "imm12" field as <pimm>/4.
For the 64-bit variant: is the optional positive immediate byte offset, a multiple of 8 in the range 0 to
32760, defaulting to 0 and encoded in the "imm12" field as <pimm>/8.
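As an illustration of the Unsigned offset class, the Python sketch below scales <pimm> into "imm12" and packs the instruction word. The function name `encode_str_unsigned` is hypothetical, and the field positions (imm12 in bits [21:10], Rn in bits [9:5], Rt in bits [4:0]) are read from the encoding diagram above.

```python
def encode_str_unsigned(rt: int, rn: int, pimm: int = 0, is_64bit: bool = True) -> int:
    """Pack STR <Wt|Xt>, [<Xn|SP>{, #<pimm>}] into an instruction word (sketch)."""
    if not 0 <= rt <= 31 or not 0 <= rn <= 31:
        raise ValueError("register numbers must be in the range 0 to 31")
    scale = 3 if is_64bit else 2
    highest = 4095 << scale                  # 16380 (32-bit) or 32760 (64-bit)
    if pimm % (1 << scale) != 0 or not 0 <= pimm <= highest:
        raise ValueError(f"pimm must be a multiple of {1 << scale} in 0..{highest}")

    imm12 = pimm >> scale                    # stored as <pimm>/4 or <pimm>/8
    word = (0b11 if is_64bit else 0b10) << 30  # size, bits [31:30]
    word |= 0b111001 << 24                   # fixed bits [29:24]
    # opc (bits [23:22]) is zero for the store form
    word |= imm12 << 10
    word |= rn << 5
    word |= rt
    return word
```

Under these assumptions, `encode_str_unsigned(0, 31, 8)` corresponds to `STR X0, [SP, #8]` and yields `0xF90007E0`.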
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if rt_unknown then
    data = bits(datasize) UNKNOWN;
else
    data = X[t];
Mem[address, datasize DIV 8, AccType_NORMAL] = data;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register (register) calculates an address from a base register value and an offset register value, and stores a
32-bit word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses,
see Load/Store addressing modes.
The instruction uses an offset addressing mode, that calculates the address used for the memory access from a base
register value and an offset register value. The offset can be optionally shifted and extended.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #2
For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #3
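The effect of <extend> and <amount> on the index register can be sketched in Python. The helper names `extend_reg` and `str_register_address` are hypothetical; the sketch assumes that UXTW and SXTW take the low 32 bits of the index register, that LSL and SXTX use all 64 bits, and that the result wraps at 64 bits.

```python
MASK64 = (1 << 64) - 1


def extend_reg(value: int, extend: str, shift: int) -> int:
    """Extend the index register value, then shift it left by <amount>."""
    if extend == "UXTW":
        extended = value & 0xFFFFFFFF              # zero-extend the low word
    elif extend == "SXTW":
        extended = value & 0xFFFFFFFF
        if extended & 0x80000000:
            extended -= 1 << 32                    # sign-extend the low word
    elif extend in ("LSL", "SXTX"):
        extended = value & MASK64                  # full 64-bit index
    else:
        raise ValueError(f"unsupported extend specifier: {extend}")
    return (extended << shift) & MASK64


def str_register_address(base: int, index: int, extend: str = "LSL",
                         amount: int = 0) -> int:
    """Address used for the store: base plus the extended, shifted index."""
    return (base + extend_reg(index, extend, amount)) & MASK64
```

For the 64-bit variant, an <amount> of #3 scales the index by the 8-byte element size, so `str_register_address(0x1000, 5, "LSL", 3)` gives `0x1028`.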
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
Operation
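bits(64) offset = ExtendReg(m, extend_type, shift);
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
bits(64) address;
bits(datasize) data;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
data = X[t];
Mem[address, datasize DIV 8, AccType_NORMAL] = data;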
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Byte (immediate) stores the least significant byte of a 32-bit register to memory. The address that is
used for the store is calculated from a base register and an immediate offset. For information about memory accesses,
see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 1 0 0 imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STRB (immediate).
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0 and encoded in the
"imm12" field.
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if rt_unknown then
    data = bits(8) UNKNOWN;
else
    data = X[t];
Mem[address, 1, AccType_NORMAL] = data;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Byte (register) calculates an address from a base register value and an offset register value, and stores
a byte from a 32-bit register to the calculated address. For information about memory accesses, see Load/Store
addressing modes.
The instruction uses an offset addressing mode, that calculates the address used for the memory access from a base
register value and an offset register value. The offset can be optionally shifted and extended.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend specifier, encoded in “option”:
option <extend>
010 UXTW
110 SXTW
111 SXTX
<amount> Is the index shift amount, which must be #0. It is encoded in "S" as 0 if omitted, or as 1 if present.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
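Operation
bits(64) offset = ExtendReg(m, extend_type, 0);
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
bits(64) address;
bits(8) data;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
data = X[t];
Mem[address, 1, AccType_NORMAL] = data;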
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Halfword (immediate) stores the least significant halfword of a 32-bit register to memory. The address
that is used for the store is calculated from a base register and an immediate offset. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 0 1 Rn Rt
size opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 1 1 Rn Rt
size opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 1 0 0 imm12 Rn Rt
size opc
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STRH (immediate).
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0 and
encoded in the "imm12" field as <pimm>/2.
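The following Python sketch illustrates two points from the description above: only the least significant halfword of the register is stored, and <pimm> is encoded as <pimm>/2. It is a toy model, not the architectural behavior: `strh_unsigned_offset` is a hypothetical name, memory is a `bytearray`, and little-endian data accesses are assumed.

```python
def strh_unsigned_offset(memory: bytearray, base: int, wt: int, pimm: int = 0) -> int:
    """Store the low halfword of wt at base + pimm and return the imm12 field."""
    if pimm % 2 != 0 or not 0 <= pimm <= 8190:
        raise ValueError("pimm must be a multiple of 2 in the range 0 to 8190")
    imm12 = pimm // 2                        # field value, <pimm>/2
    address = base + pimm
    halfword = wt & 0xFFFF                   # least significant halfword only
    memory[address:address + 2] = halfword.to_bytes(2, "little")
    return imm12
```

The upper 16 bits of the source register do not reach memory in this model, so bytes beyond the stored halfword are left untouched.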
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if rt_unknown then
    data = bits(16) UNKNOWN;
else
    data = X[t];
Mem[address, 2, AccType_NORMAL] = data;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Halfword (register) calculates an address from a base register value and an offset register value, and
stores a halfword from a 32-bit register to the calculated address. For information about memory accesses, see Load/
Store addressing modes.
The instruction uses an offset addressing mode, that calculates the address used for the memory access from a base
register value and an offset register value. The offset can be optionally shifted and extended.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 1 Rm option S 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #1
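The address calculation described above can be sketched in Python. This is an illustrative model only, not the architectural pseudocode: the names extend_reg and strh_register are hypothetical, memory is modelled as a bytearray, and little-endian data layout is assumed.

```python
MASK64 = (1 << 64) - 1

def extend_reg(value, option, shift):
    """Extend the index register per the 'option' field, then shift it left."""
    if option == 0b010:                 # UXTW: zero-extend the low 32 bits
        v = value & 0xFFFFFFFF
    elif option == 0b110:               # SXTW: sign-extend the low 32 bits
        v = value & 0xFFFFFFFF
        if v & 0x80000000:
            v -= 1 << 32
    elif option in (0b011, 0b111):      # LSL / SXTX: use all 64 bits
        v = value & MASK64
    else:
        raise ValueError("option value not listed in the <extend> table")
    return (v << shift) & MASK64

def strh_register(memory, base, index, wt, option=0b011, s=0):
    """Store the low halfword of wt at base + extended, shifted index."""
    # S selects the shift amount: 0 -> #0, 1 -> #1 (assumed model of <amount>)
    address = (base + extend_reg(index, option, s)) & MASK64
    memory[address:address + 2] = (wt & 0xFFFF).to_bytes(2, "little")
    return address
```

With SXTW and S set to 1, an index register holding 0xFFFFFFFF acts as -1 and contributes -2 to the address.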
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
data = X[t];
Mem[address, 2, AccType_NORMAL] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic bit set on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword
from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory.
• STSET does not have release semantics.
• STSETL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSET, LDSETA, LDSETAL, LDSETL.
• The description of LDSET, LDSETA, LDSETAL, LDSETL gives the operational pseudocode for this instruction.
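The read-modify-write behaviour described above can be modelled as follows. This is a minimal sketch under stated assumptions: stset is a hypothetical helper, memory is a bytearray with little-endian layout, and a Python lock stands in for the atomicity that the hardware provides. Release ordering (STSETL) is not modelled.

```python
import threading

_lock = threading.Lock()  # stand-in for hardware atomicity (assumption)

def stset(memory, address, rs, size_bits):
    """Atomically OR the low size_bits of rs into memory; nothing is returned."""
    if size_bits not in (32, 64):
        raise ValueError("STSET operates on a word or doubleword")
    nbytes = size_bits // 8
    mask = (1 << size_bits) - 1
    with _lock:
        old = int.from_bytes(memory[address:address + nbytes], "little")
        new = (old | rs) & mask
        memory[address:address + nbytes] = new.to_bytes(nbytes, "little")
```

Unlike the LDSET forms, the sketch returns nothing, which mirrors the "without return" wording of the description.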
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
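32-bit LDSET alias (size == 10 && R == 0)
STSET <Ws>, [<Xn|SP>]
is equivalent to
LDSET <Ws>, WZR, [<Xn|SP>]
32-bit LDSETL alias (size == 10 && R == 1)
STSETL <Ws>, [<Xn|SP>]
is equivalent to
LDSETL <Ws>, WZR, [<Xn|SP>]
64-bit LDSET alias (size == 11 && R == 0)
STSET <Xs>, [<Xn|SP>]
is equivalent to
LDSET <Xs>, XZR, [<Xn|SP>]
64-bit LDSETL alias (size == 11 && R == 1)
STSETL <Xs>, [<Xn|SP>]
is equivalent to
LDSETL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols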
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
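The bit layout in the encoding diagram above can be assembled into a 32-bit instruction word as sketched below. The function encode_stset is a hypothetical helper written from the diagram's fields (size, A = 0, R, Rs, opc = 011, Rn, Rt = 11111); it is an illustration and not part of any assembler.

```python
def encode_stset(size, r, rs, rn):
    """Build the instruction word for STSET/STSETL from its encoding fields."""
    if size not in (0b10, 0b11):
        raise ValueError("size must be 0b10 (32-bit) or 0b11 (64-bit)")
    if r not in (0, 1):
        raise ValueError("R is a single bit")
    if not (0 <= rs <= 31 and 0 <= rn <= 31):
        raise ValueError("register numbers are 5-bit fields")
    return (
        (size << 30)        # bits 31:30, size
        | (0b111 << 27)     # bits 29:27, fixed 111
        | (r << 22)         # bit 22, R (bit 23, A, stays 0)
        | (1 << 21)         # bit 21, fixed 1
        | (rs << 16)        # bits 20:16, Rs
        | (0b011 << 12)     # bits 14:12, opc (bit 15 stays 0)
        | (rn << 5)         # bits 9:5, Rn
        | 0b11111           # bits 4:0, Rt fixed to 11111
    )
```

For example, encode_stset(0b10, 0, 0, 1) corresponds to the 32-bit, no-ordering form with Rs = 0 and Rn = 1.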
Operation
The description of LDSET, LDSETA, LDSETAL, LDSETL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic bit set on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise OR
with the value held in a register on it, and stores the result back to memory.
• STSETB does not have release semantics.
• STSETLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSETB, LDSETAB, LDSETALB,
LDSETLB.
• The description of LDSETB, LDSETAB, LDSETALB, LDSETLB gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
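No memory ordering (R == 0)
STSETB <Ws>, [<Xn|SP>]
is equivalent to
LDSETB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSETLB <Ws>, [<Xn|SP>]
is equivalent to
LDSETLB <Ws>, WZR, [<Xn|SP>]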
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSETB, LDSETAB, LDSETALB, LDSETLB gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic bit set on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a
bitwise OR with the value held in a register on it, and stores the result back to memory.
• STSETH does not have release semantics.
• STSETLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSETH, LDSETAH, LDSETALH,
LDSETLH.
• The description of LDSETH, LDSETAH, LDSETALH, LDSETLH gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
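No memory ordering (R == 0)
STSETH <Ws>, [<Xn|SP>]
is equivalent to
LDSETH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSETLH <Ws>, [<Xn|SP>]
is equivalent to
LDSETLH <Ws>, WZR, [<Xn|SP>]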
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSETH, LDSETAH, LDSETALH, LDSETLH gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic signed maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory,
treating the values as signed numbers.
• STSMAX does not have release semantics.
• STSMAXL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSMAX, LDSMAXA, LDSMAXAL,
LDSMAXL.
• The description of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL gives the operational pseudocode for this
instruction.
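The signed comparison described above can be sketched as follows. The helpers to_signed and stsmax are hypothetical; memory is a bytearray with little-endian layout assumed, and the sketch does not model atomicity or release ordering.

```python
def to_signed(value, bits):
    """Interpret the low 'bits' of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def stsmax(memory, address, rs, size_bits):
    """Store the signed maximum of the memory value and rs back to memory."""
    if size_bits not in (32, 64):
        raise ValueError("STSMAX operates on a word or doubleword")
    nbytes = size_bits // 8
    operand = rs & ((1 << size_bits) - 1)
    old = int.from_bytes(memory[address:address + nbytes], "little")
    # Compare as signed numbers, but store the unchanged bit pattern.
    keep_old = to_signed(old, size_bits) >= to_signed(operand, size_bits)
    new = old if keep_old else operand
    memory[address:address + nbytes] = new.to_bytes(nbytes, "little")
```

A memory word of 0xFFFFFFFF is treated as -1 here, so an operand of 1 replaces it.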
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
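32-bit LDSMAX alias (size == 10 && R == 0)
STSMAX <Ws>, [<Xn|SP>]
is equivalent to
LDSMAX <Ws>, WZR, [<Xn|SP>]
32-bit LDSMAXL alias (size == 10 && R == 1)
STSMAXL <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXL <Ws>, WZR, [<Xn|SP>]
64-bit LDSMAX alias (size == 11 && R == 0)
STSMAX <Xs>, [<Xn|SP>]
is equivalent to
LDSMAX <Xs>, XZR, [<Xn|SP>]
64-bit LDSMAXL alias (size == 11 && R == 1)
STSMAXL <Xs>, [<Xn|SP>]
is equivalent to
LDSMAXL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols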
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic signed maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the larger value back to memory, treating the values as signed
numbers.
• STSMAXB does not have release semantics.
• STSMAXLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSMAXB, LDSMAXAB, LDSMAXALB,
LDSMAXLB.
• The description of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB gives the operational pseudocode for
this instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
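No memory ordering (R == 0)
STSMAXB <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMAXLB <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXLB <Ws>, WZR, [<Xn|SP>]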
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic signed maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the larger value back to memory, treating the values as
signed numbers.
• STSMAXH does not have release semantics.
• STSMAXLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSMAXH, LDSMAXAH,
LDSMAXALH, LDSMAXLH.
• The description of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH gives the operational pseudocode for
this instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt
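No memory ordering (R == 0)
STSMAXH <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMAXLH <Ws>, [<Xn|SP>]
is equivalent to
LDSMAXLH <Ws>, WZR, [<Xn|SP>]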
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic signed minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the smaller value back to
memory, treating the values as signed numbers.
• STSMIN does not have release semantics.
• STSMINL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSMIN, LDSMINA, LDSMINAL,
LDSMINL.
• The description of LDSMIN, LDSMINA, LDSMINAL, LDSMINL gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
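32-bit LDSMIN alias (size == 10 && R == 0)
STSMIN <Ws>, [<Xn|SP>]
is equivalent to
LDSMIN <Ws>, WZR, [<Xn|SP>]
32-bit LDSMINL alias (size == 10 && R == 1)
STSMINL <Ws>, [<Xn|SP>]
is equivalent to
LDSMINL <Ws>, WZR, [<Xn|SP>]
64-bit LDSMIN alias (size == 11 && R == 0)
STSMIN <Xs>, [<Xn|SP>]
is equivalent to
LDSMIN <Xs>, XZR, [<Xn|SP>]
64-bit LDSMINL alias (size == 11 && R == 1)
STSMINL <Xs>, [<Xn|SP>]
is equivalent to
LDSMINL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols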
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSMIN, LDSMINA, LDSMINAL, LDSMINL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic signed minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as signed
numbers.
• STSMINB does not have release semantics.
• STSMINLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSMINB, LDSMINAB, LDSMINALB,
LDSMINLB.
• The description of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB gives the operational pseudocode for
this instruction.
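For the byte form, only the low 8 bits of the source register take part in the comparison. A minimal sketch, assuming a bytearray as memory and using the hypothetical helpers sign8 and stsminb (atomicity and release ordering are not modelled):

```python
def sign8(value):
    """Interpret the low 8 bits of value as a signed byte."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value

def stsminb(memory, address, ws):
    """Store the signed minimum of the memory byte and the low byte of ws."""
    old = memory[address]
    operand = ws & 0xFF          # upper bits of the 32-bit register are ignored
    memory[address] = old if sign8(old) <= sign8(operand) else operand
```

A register value such as 0x12345680 contributes only 0x80, which is -128 as a signed byte and therefore smaller than any positive byte in memory.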
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
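No memory ordering (R == 0)
STSMINB <Ws>, [<Xn|SP>]
is equivalent to
LDSMINB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMINLB <Ws>, [<Xn|SP>]
is equivalent to
LDSMINLB <Ws>, WZR, [<Xn|SP>]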
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic signed minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the smaller value back to memory, treating the values as
signed numbers.
• STSMINH does not have release semantics.
• STSMINLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDSMINH, LDSMINAH, LDSMINALH,
LDSMINLH.
• The description of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH gives the operational pseudocode for
this instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt
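No memory ordering (R == 0)
STSMINH <Ws>, [<Xn|SP>]
is equivalent to
LDSMINH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STSMINLH <Ws>, [<Xn|SP>]
is equivalent to
LDSMINLH <Ws>, WZR, [<Xn|SP>]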
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register (unprivileged) stores a word or doubleword from a register to memory. The address that is used for the
store is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
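The conditions above can be restated as a small predicate. This is a sketch only: the function name and parameters are hypothetical, and nested_virt stands for the case that the Shared Decode pseudocode excludes at EL1 (EL2 enabled, nested virtualization implemented, and HCR_EL2.<NV,NV1> == '11').

```python
def sttr_access_is_unprivileged(el, uao, e2h=0, tge=0, nested_virt=False):
    """Return True when the store behaves as if executed at EL0."""
    if uao != 0:
        return False                      # Effective PSTATE.UAO must be 0
    if el == 1 and not nested_virt:
        return True                       # executed at EL1
    if el == 2 and e2h == 1 and tge == 1:
        return True                       # EL2 with HCR_EL2.{E2H, TGE} == {1, 1}
    return False                          # otherwise, normal restrictions apply
```

At EL0 and EL3 the predicate is always False, which corresponds to the "Otherwise" case in the text.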
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
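The mapping between <simm> and the 9-bit "imm9" field is a two's-complement encoding. The helpers below are hypothetical names for an illustrative round trip, not part of any toolchain.

```python
def encode_imm9(simm):
    """Encode a signed byte offset in the range -256..255 as a 9-bit field."""
    if not -256 <= simm <= 255:
        raise ValueError("simm must be in the range -256 to 255")
    return simm & 0x1FF

def decode_imm9(imm9):
    """Sign-extend a 9-bit field back to a Python integer."""
    imm9 &= 0x1FF
    return imm9 - 0x200 if imm9 & 0x100 else imm9
```

Bit 8 of the field acts as the sign bit, so 0x1FF decodes to -1 and 0x100 decodes to -256.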
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
Operation
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
data = X[t];
Mem[address, datasize DIV 8, acctype] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Byte (unprivileged) stores a byte from a 32-bit register to memory. The address that is used for the
store is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
data = X[t];
Mem[address, 1, acctype] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Halfword (unprivileged) stores a halfword from a 32-bit register to memory. The address that is used
for the store is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 1 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
data = X[t];
Mem[address, 2, acctype] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic unsigned maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory,
treating the values as unsigned numbers.
• STUMAX does not have release semantics.
• STUMAXL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDUMAX, LDUMAXA, LDUMAXAL,
LDUMAXL.
• The description of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL gives the operational pseudocode for this
instruction.
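The unsigned comparison described above differs from the signed forms only in how the bit patterns are ordered. A minimal sketch, assuming a bytearray as memory with little-endian layout; stumax is a hypothetical helper and does not model atomicity or release ordering.

```python
def stumax(memory, address, rs, size_bits):
    """Store the unsigned maximum of the memory value and rs back to memory."""
    if size_bits not in (32, 64):
        raise ValueError("STUMAX operates on a word or doubleword")
    nbytes = size_bits // 8
    operand = rs & ((1 << size_bits) - 1)
    old = int.from_bytes(memory[address:address + nbytes], "little")
    # Python integers compare as unsigned values once masked to size_bits.
    new = max(old, operand)
    memory[address:address + nbytes] = new.to_bytes(nbytes, "little")
```

A memory word of 0xFFFFFFFF is the largest unsigned value, so an operand of 1 leaves it unchanged, whereas a signed maximum would replace it.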
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
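32-bit LDUMAX alias (size == 10 && R == 0)
STUMAX <Ws>, [<Xn|SP>]
is equivalent to
LDUMAX <Ws>, WZR, [<Xn|SP>]
32-bit LDUMAXL alias (size == 10 && R == 1)
STUMAXL <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXL <Ws>, WZR, [<Xn|SP>]
64-bit LDUMAX alias (size == 11 && R == 0)
STUMAX <Xs>, [<Xn|SP>]
is equivalent to
LDUMAX <Xs>, XZR, [<Xn|SP>]
64-bit LDUMAXL alias (size == 11 && R == 1)
STUMAXL <Xs>, [<Xn|SP>]
is equivalent to
LDUMAXL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols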
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic unsigned maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares
it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned
numbers.
• STUMAXB does not have release semantics.
• STUMAXLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDUMAXB, LDUMAXAB,
LDUMAXALB, LDUMAXLB.
• The description of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB gives the operational pseudocode for
this instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
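No memory ordering (R == 0)
STUMAXB <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMAXLB <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXLB <Ws>, WZR, [<Xn|SP>]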
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic unsigned maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the larger value back to memory, treating the values as
unsigned numbers.
• STUMAXH does not have release semantics.
• STUMAXLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDUMAXH, LDUMAXAH,
LDUMAXALH, LDUMAXLH.
• The description of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH gives the operational pseudocode for
this instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt
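No memory ordering (R == 0)
STUMAXH <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMAXLH <Ws>, [<Xn|SP>]
is equivalent to
LDUMAXLH <Ws>, WZR, [<Xn|SP>]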
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic unsigned minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the smaller value back to
memory, treating the values as unsigned numbers.
• STUMIN does not have release semantics.
• STUMINL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDUMIN, LDUMINA, LDUMINAL,
LDUMINL.
• The description of LDUMIN, LDUMINA, LDUMINAL, LDUMINL gives the operational pseudocode for this
instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 1 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
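32-bit LDUMIN alias (size == 10 && R == 0)
STUMIN <Ws>, [<Xn|SP>]
is equivalent to
LDUMIN <Ws>, WZR, [<Xn|SP>]
32-bit LDUMINL alias (size == 10 && R == 1)
STUMINL <Ws>, [<Xn|SP>]
is equivalent to
LDUMINL <Ws>, WZR, [<Xn|SP>]
64-bit LDUMIN alias (size == 11 && R == 0)
STUMIN <Xs>, [<Xn|SP>]
is equivalent to
LDUMIN <Xs>, XZR, [<Xn|SP>]
64-bit LDUMINL alias (size == 11 && R == 1)
STUMINL <Xs>, [<Xn|SP>]
is equivalent to
LDUMINL <Xs>, XZR, [<Xn|SP>]
Assembler Symbols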
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDUMIN, LDUMINA, LDUMINAL, LDUMINL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic unsigned minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned
numbers.
• STUMINB does not have release semantics.
• STUMINLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDUMINB, LDUMINAB, LDUMINALB,
LDUMINLB.
• The description of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB gives the operational pseudocode for
this instruction.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 1 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
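No memory ordering (R == 0)
STUMINB <Ws>, [<Xn|SP>]
is equivalent to
LDUMINB <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMINLB <Ws>, [<Xn|SP>]
is equivalent to
LDUMINLB <Ws>, WZR, [<Xn|SP>]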
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Atomic unsigned minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the smaller value back to memory, treating the values as
unsigned numbers.
• STUMINH does not have release semantics.
• STUMINLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
• The encodings in this description are named to match the encodings of LDUMINH, LDUMINAH,
LDUMINALH, LDUMINLH.
• The description of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH gives the operational pseudocode for
this instruction.
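The following Python sketch is illustrative only and is not part of the architecture pseudocode. It models the data-processing step of STUMINH on a hypothetical little-endian `bytearray` memory; the function name `stuminh` is an assumption, and atomicity is trivially satisfied because the model is single-threaded.

```python
def stuminh(memory: bytearray, address: int, ws: int) -> None:
    """Model of STUMINH: mem16 = min(mem16, Ws<15:0>), compared as unsigned."""
    old = int.from_bytes(memory[address:address + 2], "little")
    new = min(old, ws & 0xFFFF)  # only the low halfword of Ws takes part
    memory[address:address + 2] = new.to_bytes(2, "little")


mem = bytearray([0x34, 0x12])   # halfword 0x1234
stuminh(mem, 0, 0x0FFF)         # 0x0FFF < 0x1234, so memory is updated
stuminh(mem, 0, 0xFFFF8000)     # low halfword 0x8000 is larger when unsigned
```

The second call leaves memory unchanged: 0x8000 would be negative under a signed comparison, but the instruction treats both values as unsigned.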
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 1 1 1 0 0 Rn 1 1 1 1 1
size A opc Rt
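No memory ordering (R == 0)
STUMINH <Ws>, [<Xn|SP>]
is equivalent to
LDUMINH <Ws>, WZR, [<Xn|SP>]
Release (R == 1)
STUMINLH <Ws>, [<Xn|SP>]
is equivalent to
LDUMINLH <Ws>, WZR, [<Xn|SP>]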
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
The description of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH gives the operational pseudocode for this
instruction.
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register (unscaled) calculates an address from a base register value and an immediate offset, and stores a 32-bit
word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
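As an illustration of the unscaled addressing described above, the following Python sketch sign-extends a 9-bit immediate and adds it to a base address. The helper names `sign_extend` and `stur_address` are hypothetical and are not taken from the architecture pseudocode.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def stur_address(base: int, imm9: int) -> int:
    """Byte address used by an unscaled store: base + SignExtend(imm9)."""
    return (base + sign_extend(imm9, 9)) & (2**64 - 1)
```

For example, an imm9 field of 0x1FF encodes an offset of -1, so `stur_address(0x1000, 0x1FF)` gives 0xFFF.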
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
// Apply the immediate offset (the sign-extended imm9 value).
address = address + offset;
data = X[t];
Mem[address, datasize DIV 8, AccType_NORMAL] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Byte (unscaled) calculates an address from a base register value and an immediate offset, and stores a
byte to the calculated address, from a 32-bit register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
// Apply the immediate offset (the sign-extended imm9 value).
address = address + offset;
data = X[t];
Mem[address, 1, AccType_NORMAL] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Register Halfword (unscaled) calculates an address from a base register value and an immediate offset, and
stores a halfword to the calculated address, from a 32-bit register. For information about memory accesses, see Load/
Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 0 0 imm9 0 0 Rn Rt
size opc
Assembler Symbols
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
// Apply the immediate offset (the sign-extended imm9 value).
address = address + offset;
data = X[t];
Mem[address, 2, AccType_NORMAL] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords from two registers to a memory
location if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was
successful, or of 1 if no store was performed. See Synchronization and semaphores. For information on single-copy
atomicity and alignment requirements, see Requirements for single-copy atomicity and Alignment of data accesses. If
a 64-bit pair Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory location being
updated. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 0 1 Rs 0 Rt2 Rn Rt
L o0
32-bit (sz == 0)
64-bit (sz == 1)
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2); // ignored by load/store single register
integer s = UInt(Rs); // ignored by all loads and store-release
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STXP.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
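<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.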
Operation
bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(datasize) UNKNOWN;
else
    bits(datasize DIV 2) el1 = X[t];
    bits(datasize DIV 2) el2 = X[t2];
    data = if BigEndian(AccType_ATOMIC) then el1:el2 else el2:el1;
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
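The endianness-dependent concatenation in the pseudocode above can be checked with a small Python sketch. It is illustrative only; the function name `stxp_data` and its parameters are hypothetical. It builds the pair value as `el1:el2` or `el2:el1` and then serializes it in the matching byte order.

```python
def stxp_data(el1: int, el2: int, elem_bits: int, big_endian: bool) -> bytes:
    """Return the bytes written to memory for a register pair (el1, el2)."""
    mask = (1 << elem_bits) - 1
    el1 &= mask
    el2 &= mask
    if big_endian:
        value = (el1 << elem_bits) | el2   # el1:el2
    else:
        value = (el2 << elem_bits) | el1   # el2:el1
    return value.to_bytes(2 * elem_bits // 8, "big" if big_endian else "little")
```

In both byte orders the first register occupies the lower addresses, which is the effect of choosing the concatenation order by endianness.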
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Exclusive Register stores a 32-bit word or a 64-bit doubleword from a register to memory if the PE has exclusive
access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was
performed. See Synchronization and semaphores. For information about memory accesses see Load/Store addressing
modes.
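The status convention (0 for success, 1 for no store) is typically consumed by a retry loop. The following Python sketch models this with a hypothetical single-PE monitor that tracks one address; the class `LocalMonitor` and the function `atomic_increment` are assumptions made for illustration and greatly simplify the architectural exclusive monitors.

```python
class LocalMonitor:
    """Hypothetical single-PE exclusive monitor that tracks one address."""

    def __init__(self) -> None:
        self.marked = None

    def load_exclusive(self, memory: dict, address: int) -> int:
        self.marked = address            # mark the address for exclusive access
        return memory[address]

    def store_exclusive(self, memory: dict, address: int, value: int) -> int:
        passed = self.marked == address
        self.marked = None               # the monitor is cleared either way
        if not passed:
            return 1                     # no store was performed
        memory[address] = value
        return 0                         # the store updated memory


def atomic_increment(monitor: LocalMonitor, memory: dict, address: int) -> int:
    """Retry until the Store-Exclusive reports status 0; return the new value."""
    while True:
        value = monitor.load_exclusive(memory, address) + 1
        if monitor.store_exclusive(memory, address, value) == 0:
            return value
```

A Store-Exclusive issued without a preceding Load-Exclusive to the same address returns 1 in this model and leaves memory unchanged.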
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 0 0 Rs 0 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STXR.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
Operation
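bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(elsize) UNKNOWN;
else
    data = X[t];
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);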
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Exclusive Register Byte stores a byte from a register to memory if the PE has exclusive access to the memory
address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic.
For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 0 0 Rs 0 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STXRB.
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.
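Operation
bits(64) address;
bits(8) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(8) UNKNOWN;
else
    data = X[t];
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory location corresponding to the virtual address.
if AArch64.ExclusiveMonitorsPass(address, 1) then
    // This atomic write will be rejected if it does not refer
    // to the same physical location after address translation.
    Mem[address, 1, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);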
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Exclusive Register Halfword stores a halfword from a register to memory if the PE has exclusive access to the
memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic.
For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 0 0 Rs 0 (1) (1) (1) (1) (1) Rn Rt
size L o0 Rt2
integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0
If the operation updates memory.
1
If the operation fails to update memory.
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to
the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.
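Operation
bits(64) address;
bits(16) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];
if rt_unknown then
    data = bits(16) UNKNOWN;
else
    data = X[t];
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+1].
if AArch64.ExclusiveMonitorsPass(address, 2) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, 2, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);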
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Allocation Tags, Zeroing stores an Allocation Tag to two Tag granules of memory, zeroing the associated data
locations. The address used for the store is calculated from the base register and an immediate signed offset scaled by
the Tag granule. The Allocation Tag is calculated from the Logical Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.
Post-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 imm9 0 1 Xn Xt
Pre-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 imm9 1 1 Xn Xt
Signed offset
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 1 1 imm9 1 0 Xn Xt
Assembler Symbols
<Xt|SP> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.
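Because the offset is scaled by the 16-byte Tag granule, the stated range of -4096 to 4080 corresponds to a signed 9-bit field. The following Python sketch shows the mapping; the function names `encode_simm` and `decode_imm9` are hypothetical.

```python
TAG_GRANULE = 16  # bytes per Tag granule


def encode_simm(simm: int) -> int:
    """Map a byte offset to the 9-bit imm9 field value."""
    if simm % TAG_GRANULE != 0 or not -4096 <= simm <= 4080:
        raise ValueError("offset must be a multiple of 16 in -4096..4080")
    return (simm // TAG_GRANULE) & 0x1FF


def decode_imm9(imm9: int) -> int:
    """Recover the byte offset: sign-extend imm9, then scale by the granule."""
    signed = imm9 - 0x200 if imm9 & 0x100 else imm9
    return signed * TAG_GRANULE
```

The extremes of the range map to the extremes of the field: -4096 encodes as 0x100 and 4080 encodes as 0x0FF.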
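Operation
bits(64) address;
bits(64) data = if t == 31 then SP[] else X[t];
bits(4) tag = AArch64.AllocationTagFromAddress(data);
SetTagCheckedInstruction(FALSE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
// Zero both data granules, then store the Allocation Tag for each.
Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);
Mem[address + TAG_GRANULE, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);
AArch64.MemTag[address, AccType_NORMAL] = tag;
AArch64.MemTag[address + TAG_GRANULE, AccType_NORMAL] = tag;
if writeback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;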
Store Allocation Tag, Zeroing stores an Allocation Tag to memory, zeroing the associated data location. The address
used for the store is calculated from the base register and an immediate signed offset scaled by the Tag granule. The
Allocation Tag is calculated from the Logical Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.
Post-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 0 1 Xn Xt
Pre-index
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 1 1 Xn Xt
Signed offset
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 1 1 imm9 1 0 Xn Xt
Assembler Symbols
<Xt|SP> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.
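Operation
bits(64) address;
bits(64) data = if t == 31 then SP[] else X[t];
bits(4) tag = AArch64.AllocationTagFromAddress(data);
SetTagCheckedInstruction(FALSE);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
// Zero the data granule, then store its Allocation Tag.
Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);
AArch64.MemTag[address, AccType_NORMAL] = tag;
if writeback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;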
Store Tag and Zero Multiple writes a naturally aligned block of N Allocation Tags and stores zero to the associated
data locations, where the size of N is identified in DCZID_EL0.BS, and the Allocation Tag written to address A is taken
from the source register bits<3:0>.
This instruction is UNDEFINED at EL0.
This instruction generates an Unchecked access.
Integer
(FEAT_MTE2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 Xn Xt
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
Operation
for i = 0 to count-1
    AArch64.MemTag[address, AccType_NORMAL] = tag;
    Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);
    address = address + TAG_GRANULE;
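The following Python sketch illustrates how the loop bounds could be derived. It assumes that DCZID_EL0.BS holds the base-2 logarithm of the block size in 4-byte words and that one Allocation Tag covers a 16-byte granule; the function name `stzgm_block` is hypothetical.

```python
TAG_GRANULE = 16  # bytes covered by one Allocation Tag


def stzgm_block(address: int, bs: int) -> tuple[int, int]:
    """Return (aligned block start, number of tags written) for a BS value."""
    size = 4 << bs                   # block size in bytes: 4 * 2**bs
    base = address & ~(size - 1)     # naturally aligned start of the block
    return base, size // TAG_GRANULE
```

Under these assumptions, a BS value of 4 gives a 64-byte block, so an address of 0x1234 is aligned down to 0x1200 and four tags are written.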
Subtract (extended register) subtracts a sign-extended or zero-extended register value, followed by an optional left shift
amount, from a register value, and writes the result to the destination register. The argument that is extended from
the <Rm> register can be a byte, halfword, word, or doubleword.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 1 0 0 1 Rm option imm3 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then UNDEFINED;
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<R> Is a width specifier, encoded in “option”:
option <R>
00x W
010 W
x11 X
10x W
110 W
<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.
<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rd" or "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
operand2 = NOT(operand2);
(result, -) = AddWithCarry(operand1, operand2, '1');
if d == 31 then
    SP[] = result;
else
    X[d] = result;
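As an illustration of the extend-then-shift step, the following Python sketch models the second operand and the subtraction. It is a simplified stand-in for the architecture's ExtendReg helper, not a copy of it; the names `extend_reg` and `sub_extended` are hypothetical, and `option` is the 3-bit field value from the <extend> tables above.

```python
def extend_reg(value: int, option: int, shift: int, datasize: int) -> int:
    """Extend the low byte/halfword/word/doubleword of value, then shift left."""
    if not 0 <= shift <= 4:
        raise ValueError("shift amounts above 4 are UNDEFINED")
    length = min(8 << (option & 0b011), datasize - shift)
    field = (value & ((1 << length) - 1)) << shift
    is_signed = bool(option & 0b100)
    if is_signed and (field >> (length + shift - 1)) & 1:
        field -= 1 << (length + shift)     # propagate the sign bit
    return field & ((1 << datasize) - 1)


def sub_extended(operand1: int, rm: int, option: int, shift: int = 0,
                 datasize: int = 64) -> int:
    """operand1 - ExtendReg(rm), computed as operand1 + NOT(operand2) + 1."""
    mask = (1 << datasize) - 1
    operand2 = extend_reg(rm, option, shift, datasize)
    return (operand1 + (~operand2 & mask) + 1) & mask
```

With a source value of 0xFFFFFFFF, SXTW (option 110) subtracts -1 whereas UXTW (option 010) subtracts 4294967295, which shows why the extension type matters.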
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Subtract (immediate) subtracts an optionally-shifted immediate value from a register value, and writes the result to
the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
case sh of
when '0' imm = ZeroExtend(imm12, datasize);
when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);
Assembler Symbols
<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #12
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2;
operand2 = NOT(imm);
(result, -) = AddWithCarry(operand1, operand2, '1');
if d == 31 then
    SP[] = result;
else
    X[d] = result;
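The pseudocode performs the subtraction as an addition of the inverted immediate with a carry-in of 1, which is two's complement negation. A minimal Python sketch of the same arithmetic follows; the function name `sub_immediate` is hypothetical.

```python
def sub_immediate(operand1: int, imm12: int, sh: int, datasize: int = 64) -> int:
    """Model of SUB (immediate): operand1 - (imm12 LSL (12 if sh else 0))."""
    mask = (1 << datasize) - 1
    imm = (imm12 << 12) if sh else imm12     # decode of the sh field
    operand2 = ~imm & mask                   # NOT(imm)
    return (operand1 + operand2 + 1) & mask  # AddWithCarry(..., '1')
```

For example, `sub_immediate(0x5000, 1, 1)` subtracts 1 shifted left by 12 and returns 0x4000.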
Operational information
If PSTATE.DIT is 1:
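• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.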
Subtract (shifted register) subtracts an optionally-shifted register value from a register value, and writes the result to
the destination register.
This instruction is used by the alias NEG (shifted register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 1 shift 0 Rm imm6 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Alias Conditions
Alias Is preferred when
NEG (shifted register) Rn == '11111'
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
operand2 = NOT(operand2);
(result, -) = AddWithCarry(operand1, operand2, '1');
X[d] = result;
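The following Python sketch illustrates the three shift types and the subtraction. It is a simplified model rather than the architecture's ShiftReg helper; the names `shift_reg` and `sub_shifted` are hypothetical, and shift types are passed as mnemonic strings for readability.

```python
def shift_reg(value: int, shift_type: str, amount: int, datasize: int) -> int:
    """Apply LSL, LSR or ASR to a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        if value >> (datasize - 1):      # negative: convert to a signed int
            value -= 1 << datasize
        return (value >> amount) & mask  # Python >> on ints is arithmetic
    raise ValueError("shift encoding '11' is RESERVED")


def sub_shifted(xn: int, xm: int, shift_type: str = "LSL", amount: int = 0,
                datasize: int = 64) -> int:
    """xn - shift(xm), computed as xn + NOT(operand2) + 1."""
    mask = (1 << datasize) - 1
    operand2 = shift_reg(xm, shift_type, amount, datasize)
    return (xn + (~operand2 & mask) + 1) & mask
```

Passing 0 as the first operand models the NEG alias, where the first source register is the zero register.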
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Subtract with Tag subtracts an immediate value scaled by the Tag granule from the address in the source register,
modifies the Logical Address Tag of the address using an immediate value, and writes the result to the destination
register. Tags specified in GCR_EL1.Exclude are excluded from the possible outputs when modifying the Logical
Address Tag.
Integer
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 0 0 1 1 0 uimm6 (0) (0) uimm4 Xn Xd
op3
Assembler Symbols
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Xd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Xn" field.
<uimm6> Is an unsigned immediate, a multiple of 16 in the range 0 to 1008, encoded in the "uimm6" field.
<uimm4> Is an unsigned immediate, in the range 0 to 15, encoded in the "uimm4" field.
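The following Python sketch illustrates one way the tag selection and address arithmetic could fit together. It is an approximation for illustration, not the architecture's helper pseudocode: it assumes the Logical Address Tag occupies bits 59:56, that `exclude` is a 16-bit mask with one bit per tag, and that tag checking is enabled. The names `choose_non_excluded_tag` and `subg` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def choose_non_excluded_tag(tag: int, offset: int, exclude: int) -> int:
    """Step tag forward offset times, skipping tags whose exclude bit is set."""
    if exclude & 0xFFFF == 0xFFFF:
        return 0                         # every tag is excluded
    if offset == 0:
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    while offset:
        offset -= 1
        tag = (tag + 1) & 0xF
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    return tag


def subg(xn: int, byte_offset: int, tag_offset: int, exclude: int = 0) -> int:
    """Subtract byte_offset from xn and replace the tag in bits 59:56."""
    start_tag = (xn >> 56) & 0xF
    rtag = choose_non_excluded_tag(start_tag, tag_offset, exclude)
    address = (xn - byte_offset) & MASK64
    return (address & ~(0xF << 56)) | (rtag << 56)
```

In this model, starting from tag 1 with a tag offset of 2 yields tag 3, or tag 4 if tag 2 is excluded.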
Operation
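bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(4) start_tag = AArch64.AllocationTagFromAddress(operand1);
bits(16) exclude = GCR_EL1.Exclude;
bits(64) result;
bits(4) rtag;
if AArch64.AllocationTagAccessIsEnabled(AccType_NORMAL) then
    rtag = AArch64.ChooseNonExcludedTag(start_tag, uimm4, exclude);
else
    rtag = '0000';
(result, -) = AddWithCarry(operand1, NOT(offset), '1');
result = AArch64.AddressWithAllocationTag(result, AccType_NORMAL, rtag);
if d == 31 then
    SP[] = result;
else
    X[d] = result;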
Subtract Pointer subtracts the 56-bit address held in the second source register from the 56-bit address held in the
first source register, sign-extends the result to 64 bits, and writes the result to the destination register.
Integer
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 0 0 0 Xn Xd
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Xd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Xm"
field.
Operation
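bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(64) operand2 = if m == 31 then SP[] else X[m];
operand1 = SignExtend(operand1<55:0>, 64);
operand2 = SignExtend(operand2<55:0>, 64);
bits(64) result;
operand2 = NOT(operand2);
(result, -) = AddWithCarry(operand1, operand2, '1');
X[d] = result;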
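The following Python sketch illustrates the 56-bit pointer subtraction described for this instruction. Bits 63:56 of each operand, which include the Logical Address Tag, are discarded before subtracting. The function names `sign_extend_56` and `subp` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend_56(value: int) -> int:
    """Keep bits 55:0 and interpret them as a signed 56-bit number."""
    value &= (1 << 56) - 1
    return value - (1 << 56) if value >> 55 else value


def subp(xn: int, xm: int) -> int:
    """Difference of two 56-bit addresses, returned as a 64-bit value."""
    return (sign_extend_56(xn) - sign_extend_56(xm)) & MASK64
```

Two pointers that carry different tags but differ by 0x1000 in their address bits therefore produce 0x1000.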
Subtract Pointer, setting Flags subtracts the 56-bit address held in the second source register from the 56-bit address
held in the first source register, sign-extends the result to 64 bits, and writes the result to the destination register. It
updates the condition flags based on the result of the subtraction.
This instruction is used by the alias CMPP.
Integer
(FEAT_MTE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 1 0 1 1 0 Xm 0 0 0 0 0 0 Xn Xd
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Xd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Xm"
field.
Alias Conditions
Operation
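bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(64) operand2 = if m == 31 then SP[] else X[m];
operand1 = SignExtend(operand1<55:0>, 64);
operand2 = SignExtend(operand2<55:0>, 64);
bits(64) result;
bits(4) nzcv;
operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;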
Subtract (extended register), setting flags, subtracts a sign-extended or zero-extended register value, followed by an optional
left shift amount, from a register value, and writes the result to the destination register. The argument that is
extended from the <Rm> register can be a byte, halfword, word, or doubleword. It updates the condition flags based
on the result.
This instruction is used by the alias CMP (extended register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<R> Is a width specifier, encoded in "option":
option <R>
00x W
010 W
x11 X
10x W
110 W
<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.
<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:
option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
bits(4) nzcv;
operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
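The extend-then-shift behaviour of the second operand can be sketched in Python as follows. This is an illustrative model only: extend_reg and the EXTEND table are hypothetical names, the table follows the "option" encodings listed under Assembler Symbols, and the clamping of the source length to datasize minus the shift is an assumption about how ExtendReg behaves for the 32-bit variant.

```python
# option value -> (mnemonic, unsigned?, source length in bits)
EXTEND = {
    0b000: ("UXTB", True, 8),
    0b001: ("UXTH", True, 16),
    0b010: ("UXTW", True, 32),
    0b011: ("UXTX", True, 64),
    0b100: ("SXTB", False, 8),
    0b101: ("SXTH", False, 16),
    0b110: ("SXTW", False, 32),
    0b111: ("SXTX", False, 64),
}


def extend_reg(value: int, option: int, shift: int, datasize: int) -> int:
    """Extract the low bits of value, sign- or zero-extend, then shift left."""
    if not 0 <= shift <= 4:
        raise ValueError("shift must be in the range 0 to 4")
    _, unsigned, length = EXTEND[option]
    # Assumption: the extracted field never exceeds what fits after the shift.
    length = min(length, datasize - shift)
    field = value & ((1 << length) - 1)
    if not unsigned and field >> (length - 1):
        field -= 1 << length          # apply the sign extension
    return (field << shift) & ((1 << datasize) - 1)
```

For example, extend_reg(0x80, 0b100, 0, 64) models SXTB and produces 0xFFFFFFFFFFFFFF80, while extend_reg(0x1FF, 0b000, 2, 32) models UXTB with a left shift of 2 and produces 0x3FC.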
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Subtract (immediate), setting flags, subtracts an optionally-shifted immediate value from a register value, and writes
the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMP (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 0 1 0 sh imm12 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
case sh of
when '0' imm = ZeroExtend(imm12, datasize);
when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #12
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2;
bits(4) nzcv;
operand2 = NOT(imm);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
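As an illustration of the encoding diagram and the "sh" table above, the following Python sketch extracts the fields of a SUBS (immediate) instruction word and computes the effective immediate. The function name decode_subs_imm and the returned dictionary layout are hypothetical and chosen only for this example.

```python
def decode_subs_imm(word: int) -> dict:
    """Decode the fields of a 32-bit SUBS (immediate) instruction word."""
    # Bits 30:23 hold op, S and the fixed pattern: 1 1 1 0 0 0 1 0.
    if (word >> 23) & 0xFF != 0b11100010:
        raise ValueError("not a SUBS (immediate) encoding")
    sf = (word >> 31) & 1
    sh = (word >> 22) & 1
    imm12 = (word >> 10) & 0xFFF
    return {
        "datasize": 64 if sf else 32,
        # sh == 1 selects LSL #12, sh == 0 selects LSL #0.
        "imm": imm12 << 12 if sh else imm12,
        "rn": (word >> 5) & 0x1F,
        "rd": word & 0x1F,
    }
```

With this sketch, the word 0xF1000420 decodes to a 64-bit operation with an immediate of 1, Rn of 1 and Rd of 0, and 0x7140041F decodes to a 32-bit operation with an immediate of 4096 and Rd of 31.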
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Subtract (shifted register), setting flags, subtracts an optionally-shifted register value from a register value, and writes
the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the aliases CMP (shifted register), and NEGS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 shift 0 Rm imm6 Rn Rd
op S
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Alias Conditions
Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(4) nzcv;
operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
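The shift applied to the second operand can be sketched in Python as follows. This is an illustrative model of the "shift" table under Assembler Symbols, not the architecture's ShiftReg definition; shift_reg is a hypothetical name, and raising ValueError for the RESERVED encoding is a choice made for this example.

```python
def shift_reg(value: int, shift: int, amount: int, datasize: int) -> int:
    """Apply LSL (00), LSR (01) or ASR (10) to a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if not 0 <= amount < datasize:
        raise ValueError("shift amount out of range for this variant")
    if shift == 0b00:                       # LSL
        return (value << amount) & mask
    if shift == 0b01:                       # LSR
        return value >> amount
    if shift == 0b10:                       # ASR
        if value >> (datasize - 1):
            value -= 1 << datasize          # reinterpret as a negative number
        return (value >> amount) & mask
    raise ValueError("shift encoding 11 is RESERVED")
```

For example, shift_reg(0x80000000, 0b01, 31, 32) gives 1, whereas the arithmetic form shift_reg(0x80000000, 0b10, 31, 32) gives 0xFFFFFFFF.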
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SVC #<imm>
// Empty.
Assembler Symbols
<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
Operation
AArch64.CheckForSVCTrap(imm16);
AArch64.CallSupervisor(imm16);
Swap word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from a memory location,
and stores the value held in a register back to the same memory location. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not one of WZR or XZR, SWPA and SWPAL load from memory with acquire
semantics.
• SWPL and SWPAL store to memory with release semantics.
• SWP has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be stored, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register to be stored, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(datasize) data;
bits(datasize) store_value;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
store_value = X[s];
data = MemAtomic(address, MemAtomicOp_SWP, store_value, ldacctype, stacctype);
X[t] = ZeroExtend(data, regsize);
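The data movement performed by the swap can be sketched in Python as follows. This is a single-threaded model only: it shows which value ends up in memory and which is returned, but it does not model atomicity or the acquire and release ordering. The names swp and swp_mnemonic are hypothetical, memory is represented as a bytearray, and little-endian data layout is assumed.

```python
def swp(memory: bytearray, address: int, store_value: int, size: int) -> int:
    """Store store_value at address and return the value previously held there.

    size is the access size in bytes (1, 2, 4 or 8). Little-endian is assumed.
    """
    old = int.from_bytes(memory[address:address + size], "little")
    new = store_value & ((1 << (8 * size)) - 1)
    memory[address:address + size] = new.to_bytes(size, "little")
    return old  # zero-extended by construction


def swp_mnemonic(a: int, r: int, size_bits: int) -> str:
    """Build the mnemonic from the A and R fields and the access size."""
    suffix = {8: "B", 16: "H", 32: "", 64: ""}[size_bits]
    return "SWP" + ("A" if a else "") + ("L" if r else "") + suffix
```

A short usage example: after mem = bytearray(8) is filled with 0x11223344 at offset 0, swp(mem, 0, 0xAABBCCDD, 4) returns 0x11223344 and leaves 0xAABBCCDD in memory.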
Swap byte in memory atomically loads an 8-bit byte from a memory location, and stores the value held in a register
back to the same memory location. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, SWPAB and SWPALB load from memory with acquire semantics.
• SWPLB and SWPALB store to memory with release semantics.
• SWPB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size
SWPAB (A == 1 && R == 0)
SWPALB (A == 1 && R == 1)
SWPB (A == 0 && R == 0)
SWPLB (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be stored, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(8) data;
bits(8) store_value;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
store_value = X[s];
data = MemAtomic(address, MemAtomicOp_SWP, store_value, ldacctype, stacctype);
X[t] = ZeroExtend(data, 32);
Swap halfword in memory atomically loads a 16-bit halfword from a memory location, and stores the value held in a
register back to the same memory location. The value initially loaded from memory is returned in the destination
register.
• If the destination register is not WZR, SWPAH and SWPALH load from memory with acquire semantics.
• SWPLH and SWPALH store to memory with release semantics.
• SWPH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
Integer
(FEAT_LSE)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size
SWPAH (A == 1 && R == 0)
SWPALH (A == 1 && R == 1)
SWPH (A == 0 && R == 0)
SWPLH (A == 0 && R == 1)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);
Assembler Symbols
<Ws> Is the 32-bit name of the general-purpose register to be stored, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Operation
bits(64) address;
bits(16) data;
bits(16) store_value;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
store_value = X[s];
data = MemAtomic(address, MemAtomicOp_SWP, store_value, ldacctype, stacctype);
X[t] = ZeroExtend(data, 32);
Signed Extend Byte extracts an 8-bit value from a register, sign-extends it to the size of the register, and writes the
result to the destination register.
• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N 0 0 0 0 0 0 0 0 0 1 1 1 Rn Rd
opc immr imms
is equivalent to
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
The description of SBFM gives the operational pseudocode for this instruction.
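Because the pseudocode is given under SBFM rather than here, the effect of the sign extension can be sketched directly in Python. The function sign_extend is a hypothetical helper; from_bits is 8 for a byte, and the same helper models halfword and word extension with from_bits set to 16 or 32.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value to a to_bits-wide result."""
    field = value & ((1 << from_bits) - 1)      # keep only the source field
    if field >> (from_bits - 1):                # top bit of the field is set
        field -= 1 << from_bits                 # make it negative
    return field & ((1 << to_bits) - 1)         # wrap to the register width
```

For example, sign_extend(0x80, 8, 32) gives 0xFFFFFF80, and bits of the source above the extracted field are ignored, so sign_extend(0x12345680, 8, 64) gives 0xFFFFFFFFFFFFFF80.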
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Sign Extend Halfword extracts a 16-bit value, sign-extends it to the size of the register, and writes the result to the
destination register.
• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N 0 0 0 0 0 0 0 0 1 1 1 1 Rn Rd
opc immr imms
is equivalent to
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
The description of SBFM gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Sign Extend Word sign-extends a word to the size of the register, and writes the result to the destination register.
• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 1 1 1 Rn Rd
sf opc N immr imms
64-bit
is equivalent to
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
The description of SBFM gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
System instruction. For more information, see Op0 equals 0b01, cache maintenance, TLB maintenance, and address
translation instructions for the encodings of System instructions.
This instruction is used by the aliases AT, CFP, CPP, DC, DVP, IC, and TLBI.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 CRn CRm op2 Rt
L
integer t = UInt(Rt);
Assembler Symbols
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cn> Is a name 'Cn', with 'n' in the range 0 to 15, encoded in the "CRn" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<Xt> Is the 64-bit name of the optional general-purpose source register, defaulting to '11111', encoded in the
"Rt" field.
Alias Conditions
Operation
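As an aid to reading the encoding diagram for SYS, the following Python sketch assembles an instruction word from the operands listed under Assembler Symbols. It illustrates the field layout only and is not the operational pseudocode; encode_sys and SYS_BASE are hypothetical names, with SYS_BASE holding the fixed bits 31:19 from the diagram.

```python
SYS_BASE = 0xD5080000  # fixed bits 31:19 of the diagram: 1101010100 0 01


def encode_sys(op1: int, crn: int, crm: int, op2: int, rt: int = 31) -> int:
    """Assemble a SYS instruction word; rt defaults to 31 ('11111')."""
    for name, val, limit in (("op1", op1, 7), ("CRn", crn, 15),
                             ("CRm", crm, 15), ("op2", op2, 7), ("Rt", rt, 31)):
        if not 0 <= val <= limit:
            raise ValueError(f"{name} out of range")
    return SYS_BASE | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt
```

For example, encode_sys(0, 7, 5, 0) yields 0xD508751F, and encode_sys(3, 7, 14, 1, 0) yields 0xD50B7E20.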
System instruction with result. For more information, see Op0 equals 0b01, cache maintenance, TLB maintenance, and
address translation instructions for the encodings of System instructions.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 1 0 1 op1 CRn CRm op2 Rt
L
integer t = UInt(Rt);
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose destination register, encoded in the "Rt" field.
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cn> Is a name 'Cn', with 'n' in the range 0 to 15, encoded in the "CRn" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
Operation
Test bit and Branch if Nonzero compares the value of a bit in a general-purpose register with zero, and conditionally
branches to a label at a PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine
call or return. This instruction does not affect condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
b5 0 1 1 0 1 1 1 b40 imm14 Rt
op
integer t = UInt(Rt);
Assembler Symbols
<R> Is a width specifier, encoded in "b5":
b5 <R>
0 W
1 X
In assembler source code an 'X' specifier is always permitted, but a 'W' specifier is only permitted when
the bit number is less than 32.
<t> Is the number [0-30] of the general-purpose register to be tested or the name ZR (31), encoded in the
"Rt" field.
<imm> Is the bit number to be tested, in the range 0 to 63, encoded in "b5:b40".
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-32KB, is encoded as "imm14" times 4.
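The field layout and the branch condition can be sketched in Python as follows. The sketch decodes the bit number from "b5:b40" and the byte offset from "imm14", and uses the "op" bit to distinguish the nonzero form from the zero form. The names decode_test_branch and branch_taken are hypothetical, and the dictionary layout is chosen only for this example.

```python
def decode_test_branch(word: int) -> dict:
    """Decode a test-bit-and-branch instruction word (op selects TBZ/TBNZ)."""
    if (word >> 25) & 0x3F != 0b011011:
        raise ValueError("not a test-bit-and-branch encoding")
    b5 = (word >> 31) & 1
    op = (word >> 24) & 1
    b40 = (word >> 19) & 0x1F
    imm14 = (word >> 5) & 0x3FFF
    if imm14 & 0x2000:              # sign-extend the 14-bit field
        imm14 -= 0x4000
    return {
        "mnemonic": "TBNZ" if op else "TBZ",
        "bit": (b5 << 5) | b40,     # bit number 0 to 63
        "offset": imm14 * 4,        # byte offset from this instruction
        "rt": word & 0x1F,
    }


def branch_taken(reg_value: int, bit: int, nonzero: bool) -> bool:
    """Return True if the branch is taken for the given register value."""
    bit_is_set = (reg_value >> bit) & 1 == 1
    return bit_is_set if nonzero else not bit_is_set
```

For example, the word 0x37180040 decodes to the nonzero form testing bit 3 of register 0 with an offset of +8 bytes.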
Operation
Test bit and Branch if Zero compares the value of a test bit with zero, and conditionally branches to a label at a
PC-relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This instruction
does not affect condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
b5 0 1 1 0 1 1 0 b40 imm14 Rt
op
integer t = UInt(Rt);
Assembler Symbols
<R> Is a width specifier, encoded in "b5":
b5 <R>
0 W
1 X
In assembler source code an 'X' specifier is always permitted, but a 'W' specifier is only permitted when
the bit number is less than 32.
<t> Is the number [0-30] of the general-purpose register to be tested or the name ZR (31), encoded in the
"Rt" field.
<imm> Is the bit number to be tested, in the range 0 to 63, encoded in "b5:b40".
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-32KB, is encoded as "imm14" times 4.
Operation
TLB Invalidate operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
translation instructions.
• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 1 0 0 0 CRm op2 Rt
L CRn
is equivalent to
Assembler Symbols
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<tlbi_op> Is a TLBI instruction name, as listed for the TLBI system instruction group, encoded in “op1:CRm:op2”:
<Xt> Is the 64-bit name of the optional general-purpose source register, defaulting to '11111', encoded in the
"Rt" field.
Operation
The description of SYS gives the operational pseudocode for this instruction.
Trace Synchronization Barrier. This instruction is a barrier that synchronizes the trace operations of instructions.
If FEAT_TRF is not implemented, this instruction executes as a NOP.
System
(FEAT_TRF)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 1 1 1 1
CRm op2
TSB CSYNC
Operation
TraceSynchronizationBarrier();
Test bits (immediate), setting the condition flags and discarding the result: Rn AND imm.
• The encodings in this description are named to match the encodings of ANDS (immediate).
• The description of ANDS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 1 0 0 N immr imms Rn 1 1 1 1 1
opc Rd
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".
Operation
The description of ANDS (immediate) gives the operational pseudocode for this instruction.
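Since the pseudocode is given under ANDS (immediate), the flag-setting effect of the test can be sketched in Python as follows. The function tst is a hypothetical name, and the sketch assumes that the flags follow the ANDS behaviour, where N and Z reflect the result and C and V are cleared.

```python
def tst(rn: int, imm: int, datasize: int) -> int:
    """Return NZCV (as a 4-bit value) for Rn AND imm; the result is discarded."""
    mask = (1 << datasize) - 1
    result = rn & imm & mask
    flag_n = result >> (datasize - 1)       # top bit of the result
    flag_z = 1 if result == 0 else 0        # no tested bit was set
    # Assumption: C and V are both cleared, as for ANDS.
    return (flag_n << 3) | (flag_z << 2)
```

For example, tst(0b1010, 0b0101, 32) returns 0b0100 because none of the tested bits is set, so a following conditional branch on EQ would be taken.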
Test (shifted register) performs a bitwise AND operation on a register value and an optionally-shifted register value. It
updates the condition flags based on the result, and discards the result.
• The encodings in this description are named to match the encodings of ANDS (shifted register).
• The description of ANDS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 0 shift 0 Rm imm6 Rn 1 1 1 1 1
opc N Rd
32-bit (sf == 0)
is equivalent to
64-bit (sf == 1)
is equivalent to
Assembler Symbols
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:
shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR
<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.
Operation
The description of ANDS (shifted register) gives the operational pseudocode for this instruction.
Operational information
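If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.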
Unsigned Bitfield Insert in Zeros copies a bitfield of <width> bits from the least significant bits of the source register
to bit position <lsb> of the destination register, setting the destination bits above and below the bitfield to zero.
• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr imms Rn Rd
opc
is equivalent to
is equivalent to
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of UBFM gives the operational pseudocode for this instruction.
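The effect described for <lsb> and <width> can be sketched in Python as follows. The names ubfiz and ubfiz_to_ubfm are hypothetical. The mapping to the UBFM fields, with immr equal to (-lsb) modulo the register size and imms equal to width minus 1, is stated here as an assumption, because the equivalent UBFM form is not reproduced in this text.

```python
def ubfiz(src: int, lsb: int, width: int, regsize: int) -> int:
    """Copy the low width bits of src to bit position lsb; zero elsewhere."""
    if not (0 <= lsb < regsize and 1 <= width <= regsize - lsb):
        raise ValueError("lsb or width out of range")
    return (src & ((1 << width) - 1)) << lsb


def ubfiz_to_ubfm(lsb: int, width: int, regsize: int) -> tuple:
    """Assumed alias mapping: returns the (immr, imms) pair for UBFM."""
    return ((-lsb) % regsize, width - 1)
```

For example, ubfiz(0xABCD, 8, 8, 32) returns 0xCD00: the low byte of the source is placed at bits 15:8 and every other bit is zero.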
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
If <imms> is greater than or equal to <immr>, this copies a bitfield of (<imms>-<immr>+1) bits starting from bit
position <immr> in the source register to the least significant bits of the destination register.
If <imms> is less than <immr>, this copies a bitfield of (<imms>+1) bits from the least significant bits of the source
register to bit position (regsize-<immr>) of the destination register, where regsize is the destination register size of 32
or 64 bits.
In both cases the destination bits below and above the bitfield are set to zero.
This instruction is used by the aliases LSL (immediate), LSR (immediate), UBFIZ, UBFX, UXTB, and UXTH.
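The two cases described above can be sketched in Python as follows. This models the prose description directly rather than the mask-based decode shown below; ubfm is a hypothetical name, and the immr and imms values quoted for the aliases in the usage note are assumptions, since the alias conditions are not reproduced in this text.

```python
def ubfm(src: int, immr: int, imms: int, regsize: int) -> int:
    """Model of the two UBFM cases; bits outside the bitfield are zero."""
    mask = (1 << regsize) - 1
    src &= mask
    if imms >= immr:
        # Extract (imms - immr + 1) bits starting at bit immr,
        # placing them at the least significant end of the result.
        width = imms - immr + 1
        return (src >> immr) & ((1 << width) - 1)
    # Otherwise copy the low (imms + 1) bits up to bit (regsize - immr).
    width = imms + 1
    return ((src & ((1 << width) - 1)) << (regsize - immr)) & mask
```

Under the assumed alias mappings, ubfm(x, 4, 31, 32) behaves as a logical shift right by 4, ubfm(x, 28, 27, 32) behaves as a logical shift left by 4, and ubfm(x, 0, 7, 32) zero-extends the low byte.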
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr imms Rn Rd
opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
integer R;
bits(datasize) wmask;
bits(datasize) tmask;
R = UInt(immr);
(wmask, tmask) = DecodeBitMasks(N, imms, immr, FALSE);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<immr> For the 32-bit variant: is the right rotate amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the right rotate amount, in the range 0 to 63, encoded in the "immr" field.
<imms> For the 32-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 63,
encoded in the "imms" field.
Alias Conditions
Operation
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Bitfield Extract copies a bitfield of <width> bits starting from bit position <lsb> in the source register to the
least significant bits of the destination register, and sets destination bits above the bitfield to zero.
• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr imms Rn Rd
opc
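32-bit (sf == 0 && N == 0)
UBFX <Wd>, <Wn>, #<lsb>, #<width>
is equivalent to
UBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)
64-bit (sf == 1 && N == 1)
UBFX <Xd>, <Xn>, #<lsb>, #<width>
is equivalent to
UBFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)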
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
Operation
The description of UBFM gives the operational pseudocode for this instruction.
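The extraction described above can be sketched in Python as follows. This is a minimal model, not the architectural pseudocode. The function names are hypothetical, and the mapping to the UBFM immediates assumes the alias form UBFM #<lsb>, #(<lsb>+<width>-1).

```python
def ubfx(src: int, lsb: int, width: int, datasize: int = 32) -> int:
    """Copy width bits starting at bit lsb to the low bits, zeroing the rest."""
    if not (0 <= lsb < datasize and 1 <= width <= datasize - lsb):
        raise ValueError("lsb/width out of range for this register size")
    src &= (1 << datasize) - 1
    return (src >> lsb) & ((1 << width) - 1)


def ubfx_to_ubfm_fields(lsb: int, width: int) -> tuple:
    """Return (immr, imms) under the assumed alias mapping."""
    return lsb, lsb + width - 1
```

For example, extracting 8 bits from bit 8 of 0xABCD1234 yields 0x12, and the same operands correspond to immr = 8, imms = 15.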
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
UDF
Permanently Undefined generates an Undefined Instruction exception (ESR_ELx.EC = 0b000000). The encodings for
UDF used in this section are defined as permanently UNDEFINED.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 imm16
UDF #<imm>
Assembler Symbols
<imm> is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field. The PE ignores
the value of this constant.
Operation
// No operation.
UDIV
Unsigned Divide divides an unsigned integer register value by another unsigned integer register value, and writes the
result to the destination register. The condition flags are not affected.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 0 0 1 0 Rn Rd
o1
32-bit (sf == 0)
64-bit (sf == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
Operation
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
integer result;
if IsZero(operand2) then
    result = 0;
else
    result = RoundTowardsZero(Real(Int(operand1, TRUE)) / Real(Int(operand2, TRUE)));
X[d] = result<datasize-1:0>;
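The operation above can be modelled with a short Python sketch. The function name udiv and the use of plain integers for register values are illustrative assumptions. The point it shows is that a zero divisor produces a zero result and no exception.

```python
def udiv(operand1: int, operand2: int, datasize: int = 64) -> int:
    """Unsigned divide with the divide-by-zero result defined as 0."""
    mask = (1 << datasize) - 1
    operand1 &= mask  # treat both inputs as unsigned datasize-bit values
    operand2 &= mask
    if operand2 == 0:
        return 0
    # For non-negative operands, floor division equals rounding toward zero.
    return (operand1 // operand2) & mask
```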
UMADDL
Unsigned Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to
the 64-bit destination register.
This instruction is used by the alias UMULL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 0 Ra Rn Rd
U o0
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.
Alias Conditions
Operation
integer result;
X[d] = result<63:0>;
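As a rough model of the behavior described for this instruction, the following Python sketch multiplies two unsigned 32-bit values, adds an unsigned 64-bit addend, and keeps bits 63:0 of the sum. The function names are hypothetical. The umull helper assumes that the UMULL alias corresponds to an addend of zero.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def umaddl(wn: int, wm: int, xa: int) -> int:
    """Return (xa + wn * wm) truncated to 64 bits, all operands unsigned."""
    return ((xa & MASK64) + (wn & MASK32) * (wm & MASK32)) & MASK64


def umull(wn: int, wm: int) -> int:
    """UMULL modelled as UMADDL with a zero addend (assumption)."""
    return umaddl(wn, wm, 0)
```

The product of two 32-bit values always fits in 64 bits, so only the addition can wrap.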
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
UMNEGL
Unsigned Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the
64-bit destination register.
• The encodings in this description are named to match the encodings of UMSUBL.
• The description of UMSUBL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 1 1 1 1 1 1 Rn Rd
U o0 Ra
UMNEGL <Xd>, <Wn>, <Wm>
is equivalent to
UMSUBL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
The description of UMSUBL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
UMSUBL
Unsigned Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register
value, and writes the result to the 64-bit destination register.
This instruction is used by the alias UMNEGL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 1 Ra Rn Rd
U o0
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.
Alias Conditions
Operation
integer result;
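The subtraction described for this instruction can be sketched in Python as follows. The function names are hypothetical. The umnegl helper assumes that the UMNEGL alias corresponds to a minuend of zero.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def umsubl(wn: int, wm: int, xa: int) -> int:
    """Return (xa - wn * wm) truncated to 64 bits, all operands unsigned."""
    return ((xa & MASK64) - (wn & MASK32) * (wm & MASK32)) & MASK64


def umnegl(wn: int, wm: int) -> int:
    """UMNEGL modelled as UMSUBL with a zero minuend (assumption)."""
    return umsubl(wn, wm, 0)
```

When the product exceeds the minuend, the result wraps modulo 2^64, which is why umnegl(1, 1) yields all ones.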
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
UMULH
Unsigned Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit
destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 1 0 Rm 0 (1) (1) (1) (1) (1) Rn Rd
U Ra
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
integer result;
X[d] = result<127:64>;
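A minimal Python sketch of this behavior, using Python's arbitrary-precision integers to hold the 128-bit product. The function name umulh is illustrative only.

```python
MASK64 = (1 << 64) - 1


def umulh(xn: int, xm: int) -> int:
    """Return bits 127:64 of the unsigned 128-bit product xn * xm."""
    product = (xn & MASK64) * (xm & MASK64)
    return product >> 64
```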
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
UMULL
Unsigned Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.
• The encodings in this description are named to match the encodings of UMADDL.
• The description of UMADDL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 0 1 1 1 1 1 Rn Rd
U o0 Ra
UMULL <Xd>, <Wn>, <Wm>
is equivalent to
UMADDL <Xd>, <Wn>, <Wm>, XZR
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
Operation
The description of UMADDL gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
UXTB
Unsigned Extend Byte extracts an 8-bit value from a register, zero-extends it to the size of the register, and writes the
result to the destination register.
• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 Rn Rd
sf opc N immr imms
32-bit
UXTB <Wd>, <Wn>
is equivalent to
UBFM <Wd>, <Wn>, #0, #7
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
The description of UBFM gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
UXTH
Unsigned Extend Halfword extracts a 16-bit value from a register, zero-extends it to the size of the register, and writes
the result to the destination register.
• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 Rn Rd
sf opc N immr imms
32-bit
UXTH <Wd>, <Wn>
is equivalent to
UBFM <Wd>, <Wn>, #0, #15
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
The description of UBFM gives the operational pseudocode for this instruction.
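For the 32-bit forms, the zero-extension performed by UXTB and UXTH reduces to masking the source value. The following Python sketch illustrates this. The function names are hypothetical and register values are modelled as plain integers.

```python
def uxtb(wn: int) -> int:
    """Keep bits 7:0 of the source and clear all higher bits."""
    return wn & 0xFF


def uxth(wn: int) -> int:
    """Keep bits 15:0 of the source and clear all higher bits."""
    return wn & 0xFFFF
```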
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
WFE
Wait For Event is a hint instruction that indicates that the PE can enter a low-power state and remain there until a
wakeup event occurs. Wakeup events include the event signaled as a result of executing the SEV instruction on any PE
in the multiprocessor system. For more information, see Wait For Event mechanism and Send event.
As described in Wait For Event mechanism and Send event, the execution of a WFE instruction that would otherwise
cause entry to a low-power state can be trapped to a higher Exception level.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 1 1 1 1 1
CRm op2
WFE
// Empty.
Operation
Hint_WFE(-1, WFxType_WFE);
WFET
Wait For Event with Timeout is a hint instruction that indicates that the PE can enter a low-power state and remain
there until either a local timeout event or a wakeup event occurs. Wakeup events include the event signaled as a result
of executing the SEV instruction on any PE in the multiprocessor system. For more information, see Wait For Event
mechanism and Send event.
As described in Wait For Event mechanism and Send event, the execution of a WFET instruction that would otherwise
cause entry to a low-power state can be trapped to a higher Exception level.
System
(FEAT_WFxT)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 Rd
WFET <Xt>
integer d = UInt(Rd);
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rd" field.
Operation
integer localtimeout = UInt(X[d]);
Hint_WFE(localtimeout, WFxType_WFET);
WFI
Wait For Interrupt is a hint instruction that indicates that the PE can enter a low-power state and remain there until a
wakeup event occurs. For more information, see Wait For Interrupt.
As described in Wait For Interrupt, the execution of a WFI instruction that would otherwise cause entry to a low-power
state can be trapped to a higher Exception level.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 1 1 1 1
CRm op2
WFI
// Empty.
Operation
Hint_WFI(-1, WFxType_WFI);
WFIT
Wait For Interrupt with Timeout is a hint instruction that indicates that the PE can enter a low-power state and remain
there until either a local timeout event or a wakeup event occurs. For more information, see Wait For Interrupt.
As described in Wait For Interrupt, the execution of a WFIT instruction that would otherwise cause entry to a
low-power state can be trapped to a higher Exception level.
System
(FEAT_WFxT)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 Rd
WFIT <Xt>
integer d = UInt(Rd);
Assembler Symbols
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rd" field.
Operation
integer localtimeout = UInt(X[d]);
Hint_WFI(localtimeout, WFxType_WFIT);
XAFLAG
Convert floating-point condition flags from external format to Arm format. This instruction converts the state of the
PSTATE.{N,Z,C,V} flags from an alternative representation required by some software to a form representing the
result of an Arm floating-point scalar compare instruction.
System
(FEAT_FlagM2)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 0 1 1 1 1 1 1
CRm
XAFLAG
Operation
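bit N = NOT(PSTATE.C) AND NOT(PSTATE.Z);
bit Z = PSTATE.Z AND PSTATE.C;
bit C = PSTATE.C OR PSTATE.Z;
bit V = NOT(PSTATE.C) AND PSTATE.Z;
PSTATE.N = N;
PSTATE.Z = Z;
PSTATE.C = C;
PSTATE.V = V;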
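The flag conversion can be sketched in Python, assuming the new flags are derived from the old Z and C flags as N = NOT(C) AND NOT(Z), Z = Z AND C, C = C OR Z, and V = NOT(C) AND Z. The function name xaflag and the representation of flags as 0 or 1 integers are illustrative choices. The incoming N and V values are accepted but not used.

```python
def xaflag(n: int, z: int, c: int, v: int) -> tuple:
    """Return the new (N, Z, C, V) tuple; each flag is 0 or 1."""
    new_n = (1 - c) & (1 - z)
    new_z = z & c
    new_c = c | z
    new_v = (1 - c) & z
    return new_n, new_z, new_c, new_v
```

All four results are computed from the old flag values before any is written back, which matches the use of temporaries in the operation.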
XPACD, XPACI, XPACLRI
Strip Pointer Authentication Code. This instruction removes the pointer authentication code from an address. The
address is in the specified general-purpose register for XPACI and XPACD, and is in LR for XPACLRI.
The XPACD instruction is used for data addresses, and XPACI and XPACLRI are used for instruction addresses.
It has encodings from 2 classes: Integer and System
Integer
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 1 0 0 0 D 1 1 1 1 1 Rd
Rn
XPACD (D == 1)
XPACD <Xd>
XPACI (D == 0)
XPACI <Xd>
if !HavePACExt() then
    UNDEFINED;
System
(FEAT_PAuth)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1
XPACLRI
integer d = 30;
boolean data = FALSE;
Assembler Symbols
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
Operation
if HavePACExt() then
    X[d] = Strip(X[d], data);
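The effect of stripping can be illustrated with a simplified Python sketch. It assumes that the pointer authentication code occupies the bits between the top of the virtual address and either bit 63 or, when the top byte is ignored, bit 55, and that stripping replaces those bits with copies of bit 55. The parameters va_bits and tbi are hypothetical stand-ins for configuration that the real Strip() function derives from the translation regime, so this is not a substitute for the architectural pseudocode.

```python
def strip_pac(ptr: int, va_bits: int = 48, tbi: bool = False) -> int:
    """Replace the assumed PAC field of a 64-bit pointer with copies of bit 55."""
    mask64 = (1 << 64) - 1
    ptr &= mask64
    bit55 = (ptr >> 55) & 1
    top = 56 if tbi else 64               # bits at and above top are preserved
    low = ptr & ((1 << va_bits) - 1)      # address bits below the PAC field
    ext_width = top - va_bits
    ext = (((1 << ext_width) - 1) << va_bits) if bit55 else 0
    tag = ptr & ~((1 << top) - 1) & mask64
    return tag | ext | low
```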
YIELD
YIELD is a hint instruction. Software with a multithreading capability can use a YIELD instruction to indicate to the PE
that it is performing a task, for example a spin-lock, that could be swapped out to improve overall system performance.
The PE can use this hint to suspend and resume multiple software threads if it supports the capability.
For more information about the recommended use of this instruction, see The YIELD instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 1 1 1
CRm op2
YIELD
// Empty.
Operation
Hint_Yield();
A64 -- SIMD and Floating-point Instructions (alphabetic order)
BFMLALB, BFMLALT (by element): BFloat16 floating-point widening multiply-add long (by element).
FCVTAS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).
FCVTAS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).
FCVTAU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).
FCVTAU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).
FCVTMS (scalar): Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).
FCVTMS (vector): Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).
FCVTMU (scalar): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).
FCVTMU (vector): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).
FCVTNS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).
FCVTNS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).
FCVTNU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).
FCVTNU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).
FCVTPS (scalar): Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).
FCVTPS (vector): Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).
FCVTPU (scalar): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).
FCVTPU (vector): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).
FCVTXN, FCVTXN2: Floating-point Convert to lower precision Narrow, rounding to odd (vector).
FCVTZS (scalar, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).
FCVTZS (scalar, integer): Floating-point Convert to Signed integer, rounding toward Zero (scalar).
FCVTZS (vector, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).
FCVTZS (vector, integer): Floating-point Convert to Signed integer, rounding toward Zero (vector).
FCVTZU (scalar, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).
FCVTZU (scalar, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).
FCVTZU (vector, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).
FCVTZU (vector, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (vector).
FMLAL, FMLAL2 (by element): Floating-point fused Multiply-Add Long to accumulator (by element).
FMLS (by element): Floating-point fused Multiply-Subtract from accumulator (by element).
FMLSL, FMLSL2 (by element): Floating-point fused Multiply-Subtract Long from accumulator (by element).
FMLSL, FMLSL2 (vector): Floating-point fused Multiply-Subtract Long from accumulator (vector).
FRINT32X (scalar): Floating-point Round to 32-bit Integer, using current rounding mode (scalar).
FRINT32X (vector): Floating-point Round to 32-bit Integer, using current rounding mode (vector).
FRINT64X (scalar): Floating-point Round to 64-bit Integer, using current rounding mode (scalar).
FRINT64X (vector): Floating-point Round to 64-bit Integer, using current rounding mode (vector).
FRINTA (scalar): Floating-point Round to Integral, to nearest with ties to Away (scalar).
FRINTA (vector): Floating-point Round to Integral, to nearest with ties to Away (vector).
FRINTI (scalar): Floating-point Round to Integral, using current rounding mode (scalar).
FRINTI (vector): Floating-point Round to Integral, using current rounding mode (vector).
FRINTN (scalar): Floating-point Round to Integral, to nearest with ties to even (scalar).
FRINTN (vector): Floating-point Round to Integral, to nearest with ties to even (vector).
FRINTX (scalar): Floating-point Round to Integral exact, using current rounding mode (scalar).
FRINTX (vector): Floating-point Round to Integral exact, using current rounding mode (vector).
LD1 (multiple structures): Load multiple single-element structures to one, two, three, or four registers.
LD1 (single structure): Load one single-element structure to one lane of one register.
LD1R: Load one single-element structure and Replicate to all lanes (of one register).
LD2 (single structure): Load single 2-element structure to one lane of two registers.
LD2R: Load single 2-element structure and Replicate to all lanes of two registers.
LD3 (single structure): Load single 3-element structure to one lane of three registers.
LD3R: Load single 3-element structure and Replicate to all lanes of three registers.
LD4 (single structure): Load single 4-element structure to one lane of four registers.
LD4R: Load single 4-element structure and Replicate to all lanes of four registers.
MOV (element): Move vector element to another vector element: an alias of INS (element).
MOV (from general): Move general-purpose register to a vector element: an alias of INS (general).
MOV (to general): Move vector element to general-purpose register: an alias of UMOV.
SM3PARTW1: SM3PARTW1.
SM3PARTW2: SM3PARTW2.
SM3SS1: SM3SS1.
SM3TT1A: SM3TT1A.
SM3TT1B: SM3TT1B.
SM3TT2A: SM3TT2A.
SM3TT2B: SM3TT2B.
SQDMLAL, SQDMLAL2 (by element): Signed saturating Doubling Multiply-Add Long (by element).
SQDMLSL, SQDMLSL2 (by element): Signed saturating Doubling Multiply-Subtract Long (by element).
SQDMULH (by element): Signed saturating Doubling Multiply returning High half (by element).
SQDMULL, SQDMULL2 (by element): Signed saturating Doubling Multiply Long (by element).
SQRDMLAH (by element): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element).
SQRDMLAH (vector): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).
SQRDMLSH (by element): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).
SQRDMLSH (vector): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).
SQRDMULH (by element): Signed saturating Rounding Doubling Multiply returning High half (by element).
SQRDMULH (vector): Signed saturating Rounding Doubling Multiply returning High half.
SQRSHRUN, SQRSHRUN2: Signed saturating Rounded Shift Right Unsigned Narrow (immediate).
ST1 (multiple structures): Store multiple single-element structures from one, two, three, or four registers.
ST1 (single structure): Store a single-element structure from one lane of one register.
ST2 (multiple structures): Store multiple 2-element structures from two registers.
ST2 (single structure): Store single 2-element structure from one lane of two registers.
ST3 (multiple structures): Store multiple 3-element structures from three registers.
ST3 (single structure): Store single 3-element structure from one lane of three registers.
ST4 (multiple structures): Store multiple 4-element structures from four registers.
ST4 (single structure): Store single 4-element structure from one lane of four registers.
SUDOT (by element): Dot product with signed and unsigned integers (vector, by element).
USDOT (by element): Dot Product with unsigned and signed integers (vector, by element).
USDOT (vector): Dot Product with unsigned and signed integers (vector).
USMMLA (vector): Unsigned and signed 8-bit integer matrix multiply-accumulate (vector).
ABS
Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP
register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    if neg then
        element = -element;
    else
        element = Abs(element);
    Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
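The per-element loop above can be modelled in Python as a sketch. The function name vector_abs and the representation of a vector as a list of raw element values are illustrative assumptions. Each element is interpreted as a signed esize-bit value, and the result is truncated back to esize bits, so the most negative value maps to itself.

```python
def vector_abs(elements: list, esize: int = 8) -> list:
    """Return the absolute value of each signed esize-bit element."""
    mask = (1 << esize) - 1
    result = []
    for raw in elements:
        raw &= mask
        # Interpret the raw bits as a two's complement signed value.
        signed = raw - (1 << esize) if raw >> (esize - 1) else raw
        # Truncate to esize bits, as the element slice in the operation does.
        result.append(abs(signed) & mask)
    return result
```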
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ADD (vector)
Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results
into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean sub_op = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (U == '1');
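As an illustration of the Vector-class decode above, the following sketch extracts the fields from a 32-bit instruction word. The function name, the returned dictionary, and the two fixed-bit constants are assumptions introduced here; the constants are derived by hand from the encoding row (bits 31, 28:24, 21 and 15:10).

```python
FIXED_MASK = 0x9F20FC00  # bits 31, 28:24, 21 and 15:10 of the Vector class
FIXED_BITS = 0x0E208400  # required values of those bits (0, 01110, 1, 100001)


def decode_add_vector(word: int) -> dict:
    """Decode a Vector-class encoding of the ADD/SUB group (hypothetical helper)."""
    if (word & FIXED_MASK) != FIXED_BITS:
        raise ValueError("not a Vector-class encoding of this group")
    q = (word >> 30) & 1
    u = (word >> 29) & 1
    size = (word >> 22) & 0b11
    # if size:Q == '110' then UNDEFINED;
    if size == 0b11 and q == 0:
        raise ValueError("UNDEFINED: size:Q == '110'")
    esize = 8 << size
    datasize = 128 if q == 1 else 64
    return {
        "d": word & 0x1F,
        "n": (word >> 5) & 0x1F,
        "m": (word >> 16) & 0x1F,
        "esize": esize,
        "datasize": datasize,
        "elements": datasize // esize,
        "sub_op": u == 1,
    }
```

For example, the word 0x4EA28420 has Q = 1, U = 0, size = '10', Rm = 2, Rn = 1 and Rd = 0, so it decodes to four 32-bit lanes with `sub_op` false.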
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if sub_op then
        Elem[result, e, esize] = element1 - element2;
    else
        Elem[result, e, esize] = element1 + element2;
V[d] = result;
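The element loop above amounts to lane-wise addition or subtraction modulo 2^esize, with no carry between lanes. A minimal sketch, assuming registers are modelled as Python integers and using the hypothetical name `add_vector`:

```python
def add_vector(op1: int, op2: int, esize: int, elements: int, sub_op: bool = False) -> int:
    """Lane-wise add (or subtract) with wraparound inside each lane."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(elements):
        a = (op1 >> (e * esize)) & mask
        b = (op2 >> (e * esize)) & mask
        # bits(esize) arithmetic wraps, so mask the lane result.
        lane = (a - b if sub_op else a + b) & mask
        result |= lane << (e * esize)
    return result
```

Adding 0x01 to a byte lane holding 0xFF gives 0x00 in that lane and leaves the neighbouring lane untouched.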
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the
corresponding vector element in the second source SIMD&FP register, places the most significant half of the result
into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
The results are truncated. For rounded results, see RADDHN.
The ADDHN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
ADDHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(2*datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
bits(2*esize) element1;
bits(2*esize) element2;
bits(2*esize) sum;
for e = 0 to elements-1
    element1 = Elem[operand1, e, 2*esize];
    element2 = Elem[operand2, e, 2*esize];
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    sum = sum + round_const;
    Elem[result, e, esize] = sum<2*esize-1:esize>;
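The narrowing step above can be sketched as follows. This is an illustration under stated assumptions: registers are Python integers, the name `add_high_narrow` and the `rounding` parameter are hypothetical, and the placement of the narrow vector into the lower or upper half of the destination is not modelled.

```python
def add_high_narrow(op1: int, op2: int, esize: int, elements: int,
                    sub_op: bool = False, rounding: bool = False) -> int:
    """Add (or subtract) 2*esize-bit lanes and keep the high esize bits of each."""
    wide = 2 * esize
    wide_mask = (1 << wide) - 1
    # round_const = if round then 1 << (esize - 1) else 0
    round_const = (1 << (esize - 1)) if rounding else 0
    result = 0
    for e in range(elements):
        a = (op1 >> (e * wide)) & wide_mask
        b = (op2 >> (e * wide)) & wide_mask
        total = ((a - b if sub_op else a + b) + round_const) & wide_mask
        # sum<2*esize-1:esize>: the most significant half of the wide sum.
        result |= (total >> esize) << (e * esize)
    return result
```

With 16-bit source lanes, 0x01C0 + 0 narrows to 0x01 when truncated and to 0x02 when the rounding constant 0x80 is added first.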
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes
the scalar result into the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 1 0 0 0 1 1 0 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size <T>
0x RESERVED
10 RESERVED
11 2D
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_ADD, operand, esize);
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and
writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    element1 = Elem[concat, 2*e, esize];
    element2 = Elem[concat, (2*e)+1, esize];
    Elem[result, e, esize] = element1 + element2;
V[d] = result;
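The pairwise operation above can be sketched on packed integers. The name `add_pairwise` is hypothetical and registers are modelled as Python integers. The sketch makes the lane order explicit: the low half of the result comes from the first source and the high half from the second.

```python
def add_pairwise(op1: int, op2: int, esize: int, datasize: int) -> int:
    """Add adjacent lane pairs of the concatenation operand2:operand1."""
    elements = datasize // esize
    mask = (1 << esize) - 1
    # concat = operand2:operand1, so operand1 occupies the low bits.
    concat = (op2 << datasize) | op1
    result = 0
    for e in range(elements):
        a = (concat >> (2 * e * esize)) & mask
        b = (concat >> ((2 * e + 1) * esize)) & mask
        result |= ((a + b) & mask) << (e * esize)
    return result
```

For 16-bit lanes in 64-bit registers, sources holding [1, 2, 3, 4] and [10, 20, 30, 40] (lowest lane first) produce [3, 7, 30, 70].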
Operational information
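If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.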
Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the
scalar result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_ADD, operand, esize);
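The definition of Reduce is not shown in this section. The sketch below assumes that, for ReduceOp_ADD, it sums all lanes and keeps the low esize bits, which is order-independent for integer addition. The name `add_across` is hypothetical.

```python
def add_across(operand: int, esize: int, datasize: int) -> int:
    """Sum every esize-bit lane of operand, wrapping to esize bits."""
    mask = (1 << esize) - 1
    total = 0
    for e in range(datasize // esize):
        # Keep the running sum within esize bits, as the result is bits(esize).
        total = (total + ((operand >> (e * esize)) & mask)) & mask
    return total
```

Eight byte lanes of 0x40 sum to 0x200, which wraps to 0x00 in an 8-bit result.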
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and
writes the result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 Rm 0 0 0 1 1 1 Rn Rd
size
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the
complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting
vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.
Advanced SIMD
(FEAT_SHA3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 0 1 Rm 0 Ra Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<Va> Is the name of the third SIMD&FP source register, encoded in the "Ra" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Va = V[a];
V[d] = Vn EOR (Vm AND NOT(Va));
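The final assignment above can be checked with a short sketch on 128-bit values held as Python integers. The name `bcax` mirrors the mnemonic, and the constant `MASK128` is an assumption introduced for illustration.

```python
MASK128 = (1 << 128) - 1


def bcax(vn: int, vm: int, va: int) -> int:
    """Compute Vn EOR (Vm AND NOT(Va)) on 128-bit values."""
    # Python's ~ yields a negative int; masking restores a 128-bit pattern.
    return (vn ^ (vm & ~va)) & MASK128
```

For the 4-bit patterns Vn = 1100, Vm = 1010 and Va = 0110, the bit-clear step gives 1000 and the exclusive OR gives 0100.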
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Floating-point convert from single-precision to BFloat16 format (scalar) converts the single-precision floating-point
value in the 32-bit SIMD&FP source register to BFloat16 format and writes the result in the 16-bit SIMD&FP
destination register.
ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
Single-precision to BFloat16
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 Rn Rd
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point convert from single-precision to BFloat16 format (vector) reads each single-precision element in the
SIMD&FP source vector, converts each value to BFloat16 format, and writes the results in the lower or upper half of
the SIMD&FP destination vector. The result elements are half the width of the source elements.
The BFCVTN instruction writes the half-width results to the lower half of the destination vector and clears the upper
half to zero, while the BFCVTN2 instruction writes the results to the upper half of the destination vector without
affecting the other bits in the register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 4H
1 8H
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(128) operand = V[n];
bits(64) result;
for e = 0 to elements-1
    Elem[result, e, 16] = FPConvertBF(Elem[operand, e, 32], FPCR[]);
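FPConvertBF is not defined in this section, and its result depends on FPCR controls. The sketch below is therefore only an approximation under loud assumptions: it uses round-to-nearest, ties-to-even, returns a quiet NaN of the same sign for any NaN input, and ignores flush-to-zero and exception flags. The names `f32_bits_to_bf16` and `bfcvtn` are hypothetical, and the element count of four follows from a 128-bit operand holding 32-bit elements.

```python
def f32_bits_to_bf16(bits: int) -> int:
    """Narrow a single-precision bit pattern to BFloat16 (assumed RNE rounding)."""
    bits &= 0xFFFFFFFF
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Assumed policy: quiet NaN with the input's sign.
        return ((bits >> 16) & 0x8000) | 0x7FC0
    # Round to nearest, ties to even: add 0x7FFF plus the current low result bit.
    lsb = (bits >> 16) & 1
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def bfcvtn(operand: int, elements: int = 4) -> int:
    """Convert each 32-bit lane of a 128-bit operand into a 16-bit lane."""
    result = 0
    for e in range(elements):
        lane = (operand >> (32 * e)) & 0xFFFFFFFF
        result |= f32_bits_to_bf16(lane) << (16 * e)
    return result
```

Under these assumptions, the halfway pattern 0x3F808000 rounds down to the even value 0x3F80, while 0x3F818000 rounds up to 0x3F82.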
BFloat16 floating-point dot product (vector, by element). This instruction delimits the source vectors into pairs of
BFloat16 elements.
Irrespective of the control bits in the FPCR, this instruction:
• Performs an unfused sum-of-products of each pair of adjacent BFloat16 elements in the first source vector
with the specified pair of elements in the second source vector. The intermediate single-precision products
are rounded before they are summed, and the intermediate sum is rounded before accumulation into the
single-precision destination element that overlaps with the corresponding pair of BFloat16 elements in the
first source vector.
• Uses the non-IEEE 754 Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds
an overflow to an appropriately signed Infinity.
• Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
• Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and
IOE) are all zero.
• Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
• Generates only the default NaN, as if FPCR.DN is 1.
The BFloat16 pair within the second source vector is specified using an immediate index. The index range is from 0 to
3 inclusive. ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
Vector
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 1 L M Rm 1 1 1 1 H 0 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 4H
1 8H
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the immediate index of a pair of 16-bit elements in the range 0 to 3, encoded in the "H:L" fields.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
for e = 0 to elements-1
    bits(16) elt1_a = Elem[operand1, 2*e+0, 16];
    bits(16) elt1_b = Elem[operand1, 2*e+1, 16];
    bits(16) elt2_a = Elem[operand2, 2*i+0, 16];
    bits(16) elt2_b = Elem[operand2, 2*i+1, 16];
V[d] = result;
BFloat16 floating-point dot product (vector). This instruction delimits the source vectors into pairs of BFloat16
elements.
Irrespective of the control bits in the FPCR, this instruction:
• Performs an unfused sum-of-products of each pair of adjacent BFloat16 elements in the source vectors. The
intermediate single-precision products are rounded before they are summed, and the intermediate sum is
rounded before accumulation into the single-precision destination element that overlaps with the
corresponding pair of BFloat16 elements in the source vectors.
• Uses the non-IEEE 754 Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds
an overflow to an appropriately signed Infinity.
• Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
• Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and
IOE) are all zero.
• Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
• Generates only the default NaN, as if FPCR.DN is 1.
Vector
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 1 1 1 1 1 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 4H
1 8H
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
for e = 0 to elements-1
    bits(16) elt1_a = Elem[operand1, 2*e+0, 16];
    bits(16) elt1_b = Elem[operand1, 2*e+1, 16];
    bits(16) elt2_a = Elem[operand2, 2*e+0, 16];
    bits(16) elt2_b = Elem[operand2, 2*e+1, 16];
V[d] = result;
BFloat16 floating-point widening multiply-add long (by element) widens the even-numbered (bottom) or odd-numbered
(top) 16-bit elements in the first source vector, and the indexed element in the second source vector from BFloat16 to
single-precision format. The instruction then multiplies and adds these values without intermediate rounding to
single-precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the
first source vector.
ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
Vector
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 1 L M Rm 1 1 1 1 H 0 Rn Rd
Assembler Symbols
Q <bt>
0 B
1 T
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, in the range V0 to V15, encoded in the "Rm" field.
<index> Is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
Operation
CheckFPAdvSIMDEnabled64();
bits(128) result;
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) operand3 = V[d];
bits(32) element2 = Elem[operand2, index, 16]:Zeros(16);
for e = 0 to elements-1
    bits(32) element1 = Elem[operand1, 2*e+sel, 16]:Zeros(16);
    bits(32) addend = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(addend, element1, element2, FPCR[]);
V[d] = result;
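The widening step in the pseudocode above appends 16 zero bits to each selected BFloat16 element, which gives the single-precision value with the same sign, exponent and leading fraction bits. The sketch below illustrates only that selection and widening; the multiply-add itself (BFMulAdd) is not modelled. The names `bf16_to_float` and `select_and_widen` are hypothetical, and the default of four elements assumes a 128-bit vector of 32-bit results.

```python
import struct


def bf16_to_float(bits16: int) -> float:
    """Widen a BFloat16 bit pattern to single precision by appending 16 zero bits."""
    word = (bits16 & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", word))[0]


def select_and_widen(operand: int, sel: int, elements: int = 4) -> list:
    """Return the even (sel=0, bottom) or odd (sel=1, top) 16-bit lanes, widened."""
    return [
        bf16_to_float((operand >> (16 * (2 * e + sel))) & 0xFFFF)
        for e in range(elements)
    ]
```

For a vector whose eight 16-bit lanes hold 1.0 through 8.0 in BFloat16, the bottom selection yields 1.0, 3.0, 5.0, 7.0 and the top selection yields 2.0, 4.0, 6.0, 8.0.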
BFloat16 floating-point widening multiply-add long (vector) widens the even-numbered (bottom) or odd-numbered
(top) 16-bit elements in the first and second source vectors from BFloat16 to single-precision format. The instruction
then multiplies and adds these values without intermediate rounding to the single-precision elements of the
destination vector that overlap with the corresponding BFloat16 elements in the source vectors.
ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.
Vector
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 1 1 1 1 1 1 Rn Rd
Assembler Symbols
Q <bt>
0 B
1 T
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) operand3 = V[d];
bits(128) result;
for e = 0 to elements-1
    bits(32) element1 = Elem[operand1, 2*e+sel, 16]:Zeros(16);
    bits(32) element2 = Elem[operand2, 2*e+sel, 16]:Zeros(16);
    bits(32) addend = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(addend, element1, element2, FPCR[]);
V[d] = result;
Note
Arm expects that the BFMMLA instruction will deliver a peak BFloat16 multiply throughput that is at least as high
as can be achieved using two BFDOT instructions, with a goal that it should have significantly higher throughput.
Vector
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 1 0 Rm 1 1 1 0 1 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(128) op1 = V[n];
bits(128) op2 = V[m];
bits(128) acc = V[d];
Bitwise bit Clear (vector, immediate). This instruction reads each vector element from the destination SIMD&FP
register, performs a bitwise AND between each result and the complement of an immediate constant, places the result
into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 0 0 0 a b c x x x 1 0 1 d e f g h Rd
op cmode
integer rd = UInt(Rd);
ImmediateOp operation;
case cmode:op of
    when '0xx01' operation = ImmediateOp_MVNI;
    when '0xx11' operation = ImmediateOp_BIC;
    when '10x01' operation = ImmediateOp_MVNI;
    when '10x11' operation = ImmediateOp_BIC;
    when '110x1' operation = ImmediateOp_MVNI;
    when '1110x' operation = ImmediateOp_MOVI;
    when '11111'
        // FMOV Dn,#imm is in main FP instruction set
        if Q == '0' then UNDEFINED;
        operation = ImmediateOp_MOVI;
Assembler Symbols
<Vd> Is the name of the SIMD&FP register, encoded in the "Rd" field.
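<T> For the 16-bit variant: is an arrangement specifier, encoded in "Q":
Q <T>
0 4H
1 8H
For the 32-bit variant: is an arrangement specifier, encoded in "Q":
Q <T>
0 2S
1 4S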
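<amount> For the 16-bit variant: is the shift amount encoded in “cmode<1>”:
cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.
For the 32-bit variant: is the shift amount encoded in “cmode<2:1>”:
cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.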
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;
case operation of
    when ImmediateOp_MOVI
        result = imm;
    when ImmediateOp_MVNI
        result = NOT(imm);
    when ImmediateOp_ORR
        operand = V[rd];
        result = operand OR imm;
    when ImmediateOp_BIC
        operand = V[rd];
        result = operand AND NOT(imm);
V[rd] = result;
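The four cases above can be sketched on packed integers. How `imm` is expanded from the encoded fields is not shown in this section, so the sketch takes an already expanded immediate; the `replicate` helper, which copies one element value into every lane, and the string operation names are assumptions introduced for illustration.

```python
def replicate(value: int, esize: int, datasize: int) -> int:
    """Copy an esize-bit value into every lane of a datasize-bit vector."""
    lane = value & ((1 << esize) - 1)
    out = 0
    for e in range(datasize // esize):
        out |= lane << (e * esize)
    return out


def apply_immediate_op(operation: str, operand: int, imm: int, datasize: int) -> int:
    """Apply MOVI, MVNI, ORR or BIC with an expanded immediate."""
    mask = (1 << datasize) - 1
    if operation == "MOVI":
        return imm & mask
    if operation == "MVNI":
        return ~imm & mask
    if operation == "ORR":
        return (operand | imm) & mask
    if operation == "BIC":
        return operand & ~imm & mask
    raise ValueError("unknown operation: " + operation)
```

With the 16-bit value 0xFF00 in each of four lanes, BIC clears the high byte of every halfword and leaves the low byte unchanged.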
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP
register and the complement of the second source SIMD&FP register, and writes the result to the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 Rm 0 0 0 1 1 1 Rn Rd
size
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
operand2 = NOT(operand2);
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise Insert if False. This instruction inserts each bit from the first source SIMD&FP register into the destination
SIMD&FP register if the corresponding bit of the second source SIMD&FP register is 0, otherwise leaves the bit in the
destination register unchanged.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];
operand1 = V[d];
operand3 = NOT(V[m]);
V[d] = operand1 EOR ((operand1 EOR operand4) AND operand3);
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise Insert if True. This instruction inserts each bit from the first source SIMD&FP register into the SIMD&FP
destination register if the corresponding bit of the second source SIMD&FP register is 1, otherwise leaves the bit in
the destination register unchanged.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];
operand1 = V[d];
operand3 = V[m];
V[d] = operand1 EOR ((operand1 EOR operand4) AND operand3);
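The EOR/AND/EOR expression above is a branch-free way of writing a per-bit multiplexer. The Python sketch below, with the hypothetical names `bit_insert_if_true` and `bit_as_mux`, evaluates both forms so that they can be compared; integers stand in for register contents.

```python
def bit_insert_if_true(vd, vn, vm, datasize=64):
    """Hypothetical model of BIT using the pseudocode's EOR/AND/EOR form."""
    mask = (1 << datasize) - 1
    operand1 = vd & mask
    operand3 = vm & mask
    operand4 = vn & mask
    return operand1 ^ ((operand1 ^ operand4) & operand3)

def bit_as_mux(vd, vn, vm, datasize=64):
    """Same result written as an explicit select: Vn where Vm is 1, else Vd."""
    mask = (1 << datasize) - 1
    return ((vn & vm) | (vd & ~vm)) & mask
```

Where a bit of Vm is 0 the inner AND yields 0 and the destination bit is kept; where it is 1 the two EORs cancel operand1 and leave the bit from Vn.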
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the
first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];
operand1 = V[m];
operand3 = V[d];
V[d] = operand1 EOR ((operand1 EOR operand4) AND operand3);
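In BSL the original destination acts as the selector rather than as the fallback value. A short Python sketch of the operand assignment above, under the assumption that integers represent register contents and with `bsl` as a hypothetical name:

```python
def bsl(vd, vn, vm, datasize=64):
    """Hypothetical model of BSL: Vn where the old Vd bit is 1, else Vm."""
    mask = (1 << datasize) - 1
    operand1 = vm & mask    # value chosen when the selector bit is 0
    operand3 = vd & mask    # selector: the original destination
    operand4 = vn & mask    # value chosen when the selector bit is 1
    return operand1 ^ ((operand1 ^ operand4) & operand3)
```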
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant
bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the
result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most
significant bit itself.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 0 1 0 Rn Rd
U
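integer d = UInt(Rd);
integer n = UInt(Rn);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CountOp countop = if U == '1' then CountOp_CLZ else CountOp_CLS;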
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer count;
for e = 0 to elements-1
    if countop == CountOp_CLS then
        count = CountLeadingSignBits(Elem[operand, e, esize]);
    else
        count = CountLeadingZeroBits(Elem[operand, e, esize]);
    Elem[result, e, esize] = count<esize-1:0>;
V[d] = result;
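The per-element sign-bit count can be sketched in Python as follows. This is an illustrative model, not the architecture's definition of CountLeadingSignBits: the helper names are assumptions, and the vector is represented as an integer with element 0 in the least significant bits.

```python
def count_leading_sign_bits(x, esize):
    """Count bits below the MSB that equal the MSB (the MSB itself is excluded)."""
    sign = (x >> (esize - 1)) & 1
    count = 0
    for i in range(esize - 2, -1, -1):
        if (x >> i) & 1 != sign:
            break
        count += 1
    return count

def cls_vector(operand, esize, datasize):
    """Hypothetical model of CLS (vector) over datasize // esize elements."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element = (operand >> (e * esize)) & ones
        result |= (count_leading_sign_bits(element, esize) & ones) << (e * esize)
    return result
```

Because the most significant bit is excluded, the largest possible count is esize-1, reached for an element that is all zeros or all ones.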
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most
significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the
vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 0 1 0 Rn Rd
U
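integer d = UInt(Rd);
integer n = UInt(Rn);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CountOp countop = if U == '1' then CountOp_CLZ else CountOp_CLS;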
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer count;
for e = 0 to elements-1
    if countop == CountOp_CLS then
        count = CountLeadingSignBits(Elem[operand, e, esize]);
    else
        count = CountLeadingZeroBits(Elem[operand, e, esize]);
    Elem[result, e, esize] = count<esize-1:0>;
V[d] = result;
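For the leading-zero case, a compact Python sketch is given below. It relies on `int.bit_length()` from the Python standard library; the function names and the integer representation of the vector (element 0 in the least significant bits) are assumptions for illustration.

```python
def count_leading_zero_bits(x, esize):
    """Number of zero bits above the highest set bit of an esize-bit value."""
    return esize - x.bit_length()

def clz_vector(operand, esize, datasize):
    """Hypothetical model of CLZ (vector) over datasize // esize elements."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element = (operand >> (e * esize)) & ones
        result |= (count_leading_zero_bits(element, esize) & ones) << (e * esize)
    return result
```

Unlike the sign-bit count, an all-zero element gives a count equal to esize, which still fits in the esize-bit result element.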
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP
register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is
equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets
every bit of the corresponding vector element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean and_test = (U == '0');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean and_test = (U == '0');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
boolean test_passed;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if and_test then
        test_passed = !IsZero(element1 AND element2);
    else
        test_passed = (element1 == element2);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
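The loop above produces an all-ones or all-zeros mask per element. The following Python sketch models it on integers; `compare_bitwise` is a hypothetical name, the `and_test` flag mirrors the pseudocode variable of the same name, and element 0 is assumed to occupy the least significant bits.

```python
def compare_bitwise(vn, vm, esize, datasize, and_test=False):
    """Hypothetical model: per-element equality, or a bitwise test when and_test is set."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element1 = (vn >> (e * esize)) & ones
        element2 = (vm >> (e * esize)) & ones
        if and_test:
            test_passed = (element1 & element2) != 0
        else:
            test_passed = element1 == element2
        if test_passed:
            result |= ones << (e * esize)   # Ones() for this element
    return result
```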
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register
and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP
register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to
zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
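integer d = UInt(Rd);
integer n = UInt(Rn);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;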
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
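integer d = UInt(Rd);
integer n = UInt(Rn);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;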
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;
for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    case comparison of
        when CompareOp_GT test_passed = element > 0;
        when CompareOp_GE test_passed = element >= 0;
        when CompareOp_EQ test_passed = element == 0;
        when CompareOp_LE test_passed = element <= 0;
        when CompareOp_LT test_passed = element < 0;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
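The compare-with-zero loop above interprets every element as a signed integer before testing it. A Python sketch under stated assumptions: the comparison is passed as a short string such as "EQ", the table `ZERO_TESTS` and the function name are hypothetical, and element 0 occupies the least significant bits.

```python
import operator

# Hypothetical mapping from the CompareOp suffix to a Python comparison.
ZERO_TESTS = {
    "GT": operator.gt, "GE": operator.ge, "EQ": operator.eq,
    "LE": operator.le, "LT": operator.lt,
}

def compare_with_zero(operand, comparison, esize, datasize):
    """Hypothetical model of the signed compare-against-zero operation."""
    ones = (1 << esize) - 1
    test = ZERO_TESTS[comparison]
    result = 0
    for e in range(datasize // esize):
        raw = (operand >> (e * esize)) & ones
        # SInt(): reinterpret the bit pattern as two's complement.
        element = raw - (1 << esize) if raw >> (esize - 1) else raw
        if test(element, 0):
            result |= ones << (e * esize)
    return result
```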
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source
SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first
signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding
vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector
element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
boolean test_passed;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
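The same loop serves the signed and unsigned register compares, with `unsigned` and `cmp_eq` selecting the variant. The Python sketch below is a minimal model under the assumption that vectors are integers with element 0 in the least significant bits; `to_int` and `compare_vector` are names introduced here.

```python
def to_int(bits, esize, unsigned):
    """Int(x, unsigned): read an esize-bit pattern as unsigned or two's complement."""
    if not unsigned and bits >> (esize - 1):
        return bits - (1 << esize)
    return bits

def compare_vector(vn, vm, esize, datasize, unsigned, cmp_eq):
    """Hypothetical model of the >= / > register compare loop."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element1 = to_int((vn >> (e * esize)) & ones, esize, unsigned)
        element2 = to_int((vm >> (e * esize)) & ones, esize, unsigned)
        test_passed = element1 >= element2 if cmp_eq else element1 > element2
        if test_passed:
            result |= ones << (e * esize)
    return result
```

For this instruction the decode sets unsigned to FALSE and cmp_eq to TRUE, so an element of 0xFFFFFFFF is read as -1 and does not compare greater than or equal to 0.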
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source
SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding
vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector
element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
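integer d = UInt(Rd);
integer n = UInt(Rn);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;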
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
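integer d = UInt(Rd);
integer n = UInt(Rn);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;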
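The op:U case statement can be exercised directly on an instruction word. The Python sketch below assumes, from the encoding diagram above, that U is bit 29 and op is bit 12; the table and function names are hypothetical, and the sketch does not check that the word really belongs to this instruction group.

```python
# Keys are the concatenation op:U, written as the tuple (op, U).
COMPARE_OPS = {(0, 0): "GT", (0, 1): "GE", (1, 0): "EQ", (1, 1): "LE"}

def decode_compare_zero(word):
    """Return the CompareOp suffix selected by op:U in a 32-bit encoding."""
    u = (word >> 29) & 1     # assumed position of U
    op = (word >> 12) & 1    # assumed position of op
    return COMPARE_OPS[(op, u)]
```

With these assumptions the vector encoding with Q=1, size='10', Rn=1, Rd=0, U=1 and op=0 is 0x6EA08820, and the function returns "GE" for it.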
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;
for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    case comparison of
        when CompareOp_GT test_passed = element > 0;
        when CompareOp_GE test_passed = element >= 0;
        when CompareOp_EQ test_passed = element == 0;
        when CompareOp_LE test_passed = element <= 0;
        when CompareOp_LT test_passed = element < 0;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP
register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer
value is greater than the second signed integer value sets every bit of the corresponding vector element in the
destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
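The vector encoding above can be assembled from its fields as a check on the diagram. In the Python sketch below the field positions are read from the diagram and are therefore assumptions of this example (Q at bit 30, U at bit 29, size at bits 23:22, Rm at bits 20:16, eq at bit 11, Rn at bits 9:5, Rd at bits 4:0); the function name is hypothetical.

```python
def encode_int_compare(q, u, size, rm, eq, rn, rd):
    """Assemble the vector-class word for the U/eq family of register compares."""
    if (size, q) == (0b11, 0):
        raise ValueError("size:Q == '110' is UNDEFINED")
    return ((q << 30) | (u << 29) | (0b01110 << 24) | (size << 22) | (1 << 21)
            | (rm << 16) | (0b0011 << 12) | (eq << 11) | (1 << 10)
            | (rn << 5) | rd)
```

Under these assumptions, Q=1, U=0, size='10', eq=0 with Rm=2, Rn=1, Rd=0 (that is, a signed greater-than compare on the 4S arrangement) gives 0x4EA23420.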
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
boolean test_passed;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP
register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the
destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
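integer d = UInt(Rd);
integer n = UInt(Rn);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;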
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op
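integer d = UInt(Rd);
integer n = UInt(Rn);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;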
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;
for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    case comparison of
        when CompareOp_GT test_passed = element > 0;
        when CompareOp_GE test_passed = element >= 0;
        when CompareOp_EQ test_passed = element == 0;
        when CompareOp_LE test_passed = element <= 0;
        when CompareOp_LT test_passed = element < 0;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP
register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned
integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in
the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the
destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
boolean test_passed;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
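In the scalar class the loop runs once over a single 64-bit element, which makes the difference between the unsigned and signed interpretations easy to see. The Python sketch below is illustrative; `scalar_compare_d` is a hypothetical name and the D register contents are passed as integers.

```python
def scalar_compare_d(dn, dm, unsigned, cmp_eq):
    """Hypothetical model of the scalar class: esize = datasize = 64, elements = 1."""
    esize = 64
    ones = (1 << esize) - 1

    def to_int(bits):
        bits &= ones
        if not unsigned and bits >> (esize - 1):
            return bits - (1 << esize)   # two's complement reading
        return bits

    element1, element2 = to_int(dn), to_int(dm)
    test_passed = element1 >= element2 if cmp_eq else element1 > element2
    return ones if test_passed else 0
```

An all-ones first operand is the largest unsigned value and so passes the unsigned higher test against 1, while the same bit pattern read as signed is -1 and fails a signed greater-than test.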
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source
SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first
unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the
corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the
corresponding vector element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean cmp_eq = (eq == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
boolean test_passed;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
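As an informal illustration, not part of the architectural pseudocode, the element loop above can be modelled in Python by treating a register as an integer bit-vector. The helper names elem and compare_unsigned are hypothetical, and only the unsigned interpretation is modelled.

```python
def elem(value, e, esize):
    """Return element e (esize bits wide) of an integer bit-vector."""
    return (value >> (e * esize)) & ((1 << esize) - 1)


def compare_unsigned(vn, vm, esize, datasize, cmp_eq):
    """Element-wise unsigned compare; each passing lane becomes all ones."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element1 = elem(vn, e, esize)
        element2 = elem(vm, e, esize)
        # cmp_eq selects "higher or same" (>=); otherwise strictly "higher" (>).
        passed = element1 >= element2 if cmp_eq else element1 > element2
        if passed:
            result |= ones << (e * esize)
    return result
```

With four 16-bit lanes, a lane holding 0x8000 compares higher than one holding 0x7FFF because the values are treated as unsigned.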
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source
SIMD&FP register. If the signed integer value is less than or equal to zero, it sets every bit of the corresponding
vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that element to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;
for e = 0 to elements-1
element = SInt(Elem[operand, e, esize]);
case comparison of
when CompareOp_GT test_passed = element > 0;
when CompareOp_GE test_passed = element >= 0;
when CompareOp_EQ test_passed = element == 0;
when CompareOp_LE test_passed = element <= 0;
when CompareOp_LT test_passed = element < 0;
Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
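The compare-against-zero loop above can be sketched in Python as follows. This is an illustrative model only: the string keys stand in for the CompareOp enumeration, and the names sint, COMPARE_ZERO and compare_zero are hypothetical.

```python
def sint(bits, esize):
    """Interpret an esize-bit pattern as a two's-complement signed integer."""
    return bits - (1 << esize) if bits & (1 << (esize - 1)) else bits


# One predicate per comparison, mirroring the case statement in the pseudocode.
COMPARE_ZERO = {
    "GT": lambda x: x > 0,
    "GE": lambda x: x >= 0,
    "EQ": lambda x: x == 0,
    "LE": lambda x: x <= 0,
    "LT": lambda x: x < 0,
}


def compare_zero(vn, esize, datasize, comparison):
    """Set each lane to all ones when the signed element passes the test."""
    ones = (1 << esize) - 1
    test = COMPARE_ZERO[comparison]
    result = 0
    for e in range(datasize // esize):
        element = sint((vn >> (e * esize)) & ones, esize)
        if test(element):
            result |= ones << (e * esize)
    return result
```

Because the elements are read with SInt, a byte lane holding 0x80 is treated as -128 and passes both the LE and LT tests.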
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register.
If the signed integer value is less than zero, it sets every bit of the corresponding vector element in the destination
SIMD&FP register to one; otherwise it sets every bit of that element to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
CompareOp comparison = CompareOp_LT;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp comparison = CompareOp_LT;
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;
for e = 0 to elements-1
element = SInt(Elem[operand, e, esize]);
case comparison of
when CompareOp_GT test_passed = element > 0;
when CompareOp_GE test_passed = element >= 0;
when CompareOp_EQ test_passed = element == 0;
when CompareOp_LE test_passed = element <= 0;
when CompareOp_LT test_passed = element < 0;
Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP
register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the
result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one,
otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean and_test = (U == '0');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean and_test = (U == '0');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
boolean test_passed;
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if and_test then
test_passed = !IsZero(element1 AND element2);
else
test_passed = (element1 == element2);
Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
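A minimal Python sketch of the and_test path of the pseudocode above, given for illustration only. The function name cmtst is hypothetical, and registers are modelled as plain integers.

```python
def cmtst(vn, vm, esize, datasize):
    """Set a lane to all ones when the two source lanes share any set bit."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element1 = (vn >> (e * esize)) & ones
        element2 = (vm >> (e * esize)) & ones
        if element1 & element2:          # !IsZero(element1 AND element2)
            result |= ones << (e * esize)
    return result
```

Note that the test is "any common bit", not equality: lanes 0xF0 and 0x0F produce zero, while 0x01 and 0x03 produce all ones.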
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element
in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
if size != '00' then UNDEFINED;
integer esize = 8;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 8;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer count;
for e = 0 to elements-1
count = BitCount(Elem[operand, e, esize]);
Elem[result, e, esize] = count<esize-1:0>;
V[d] = result;
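The per-byte count can be illustrated with the following Python sketch. It assumes esize is 8, consistent with the 8B and 16B arrangements listed above; the function name cnt is hypothetical.

```python
def cnt(vn, datasize):
    """Replace each byte of vn with the number of set bits in that byte."""
    result = 0
    for e in range(datasize // 8):
        byte = (vn >> (8 * e)) & 0xFF
        count = bin(byte).count("1")     # BitCount of one 8-bit element
        result |= count << (8 * e)
    return result
```

Each count is at most 8, so it always fits in its own byte lane. Summing the result bytes gives the population count of the whole register.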
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element
index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the
destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (scalar).
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<T> For the scalar variant: is the element width specifier, encoded in “imm5”:
imm5 <T>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
imm5 Q <T>
x0000 x RESERVED
xxxx1 0 8B
xxxx1 1 16B
xxx10 0 4H
xxx10 1 8H
xx100 0 2S
xx100 1 4S
x1000 0 RESERVED
x1000 1 2D
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
imm5 <V>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
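The imm5 tables above follow one pattern: the position of the lowest set bit selects the element size, and the bits above it hold the index. The Python sketch below illustrates that reading of the tables. The function name decode_imm5 is hypothetical, and returning None for the reserved pattern is a choice made for this example.

```python
def decode_imm5(imm5):
    """Return (element width letter, index) for a 5-bit imm5, or None if reserved."""
    if imm5 & 0xF == 0:
        return None                          # x0000 is RESERVED
    # Position of the lowest set bit: 0 -> B, 1 -> H, 2 -> S, 3 -> D.
    size = (imm5 & -imm5).bit_length() - 1
    index = imm5 >> (size + 1)               # imm5<4:size+1>
    return "BHSD"[size], index
```

For example, imm5 = 0b10110 matches the row xxx10, so the element is H and the index is imm5<4:2> = 5.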
Operation
CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];
bits(datasize) result;
bits(esize) element;
element = Elem[operand, index, esize];
for e = 0 to elements-1
Elem[result, e, esize] = element;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose
register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 0 0 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
// imm5<4:size+1> is IGNORED
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
imm5 Q <T>
x0000 x RESERVED
xxxx1 0 8B
xxxx1 1 16B
xxx10 0 4H
xxx10 1 8H
xx100 0 2S
xx100 1 4S
x1000 0 RESERVED
x1000 1 2D
<R> Is the width specifier for the general-purpose source register, encoded in “imm5”:
imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X
Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.
<n> Is the number [0-30] of the general-purpose source register or ZR (31), encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(esize) element = X[n];
bits(datasize) result;
for e = 0 to elements-1
Elem[result, e, esize] = element;
V[d] = result;
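For illustration, the broadcast loop above can be modelled in Python as follows. The function name dup_general is hypothetical; the general-purpose register is passed as an integer and only its low esize bits are used, matching the bits(esize) read of X[n].

```python
def dup_general(xn, esize, datasize):
    """Copy the low esize bits of xn into every element of the result."""
    element = xn & ((1 << esize) - 1)
    result = 0
    for e in range(datasize // esize):
        result |= element << (e * esize)
    return result
```

Upper bits of the source register beyond esize do not affect the result.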
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source
SIMD&FP registers, and places the result in the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 Rm 0 0 0 1 1 1 Rn Rd
opc2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand2;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];
operand1 = V[m];
operand2 = Zeros();
operand3 = Ones();
V[d] = operand1 EOR ((operand2 EOR operand4) AND operand3);
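The final expression above is a general bit-select form. With operand2 all zeros and operand3 all ones, it reduces to a plain exclusive OR of the two source registers. The Python sketch below checks that reduction; the function names are hypothetical.

```python
def bit_select_form(operand1, operand2, operand3, operand4):
    """operand1 EOR ((operand2 EOR operand4) AND operand3), as in the pseudocode."""
    return operand1 ^ ((operand2 ^ operand4) & operand3)


def eor_vector(vn, vm, datasize):
    """Model the operation with operand2 = Zeros() and operand3 = Ones()."""
    ones = (1 << datasize) - 1
    return bit_select_form(vm, 0, ones, vn)
```

Since (0 EOR Vn) AND Ones equals Vn, the result is Vm EOR Vn for any inputs of the given width.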
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and
writes the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.
Advanced SIMD
(FEAT_SHA3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 0 0 Rm 0 Ra Rn Rd
if !HaveSHA3Ext() then UNDEFINED;
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<Va> Is the name of the third SIMD&FP source register, encoded in the "Ra" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Va = V[a];
V[d] = Vn EOR Vm EOR Va;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Extract vector from pair of vectors. This instruction extracts the lowest vector elements from the second source
SIMD&FP register and the highest vector elements from the first source SIMD&FP register, concatenates the results
into a vector, and writes the vector to the destination SIMD&FP register. The index value specifies the lowest
vector element to extract from the first source register, and consecutive elements are extracted from the first, then
second, source registers until the destination vector is filled.
The following figure shows an example of the EXT doubleword operation for Q = 0 and imm4<2:0> = 3.
[Figure: byte elements 7 to 0 of Vm and Vn, and the bytes selected from them to form Vd.]
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 0 Rm 0 imm4 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if Q == '0' && imm4<3> == '1' then UNDEFINED;
integer datasize = if Q == '1' then 128 else 64;
integer position = UInt(imm4) << 3;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Q imm4<3> <index>
0 0 imm4<2:0>
0 1 RESERVED
1 x imm4
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) hi = V[m];
bits(datasize) lo = V[n];
bits(datasize*2) concat = hi:lo;
V[d] = concat<position+datasize-1:position>;
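The concatenate-and-slice step above can be sketched in Python as shown below. This is an illustration under one assumption: position is taken to be the byte index multiplied by 8, which is consistent with the 8B and 16B arrangements. The function name ext is hypothetical.

```python
def ext(vn, vm, index, datasize):
    """Extract datasize bits from Vm:Vn starting at byte `index` of Vn."""
    position = index * 8                 # assumption: index counts bytes
    concat = (vm << datasize) | vn       # hi:lo with hi = V[m], lo = V[n]
    return (concat >> position) & ((1 << datasize) - 1)
```

With Q = 0 and an index of 3, as in the figure example, the result holds bytes 3 to 7 of Vn followed by bytes 0 to 2 of Vm.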
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the
second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source
SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination
SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision
Scalar half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
boolean abs = TRUE;
Scalar single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
boolean abs = TRUE;
Vector half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean abs = (U == '1');
Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean abs = (U == '1');
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
bits(esize) diff;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
diff = FPSub(element1, element2, fpcr);
Elem[result, e, esize] = if abs then FPAbs(diff) else diff;
V[d] = result;
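As a rough illustration of the element loop above, the Python sketch below models double-precision lanes as Python floats. It is not a model of FPSub: FPCR rounding and flush modes, floating-point exceptions and NaN propagation are not represented, and the function name fabd is hypothetical.

```python
def fabd(vn, vm, take_abs=True):
    """Lane-wise difference of two lists of floats, optionally made absolute."""
    return [abs(a - b) if take_abs else a - b for a, b in zip(vn, vm)]
```

The take_abs flag plays the role of the abs variable set by the decode; for this instruction it is TRUE, so every result lane is non-negative.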
Floating-point Absolute value (scalar). This instruction calculates the absolute value in the SIMD&FP source register
and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 1 1 0 0 0 0 Rn Rd
opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
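The ftype decode above can be illustrated with the Python sketch below. The names fabs_esize and fabs_bits are hypothetical, None stands in for UNDEFINED, and the absolute value is modelled as clearing the sign bit of the encoding, which is an assumption that ignores any FPCR-dependent behavior.

```python
def fabs_esize(ftype, have_fp16):
    """Map the 2-bit ftype field to an element size, or None if UNDEFINED."""
    if ftype == 0b00:
        return 32
    if ftype == 0b01:
        return 64
    if ftype == 0b11 and have_fp16:
        return 16
    return None                          # '10', or '11' without FP16 support


def fabs_bits(value, esize):
    """Clear the sign bit (the top bit) of an esize-bit floating-point encoding."""
    return value & ((1 << (esize - 1)) - 1)
```

For example, the single-precision encoding 0xC0490FDB (a negative value) becomes 0x40490FDB.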
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the
source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
if neg then
element = FPNeg(element);
else
element = FPAbs(element);
Elem[result, e, esize] = element;
V[d] = result;
Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each
floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point
value in the second source SIMD&FP register. If the first value is greater than or equal to the second value, it sets
every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit
of that element to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision
Scalar half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
Scalar single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
Vector half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
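As an illustration only, the `case E:U:ac of` decode above can be sketched in Python. The bit positions used for U (bit 29), E (bit 23) and ac (bit 11) are read off the encoding diagrams and are an assumption of this sketch. The names `COMPARE_DECODE` and `decode_compare` are hypothetical and are not part of the Arm pseudocode.

```python
# Hypothetical helper names; not part of the Arm pseudocode.
COMPARE_DECODE = {
    0b000: ("CompareOp_EQ", False),
    0b010: ("CompareOp_GE", False),
    0b011: ("CompareOp_GE", True),
    0b110: ("CompareOp_GT", False),
    0b111: ("CompareOp_GT", True),
}


def decode_compare(instr):
    """Return (cmp, abs) for a 32-bit instruction word, or raise for UNDEFINED."""
    e = (instr >> 23) & 1   # assumed position of E
    u = (instr >> 29) & 1   # assumed position of U
    ac = (instr >> 11) & 1  # assumed position of ac
    key = (e << 2) | (u << 1) | ac
    if key not in COMPARE_DECODE:
        raise ValueError("UNDEFINED")
    return COMPARE_DECODE[key]
```

For example, the scalar half-precision bit pattern above with Rm, Rn and Rd all zero is 0x7E402C00, for which this sketch selects CompareOp_GE with abs TRUE.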
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if abs then
        element1 = FPAbs(element1);
        element2 = FPAbs(element2);
    case cmp of
        when CompareOp_EQ test_passed = FPCompareEQ(element1, element2, fpcr);
        when CompareOp_GE test_passed = FPCompareGE(element1, element2, fpcr);
        when CompareOp_GT test_passed = FPCompareGT(element1, element2, fpcr);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector
element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the
second source SIMD&FP register. If the first value is greater than the second value, it sets every bit of the
corresponding vector element in the destination SIMD&FP register to one; otherwise it sets every bit of that
element to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if abs then
        element1 = FPAbs(element1);
        element2 = FPAbs(element2);
    case cmp of
        when CompareOp_EQ test_passed = FPCompareEQ(element1, element2, fpcr);
        when CompareOp_GE test_passed = FPCompareGE(element1, element2, fpcr);
        when CompareOp_GT test_passed = FPCompareGT(element1, element2, fpcr);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
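The per-element behaviour of the operation above can be sketched in Python. This is a simplified model under stated assumptions: elements are Python floats rather than bit patterns, and floating-point exceptions, FPCR controls and the merging case are all ignored. The function name `fp_abs_compare` is hypothetical.

```python
import math


def fp_abs_compare(operand1, operand2, cmp="GT", esize=32):
    """Model one absolute compare per element pair; returns integer masks."""
    ones = (1 << esize) - 1
    result = []
    for element1, element2 in zip(operand1, operand2):
        a, b = abs(element1), abs(element2)
        if cmp == "GT":
            test_passed = a > b
        elif cmp == "GE":
            test_passed = a >= b
        else:
            raise ValueError("unsupported comparison in this sketch")
        # Comparisons involving NaN are False in Python, like an unordered result.
        result.append(ones if test_passed else 0)
    return result


masks = fp_abs_compare([-3.0, 1.0, math.nan, -2.0], [2.0, -1.0, 0.0, 2.0])
print([hex(m) for m in masks])
```

In this call only the first element passes the greater-than test, because the second and fourth pairs have equal magnitudes and the third pair contains a NaN.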
Floating-point Add (scalar). This instruction adds the floating-point values of the two source SIMD&FP registers, and
writes the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;
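A minimal Python sketch of the `ftype` decode above, for illustration. The function name `decode_ftype` and the `have_fp16` parameter (standing in for HaveFP16Ext()) are assumptions of this sketch, not part of the Arm pseudocode.

```python
def decode_ftype(ftype, have_fp16):
    """Map the 2-bit ftype field to an element size in bits."""
    if ftype == 0b00:
        return 32
    if ftype == 0b01:
        return 64
    if ftype == 0b11 and have_fp16:
        return 16
    # ftype '10', or '11' without the half-precision extension.
    raise ValueError("UNDEFINED")
```

The half-precision size is only reachable when the implementation reports the FP16 extension; every other combination follows the UNDEFINED path.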
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
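Operation
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
FPCRType fpcr = FPCR[];
boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();
Elem[result, 0, esize] = FPAdd(operand1, operand2, fpcr);
V[d] = result;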
Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP
registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in
this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPAdd(element1, element2, FPCR[]);
V[d] = result;
Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source
SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize = 16;
if sz == '1' then UNDEFINED;
integer datasize = 32;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 H
1 RESERVED
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:
sz <T>
0 2H
1 RESERVED
For the single-precision and double-precision variant: is the source arrangement specifier, encoded in “sz”:
sz <T>
0 2S
1 2D
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FADD, operand, esize);
Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first
source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of
adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a
vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point
values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPAdd(element1, element2, FPCR[]);
V[d] = result;
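The element selection in the operation above can be sketched in Python to show how the pairwise and element-wise forms differ. This is a simplified model: vectors are lists of Python floats with element 0 first, rounding to the element size and floating-point exceptions are ignored, and `pair` is passed as a parameter. The function name `fp_add_elements` is hypothetical.

```python
def fp_add_elements(operand1, operand2, pair):
    """operand1 and operand2 are equal-length lists of floats, element 0 first."""
    elements = len(operand1)
    # concat = operand2:operand1, so operand1 supplies the low-numbered elements.
    concat = list(operand1) + list(operand2)
    result = []
    for e in range(elements):
        if pair:
            element1, element2 = concat[2 * e], concat[2 * e + 1]
        else:
            element1, element2 = operand1[e], operand2[e]
        result.append(element1 + element2)
    return result


print(fp_add_elements([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], pair=True))
print(fp_add_elements([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], pair=False))
```

With `pair` set, the low half of the result holds the sums of adjacent elements of the first source and the high half holds those of the second source.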
Vector
(FEAT_FCMA)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 1 1 rot 0 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
rot <rotate>
0 90
1 270
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element3;
V[d] = result;
Floating-point Conditional quiet Compare (scalar). This instruction compares the two SIMD&FP source register values
and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C, V}
flags are set to the flag bit specifier.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is a signaling
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 0 1 Rn 0 nzcv
op
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;
Assembler Symbols
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2;
bits(4) flags = nzcv;
operand2 = V[m];
if ConditionHolds(cond) then
    flags = FPCompare(operand1, operand2, FALSE, FPCR[]);
PSTATE.<N,Z,C,V> = flags;
Operational information
The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or
both of the operands is a NaN, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2)
and (Operand1 > Operand2) are false. An unordered comparison sets the PSTATE condition flags to N=0, Z=0, C=1,
and V=1.
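The flag selection described above can be sketched in Python. Only the unordered result (N=0, Z=0, C=1, V=1) is stated in this section. The values used here for equal, less than and greater than follow the shared FPCompare pseudocode and should be read as an assumption of this sketch. The function names and the `condition_holds` parameter (standing in for ConditionHolds(cond)) are hypothetical, and exception signaling is not modeled.

```python
import math


def fp_compare_flags(operand1, operand2):
    """Return NZCV as a 4-bit integer for a floating-point comparison."""
    if math.isnan(operand1) or math.isnan(operand2):
        return 0b0011  # unordered
    if operand1 == operand2:
        return 0b0110  # equal
    if operand1 < operand2:
        return 0b1000  # less than
    return 0b0010      # greater than


def conditional_compare(operand1, operand2, nzcv, condition_holds):
    """Use the comparison result if the condition passes, else the immediate."""
    if condition_holds:
        return fp_compare_flags(operand1, operand2)
    return nzcv
```

When the condition fails, the operands are not compared in this model and the `nzcv` immediate becomes the flag value directly.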
Floating-point Conditional signaling Compare (scalar). This instruction compares the two SIMD&FP source register
values and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C,
V} flags are set to the flag bit specifier.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is any type of
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 0 1 Rn 1 nzcv
op
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;
Assembler Symbols
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2;
bits(4) flags = nzcv;
operand2 = V[m];
if ConditionHolds(cond) then
    flags = FPCompare(operand1, operand2, TRUE, FPCR[]);
PSTATE.<N,Z,C,V> = flags;
Operational information
The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or
both of the operands is a NaN, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2)
and (Operand1 > Operand2) are false. An unordered comparison sets the PSTATE condition flags to N=0, Z=0, C=1,
and V=1.
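The difference between the quiet and signaling compares lies in which NaN operands raise Invalid Operation. A minimal Python sketch, assuming single-precision bit patterns and the IEEE 754 convention that the most significant fraction bit (bit 22) marks a quiet NaN. The function names and the `signal_nans` parameter, which mirrors the TRUE/FALSE argument passed to FPCompare, are hypothetical.

```python
def is_nan32(bits):
    """True if a 32-bit pattern encodes any NaN (all-ones exponent, nonzero fraction)."""
    return (bits >> 23) & 0xFF == 0xFF and (bits & 0x7FFFFF) != 0


def is_snan32(bits):
    """True if the pattern is a signaling NaN (quiet bit, bit 22, is clear)."""
    return is_nan32(bits) and ((bits >> 22) & 1) == 0


def raises_invalid_operation(operand1, operand2, signal_nans):
    """Model when a compare raises Invalid Operation."""
    if signal_nans:
        # Signaling compare: any type of NaN raises the exception.
        return is_nan32(operand1) or is_nan32(operand2)
    # Quiet compare: only a signaling NaN raises the exception.
    return is_snan32(operand1) or is_snan32(operand2)
```

A quiet NaN operand such as 0x7FC00000 therefore raises the exception only in the signaling form, while a signaling NaN such as 0x7F800001 raises it in both.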
Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source
SIMD&FP register with the corresponding floating-point value from the second source SIMD&FP register. If the
values are equal, it sets every bit of the corresponding vector element in the destination SIMD&FP register to one;
otherwise it sets every bit of that element to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
    when '000' cmp = CompareOp_EQ; abs = FALSE;
    when '010' cmp = CompareOp_GE; abs = FALSE;
    when '011' cmp = CompareOp_GE; abs = TRUE;
    when '110' cmp = CompareOp_GT; abs = FALSE;
    when '111' cmp = CompareOp_GT; abs = TRUE;
    otherwise UNDEFINED;
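For illustration, the size decode of the vector single-precision and double-precision class above (the reserved `sz:Q == '10'` check, `esize`, `datasize` and `elements`) can be sketched in Python. The function name `decode_arrangement` and the returned tuple layout are assumptions of this sketch.

```python
def decode_arrangement(sz, q):
    """Return (arrangement, esize, datasize, elements) for the sz and Q fields."""
    if sz == 1 and q == 0:
        raise ValueError("UNDEFINED")  # sz:Q == '10' is reserved
    esize = 32 << sz
    datasize = 128 if q == 1 else 64
    elements = datasize // esize
    suffix = "S" if esize == 32 else "D"
    return f"{elements}{suffix}", esize, datasize, elements


for sz, q in ((0, 0), (0, 1), (1, 1)):
    print(sz, q, decode_arrangement(sz, q))
```

The three accepted combinations yield the 2S, 4S and 2D arrangements; a 64-bit vector of 64-bit elements would hold only one element, which is why that combination is reserved.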
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if abs then
        element1 = FPAbs(element1);
        element2 = FPAbs(element2);
    case cmp of
        when CompareOp_EQ test_passed = FPCompareEQ(element1, element2, fpcr);
        when CompareOp_GE test_passed = FPCompareGE(element1, element2, fpcr);
        when CompareOp_GT test_passed = FPCompareGT(element1, element2, fpcr);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP
register. If the value is equal to zero, it sets every bit of the corresponding vector element in the destination
SIMD&FP register to one; otherwise it sets every bit of that element to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in “sz”:
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
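The arrangement tables above can be read as a small lookup. The following Python sketch is illustrative only: the function name `decode_arrangement` and its argument names are assumptions made for this example and do not appear in the architecture pseudocode.

```python
def decode_arrangement(q, sz=None):
    """Return the <T> arrangement specifier, or None for a RESERVED encoding.

    q  -- the Q bit (0 or 1)
    sz -- the sz bit (0 or 1) for the single/double-precision variant,
          or None for the half-precision variant, which is selected by Q alone.
    """
    if sz is None:
        # Half-precision variant: Q selects 4H or 8H.
        return {0: "4H", 1: "8H"}[q]
    # Single/double-precision variant: keyed by (sz, Q).
    table = {(0, 0): "2S", (0, 1): "4S", (1, 1): "2D"}
    # sz:Q == '10' is RESERVED, so it has no entry and yields None.
    return table.get((sz, q))
```

Returning `None` for the reserved combination keeps the lookup total; a decoder would normally treat that case as UNDEFINED.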
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    case comparison of
        when CompareOp_GT test_passed = FPCompareGT(element, zero, FPCR[]);
        when CompareOp_GE test_passed = FPCompareGE(element, zero, FPCR[]);
        when CompareOp_EQ test_passed = FPCompareEQ(element, zero, FPCR[]);
        when CompareOp_LE test_passed = FPCompareGE(zero, element, FPCR[]);
        when CompareOp_LT test_passed = FPCompareGT(zero, element, FPCR[]);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
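The per-element behavior of the Operation pseudocode above can be sketched in Python. This is a simplified model, not the architectural definition: it uses Python floats in place of bit-vector elements, ignores FPCR settings and floating-point exception reporting, and the name `compare_with_zero` is an assumption made for this example.

```python
def compare_with_zero(elements, comparison, esize=32):
    """Return one mask per element: all ones if the test passes, else zero.

    elements   -- list of Python floats standing in for the vector elements
    comparison -- one of "GT", "GE", "EQ", "LE", "LT"
    esize      -- element size in bits, which sets the width of the mask
    """
    ones = (1 << esize) - 1
    tests = {
        "GT": lambda x: x > 0.0,
        "GE": lambda x: x >= 0.0,
        "EQ": lambda x: x == 0.0,
        # LE and LT swap the operands, as the pseudocode does.
        "LE": lambda x: 0.0 >= x,
        "LT": lambda x: 0.0 > x,
    }
    test = tests[comparison]
    # Any comparison involving a NaN is false, so NaN elements produce zero.
    return [ones if test(x) else 0 for x in elements]
```

For example, `compare_with_zero([1.0, -0.0, float("nan"), -2.5], "GE")` gives an all-ones mask for the first two elements and zero for the last two, because negative zero compares equal to zero and a NaN fails every ordered test.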
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first
source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the
second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP
register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to
zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<V> Is a width specifier, encoded in “sz”:
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if abs then
        element1 = FPAbs(element1);
        element2 = FPAbs(element2);
    case cmp of
        when CompareOp_EQ test_passed = FPCompareEQ(element1, element2, fpcr);
        when CompareOp_GE test_passed = FPCompareGE(element1, element2, fpcr);
        when CompareOp_GT test_passed = FPCompareGT(element1, element2, fpcr);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
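The decode of E:U:ac and the element loop above can be sketched together in Python. This is a simplified model under stated assumptions: Python floats stand in for bit-vector elements, FPCR effects (including merging) and exception reporting are not modeled, and the names `decode_compare` and `compare_elements` are hypothetical.

```python
def decode_compare(e, u, ac):
    """Map the E, U and ac bits to (comparison, use_abs), as in the decode tables."""
    table = {
        (0, 0, 0): ("EQ", False),
        (0, 1, 0): ("GE", False),
        (0, 1, 1): ("GE", True),
        (1, 1, 0): ("GT", False),
        (1, 1, 1): ("GT", True),
    }
    if (e, u, ac) not in table:
        # Every other combination is UNDEFINED in the pseudocode.
        raise ValueError("UNDEFINED encoding")
    return table[(e, u, ac)]


def compare_elements(op1, op2, comparison, use_abs, esize=32):
    """Compare corresponding elements; return all ones or zero per element."""
    ones = (1 << esize) - 1
    result = []
    for a, b in zip(op1, op2):
        if use_abs:
            # The absolute-compare forms take the magnitude of both operands.
            a, b = abs(a), abs(b)
        if comparison == "EQ":
            passed = a == b
        elif comparison == "GE":
            passed = a >= b
        else:  # "GT"
            passed = a > b
        result.append(ones if passed else 0)
    return result
```

With `decode_compare(0, 1, 0)` returning `("GE", False)`, the second function models the plain greater-than-or-equal form described in this section.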
Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the
source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector
element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in
the destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in “sz”:
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    case comparison of
        when CompareOp_GT test_passed = FPCompareGT(element, zero, FPCR[]);
        when CompareOp_GE test_passed = FPCompareGE(element, zero, FPCR[]);
        when CompareOp_EQ test_passed = FPCompareEQ(element, zero, FPCR[]);
        when CompareOp_LE test_passed = FPCompareGE(zero, element, FPCR[]);
        when CompareOp_LT test_passed = FPCompareGT(zero, element, FPCR[]);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source
SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source
SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one,
otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;
case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<V> Is a width specifier, encoded in “sz”:
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if abs then
        element1 = FPAbs(element1);
        element2 = FPAbs(element2);
    case cmp of
        when CompareOp_EQ test_passed = FPCompareEQ(element1, element2, fpcr);
        when CompareOp_GE test_passed = FPCompareGE(element1, element2, fpcr);
        when CompareOp_GT test_passed = FPCompareGT(element1, element2, fpcr);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source
SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the
destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in “sz”:
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    case comparison of
        when CompareOp_GT test_passed = FPCompareGT(element, zero, FPCR[]);
        when CompareOp_GE test_passed = FPCompareGE(element, zero, FPCR[]);
        when CompareOp_EQ test_passed = FPCompareEQ(element, zero, FPCR[]);
        when CompareOp_LE test_passed = FPCompareGE(zero, element, FPCR[]);
        when CompareOp_LT test_passed = FPCompareGT(zero, element, FPCR[]);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Vector
(FEAT_FCMA)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 1 0 rot 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
    size  Q   <T>
    00    x   RESERVED
    01    0   4H
    01    1   8H
    10    0   2S
    10    1   4S
    11    0   RESERVED
    11    1   2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) element3;
bits(esize) element4;
FPCRType fpcr = FPCR[];
V[d] = result;
Vector
(FEAT_FCMA)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 rot 1 H 0 Rn Rd
(size == 01)
(size == 10)
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
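<Ts> Is an element size specifier, encoded in “size”:
    size  <Ts>
    00    RESERVED
    01    H
    10    S
    11    RESERVED
<index> Is the element index, encoded in “size:H:L”:
    size  <index>
    00    RESERVED
    01    H:L
    10    H
    11    RESERVED
<rotate> Is the rotation, encoded in “rot”:
    rot   <rotate>
    00    0
    01    90
    10    180
    11    270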
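The three tables above can be combined into one decode step. The Python sketch below covers only what those tables state: the element size specifier, the element index, and the rotation in degrees. Any further constraints on this encoding are not modeled, and the function name `decode_fcmla_element` is an assumption made for this example.

```python
def decode_fcmla_element(size, h, l, rot):
    """Return (<Ts>, <index>, <rotate>) or None if size is RESERVED.

    size -- 2-bit size field
    h, l -- the H and L bits
    rot  -- 2-bit rot field
    """
    if size == 0b01:
        # Half-precision elements: index is the 2-bit value H:L.
        ts, index = "H", (h << 1) | l
    elif size == 0b10:
        # Single-precision elements: index is H alone.
        ts, index = "S", h
    else:
        # size == '00' and size == '11' are RESERVED.
        return None
    # rot encodes the rotation in steps of 90 degrees.
    return ts, index, 90 * rot
```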
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
FPCRType fpcr = FPCR[];
V[d] = result;
Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the
source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector
element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in
the destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in “sz”:
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    case comparison of
        when CompareOp_GT test_passed = FPCompareGT(element, zero, FPCR[]);
        when CompareOp_GE test_passed = FPCompareGE(element, zero, FPCR[]);
        when CompareOp_EQ test_passed = FPCompareEQ(element, zero, FPCR[]);
        when CompareOp_LE test_passed = FPCompareGE(zero, element, FPCR[]);
        when CompareOp_LT test_passed = FPCompareGT(zero, element, FPCR[]);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source
SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the
destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in “sz”:
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    case comparison of
        when CompareOp_GT test_passed = FPCompareGT(element, zero, FPCR[]);
        when CompareOp_GE test_passed = FPCompareGE(element, zero, FPCR[]);
        when CompareOp_EQ test_passed = FPCompareEQ(element, zero, FPCR[]);
        when CompareOp_LE test_passed = FPCompareGE(zero, element, FPCR[]);
        when CompareOp_LT test_passed = FPCompareGT(zero, element, FPCR[]);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();
V[d] = result;
Floating-point quiet Compare (scalar). This instruction compares the two SIMD&FP source register values, or the first
SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is a signaling
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 0 0 Rn 0 x 0 0 0
opc
integer n = UInt(Rn);
integer m = UInt(Rm); // ignored when opc<0> == '1'
integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;
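The ftype decode above amounts to a three-way size selection with one feature-dependent case. A minimal Python sketch follows; the function name `decode_ftype` and the `have_fp16` parameter (standing in for the result of HaveFP16Ext()) are assumptions made for this example.

```python
def decode_ftype(ftype, have_fp16):
    """Return the operand size in bits, or None where the decode is UNDEFINED.

    ftype     -- 2-bit ftype field
    have_fp16 -- stand-in for HaveFP16Ext(): True if half precision is implemented
    """
    if ftype == 0b00:
        return 32
    if ftype == 0b01:
        return 64
    if ftype == 0b11 and have_fp16:
        return 16
    # ftype == '10', or ftype == '11' without half-precision support.
    return None
```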
Assembler Symbols
<Dn> For the double-precision variant: is the 64-bit name of the first SIMD&FP source register, encoded in
the "Rn" field.
For the double-precision, zero variant: is the 64-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hn> For the half-precision variant: is the 16-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the half-precision, zero variant: is the 16-bit name of the SIMD&FP source register, encoded in the
"Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sn> For the single-precision variant: is the 32-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the single-precision, zero variant: is the 32-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
Operational information
The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or
both of the operands is a NaN, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2)
and (Operand1 > Operand2) are false. An unordered comparison sets the PSTATE condition flags to N=0, Z=0, C=1,
and V=1.
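The flag outcome described above can be sketched in Python. The unordered result (N=0, Z=0, C=1, V=1) is stated in this section; the flag values for the three ordered outcomes are not given here and are taken, as an assumption of this sketch, from the behavior of the architecture's shared FPCompare() pseudocode. The function name `fp_compare_flags` is hypothetical, and exception reporting is not modeled.

```python
import math


def fp_compare_flags(op1, op2):
    """Return the (N, Z, C, V) flags for a floating-point compare of two floats."""
    if math.isnan(op1) or math.isnan(op2):
        # Unordered: none of <, ==, > holds.
        return (0, 0, 1, 1)
    if op1 == op2:
        return (0, 1, 1, 0)
    if op1 < op2:
        return (1, 0, 0, 0)
    # Remaining case: op1 > op2.
    return (0, 0, 1, 0)
```

Exactly one of the four branches is taken for any pair of inputs, matching the statement that a comparison result is precisely one of <, ==, > or unordered.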
Floating-point signaling Compare (scalar). This instruction compares the two SIMD&FP source register values, or the
first SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is any type of
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 0 0 Rn 1 x 0 0 0
opc
integer n = UInt(Rn);
integer m = UInt(Rm); // ignored when opc<0> == '1'
integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;
<Dn> For the double-precision variant: is the 64-bit name of the first SIMD&FP source register, encoded in
the "Rn" field.
For the double-precision, zero variant: is the 64-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hn> For the half-precision variant: is the 16-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the half-precision, zero variant: is the 16-bit name of the SIMD&FP source register, encoded in the
"Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sn> For the single-precision variant: is the 32-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the single-precision, zero variant: is the 32-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
Operational information
The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or
both of the operands is a NaN, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2)
and (Operand1 > Operand2) are false. An unordered comparison sets the PSTATE condition flags to N=0, Z=0, C=1,
and V=1.
Floating-point Conditional Select (scalar). This instruction allows the SIMD&FP destination register to take the value
from either one or the other of two SIMD&FP source registers. If the condition passes, the first SIMD&FP source
register value is taken, otherwise the second SIMD&FP source register value is taken.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
CheckFPAdvSIMDEnabled64();
bits(datasize) result;
V[d] = result;
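The selection described for this instruction can be sketched in Python as follows. This is an illustration only: the helper names condition_holds and fcsel are hypothetical, the flags are passed as an (N, Z, C, V) tuple, and the condition evaluation follows the standard A64 condition-code table rather than any pseudocode shown in this section.

```python
def condition_holds(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate a 4-bit A64 condition code against the NZCV flags."""
    base = cond >> 1
    if base == 0b000:
        result = z == 1               # EQ / NE
    elif base == 0b001:
        result = c == 1               # CS / CC
    elif base == 0b010:
        result = n == 1               # MI / PL
    elif base == 0b011:
        result = v == 1               # VS / VC
    elif base == 0b100:
        result = c == 1 and z == 0    # HI / LS
    elif base == 0b101:
        result = n == v               # GE / LT
    elif base == 0b110:
        result = n == v and z == 0    # GT / LE
    else:
        result = True                 # AL / NV
    # Bit 0 inverts the result, except for the '1111' encoding.
    if (cond & 1) and cond != 0b1111:
        result = not result
    return result


def fcsel(cond: int, flags: tuple, vn: float, vm: float) -> float:
    """Return the first source if the condition passes, otherwise the second."""
    return vn if condition_holds(cond, *flags) else vm
```

For example, with flags (1, 0, 0, 0) and the LT condition ('1011'), fcsel returns the first source value.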
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Floating-point Convert precision (scalar). This instruction converts the floating-point value in the SIMD&FP source
register to the precision for the destination register data type using the rounding mode that is determined by the
FPCR and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 1 opc 1 0 0 0 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer srcsize;
integer dstsize;
case ftype of
    when '00' srcsize = 32;
    when '01' srcsize = 64;
    when '10' UNDEFINED;
    when '11' srcsize = 16;
case opc of
    when '00' dstsize = 32;
    when '01' dstsize = 64;
    when '10' UNDEFINED;
    when '11' dstsize = 16;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
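The two case statements in the decode above can be mirrored by a small Python sketch. The function name decode_fcvt_sizes is hypothetical, and the sketch covers only the field-to-size mapping shown in this section. Any further checks that the full decode applies are not modelled.

```python
from typing import Tuple

# Field value -> size in bits, as listed in the decode pseudocode above.
SIZE_BY_FIELD = {0b00: 32, 0b01: 64, 0b11: 16}


def decode_fcvt_sizes(ftype: int, opc: int) -> Tuple[int, int]:
    """Return (srcsize, dstsize) for the given ftype and opc field values."""
    if ftype not in SIZE_BY_FIELD or opc not in SIZE_BY_FIELD:
        # '10' is UNDEFINED for both fields in the pseudocode above.
        raise ValueError("UNDEFINED encoding")
    return SIZE_BY_FIELD[ftype], SIZE_BY_FIELD[opc]
```

For example, ftype '00' with opc '01' describes a single-precision source and a double-precision destination.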
Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest
with Ties to Away rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 1 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, FPRounding_TIEAWAY);
X[d] = intval;
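A rough Python sketch of the Round to Nearest with Ties to Away conversion performed by the FPToFixed() call above is shown below. The function names are hypothetical. The handling of NaN (result 0) and of out-of-range values (saturation to the destination range) is an assumption based on the shared FPToFixed() pseudocode, which is not reproduced in this section, and floating-point exceptions are not modelled.

```python
import math
from fractions import Fraction


def round_ties_away(value: float) -> int:
    """Round a finite value to the nearest integer, with ties going away from zero."""
    exact = Fraction(value)          # exact rational value of the float
    lower = math.floor(exact)
    error = exact - lower
    half = Fraction(1, 2)
    if error > half or (error == half and lower >= 0):
        return lower + 1
    return lower


def fcvtas(value: float, intsize: int = 32) -> int:
    """Convert to a signed integer of intsize bits (32 or 64)."""
    if math.isnan(value):
        return 0                     # assumption: NaN converts to zero
    lo = -(1 << (intsize - 1))
    hi = (1 << (intsize - 1)) - 1
    if math.isinf(value):
        return hi if value > 0 else lo
    # assumption: out-of-range results saturate
    return max(lo, min(hi, round_ties_away(value)))
```

With this sketch, 2.5 converts to 3 and -2.5 converts to -3, which is where ties-to-away differs from ties-to-even.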
Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each
element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away
rounding mode and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in "sz":
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in "Q":
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
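The element loop and the merge behaviour in the pseudocode above can be sketched in Python by modelling registers as integers. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the merging argument stands in for whatever IsMerging(fpcr) returns, and the convert callback stands in for FPToFixed().

```python
def get_elem(vector: int, e: int, esize: int) -> int:
    """Read element e of width esize bits from a register image."""
    return (vector >> (e * esize)) & ((1 << esize) - 1)


def set_elem(vector: int, e: int, esize: int, value: int) -> int:
    """Return the register image with element e replaced by value."""
    mask = ((1 << esize) - 1) << (e * esize)
    return (vector & ~mask) | ((value << (e * esize)) & mask)


def convert_elements(operand, old_vd, esize, elements, merging, convert):
    """Apply convert to each element; keep the old upper bits only when merging a scalar."""
    merge = elements == 1 and merging
    result = old_vd if merge else 0
    for e in range(elements):
        element = get_elem(operand, e, esize)
        result = set_elem(result, e, esize, convert(element))
    return result
```

In the sketch, a vector form always starts from a zeroed result, while a scalar form keeps the remaining bits of the old destination value only when merging is selected.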
Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar). This instruction converts
the floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round to
Nearest with Ties to Away rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 1 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, FPRounding_TIEAWAY);
X[d] = intval;
Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts
each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties
to Away rounding mode and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in "sz":
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in "Q":
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
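The arrangement tables in the Assembler Symbols above determine the esize and elements values used by this loop. The following Python sketch shows one way to derive them. The function name decode_arrangement is hypothetical, and the rule that Q selects a 64-bit or 128-bit vector is an assumption based on the usual Advanced SIMD decode, which is not shown in this section.

```python
def decode_arrangement(sz: int, q: int, half: bool = False):
    """Return (arrangement, esize, elements) for the vector variants."""
    if half:
        esize = 16                      # half-precision variant: only Q is used
    else:
        if sz == 1 and q == 0:
            raise ValueError("RESERVED arrangement")
        esize = 32 << sz                # sz=0 -> 32 bits, sz=1 -> 64 bits
    datasize = 128 if q else 64         # assumption: Q selects the vector length
    elements = datasize // esize
    suffix = {16: "H", 32: "S", 64: "D"}[esize]
    return f"{elements}{suffix}", esize, elements
```

For example, sz=0 with Q=1 gives the 4S arrangement, with four 32-bit elements.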
Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the
SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode
that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP
destination register.
Where the operation lengthens a 64-bit vector to a 128-bit vector, the FCVTL2 variant operates on the elements in the
top 64 bits of the source register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
    Q   2
    0   [absent]
    1   [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ta> Is an arrangement specifier, encoded in "sz":
    sz  <Ta>
    0   4S
    1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<Tb> Is an arrangement specifier, encoded in "sz:Q":
    sz  Q   <Tb>
    0   0   4H
    0   1   8H
    1   0   2S
    1   1   4S
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(2*datasize) result;
for e = 0 to elements-1
    Elem[result, e, 2*esize] = FPConvert(Elem[operand, e, esize], FPCR[]);
V[d] = result;
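The lengthening operation above can be illustrated for the half-precision to single-precision case with a short Python sketch. It is an approximation under stated assumptions: the function name is hypothetical, the register is modelled as a 16-byte little-endian image, part 0 corresponds to FCVTL and part 1 to FCVTL2, and NaN handling and floating-point exceptions are not modelled.

```python
import struct


def fcvtl_half_to_single(vn: bytes, part: int = 0) -> bytes:
    """Widen four half-precision elements from one 64-bit half of vn to single precision."""
    if len(vn) != 16:
        raise ValueError("expected a 128-bit register image")
    source = vn[8 * part: 8 * part + 8]       # lower (part 0) or upper (part 1) 64 bits
    values = struct.unpack("<4e", source)     # four half-precision elements
    return struct.pack("<4f", *values)        # four single-precision elements
```

Every half-precision value is exactly representable in single precision, so this direction of conversion does not depend on the rounding mode for ordinary numeric inputs.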
Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards
Minus Infinity rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 0 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
rounding = FPDecodeRounding(rmode);
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;
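In this encoding the rounding mode comes from FPDecodeRounding(rmode) rather than from a fixed constant. The Python sketch below illustrates the idea for finite inputs. The mapping from rmode values to rounding modes is an assumption based on the shared FPDecodeRounding() pseudocode, which is not reproduced in this section, and the names used are hypothetical. Saturation, NaN handling, and floating-point exceptions are not modelled.

```python
import math

# Assumed rmode field mapping (from FPDecodeRounding(), not shown in this section).
ROUNDING_BY_RMODE = {
    0b00: "TIEEVEN",
    0b01: "POSINF",
    0b10: "NEGINF",   # the encoding above has rmode = '10'
    0b11: "ZERO",
}


def round_to_int(value: float, rmode: int) -> int:
    """Round a finite value to an integer using the mode selected by rmode."""
    mode = ROUNDING_BY_RMODE[rmode]
    if mode == "TIEEVEN":
        return round(value)       # Python's round() uses ties-to-even
    if mode == "POSINF":
        return math.ceil(value)
    if mode == "NEGINF":
        return math.floor(value)
    return math.trunc(value)
```

With rmode '10', -2.5 rounds to -3 and 2.9 rounds to 2, which is the Round towards Minus Infinity behaviour described above.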
Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in "sz":
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in "Q":
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards
Minus Infinity rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 0 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
rounding = FPDecodeRounding(rmode);
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;
Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar
or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus
Infinity rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in "sz":
    sz  <V>
    0   S
    1   D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in "Q":
    Q   <T>
    0   4H
    1   8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
    sz  Q   <T>
    0   0   2S
    0   1   4S
    1   0   RESERVED
    1   1   2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD&FP
source register, converts each element to half the precision of the source element, places the results in a vector, and
writes that vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are
half as long as the source vector elements. The rounding mode is determined by the FPCR.
The FCVTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
FCVTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
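The difference between the two forms can be sketched in Python for the single-precision to half-precision case. This is only an illustration: the function name is hypothetical, registers are modelled as 16-byte little-endian images, and the sketch assumes that the FPCR selects Round to Nearest and that every input is within half-precision range (struct raises OverflowError otherwise, whereas the instruction does not behave that way). Floating-point exceptions are not modelled.

```python
import struct


def fcvtn_single_to_half(vn: bytes, vd: bytes, upper: bool = False) -> bytes:
    """Narrow four single-precision elements of vn and place them in the result register."""
    if len(vn) != 16 or len(vd) != 16:
        raise ValueError("expected 128-bit register images")
    values = struct.unpack("<4f", vn)
    narrowed = struct.pack("<4e", *values)   # round to nearest, ties to even
    if upper:
        return vd[:8] + narrowed             # FCVTN2: lower half of Vd is unchanged
    return narrowed + bytes(8)               # FCVTN: upper half of Vd is cleared
```

Calling the function with upper=False models FCVTN, and with upper=True models FCVTN2, which is why the second form needs the previous destination value as an input.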
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
    Q   2
    0   [absent]
    1   [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Tb> Is an arrangement specifier, encoded in "sz:Q":
    sz  Q   <Tb>
    0   0   4H
    0   1   8H
    1   0   2S
    1   1   4S
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<Ta> Is an arrangement specifier, encoded in "sz":
    sz  <Ta>
    0   4S
    1   2D
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
for e = 0 to elements-1
    Elem[result, e, esize] = FPConvert(Elem[operand, e, 2*esize], FPCR[]);
Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest
rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
rounding = FPDecodeRounding(rmode);
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;
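As a rough illustration of the conversion above, the following Python sketch models the scalar signed conversion on a Python float. The function name `fcvtns` and the `width` parameter are hypothetical. Returning zero for NaN and saturating out-of-range inputs are assumptions about FPToFixed, which this section does not define, and floating-point exception flags are not modelled.

```python
import math
from fractions import Fraction

def fcvtns(value, width=32):
    """Sketch: float to signed integer, round to nearest with ties to even."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if math.isnan(value):
        return 0                      # assumption: NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else lo
    exact = Fraction(value)           # exact rational value of the float
    result = math.floor(exact)
    remainder = exact - result        # always in [0, 1)
    half = Fraction(1, 2)
    if remainder > half or (remainder == half and result % 2 == 1):
        result += 1                   # round up, or break a tie toward even
    return max(lo, min(hi, result))   # assumption: out-of-range saturates
```

Calling `fcvtns(2.5)` and `fcvtns(3.5)` shows the tie-breaking rule: the first tie rounds down to an even integer and the second rounds up to one.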
Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a
scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
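The element loop above, together with the "sz:Q" arrangement table, can be sketched in Python as follows. This is a minimal model under stated assumptions: `decode_arrangement` and `fcvtns_vector` are hypothetical names, lanes are passed as finite Python floats, results saturate to the element size, and neither the merging behavior nor the exception flags are modelled.

```python
# (sz, Q) -> (arrangement name, element size in bits, element count)
ARRANGEMENTS = {
    (0, 0): ("2S", 32, 2),
    (0, 1): ("4S", 32, 4),
    (1, 1): ("2D", 64, 2),
}

def decode_arrangement(sz, q):
    """Look up the arrangement for sz:Q; sz=1, Q=0 is RESERVED."""
    try:
        return ARRANGEMENTS[(sz, q)]
    except KeyError:
        raise ValueError(f"sz:Q = {sz}:{q} is RESERVED") from None

def fcvtns_vector(lanes, sz, q):
    """Convert each finite lane to a signed integer, ties to even."""
    _, esize, elements = decode_arrangement(sz, q)
    if len(lanes) != elements:
        raise ValueError("lane count does not match arrangement")
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    # Python's round() on a float rounds to nearest with ties to even.
    return [max(lo, min(hi, round(x))) for x in lanes]

print(fcvtns_vector([0.5, 1.5, 2.5, -3.5], sz=0, q=1))
```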
Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar). This instruction converts
the floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round to
Nearest rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
rounding = FPDecodeRounding(rmode);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;
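The only difference from the signed form is the TRUE argument, which selects an unsigned result. A small Python sketch of what that implies for the result range is given below. The name `fcvtnu` is hypothetical, and clamping negative or oversized inputs to the unsigned range (and NaN to zero) is an assumption about FPToFixed rather than something stated in this section.

```python
import math

def fcvtnu(value, width=32):
    """Sketch: float to unsigned integer, round to nearest with ties to even."""
    hi = (1 << width) - 1
    if math.isnan(value):
        return 0                       # assumption: NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else 0
    # round() on a float rounds to nearest with ties to even.
    return max(0, min(hi, round(value)))  # assumption: saturate to [0, 2^width - 1]
```

Under these assumptions a negative input such as `-1.5` yields 0, and a value just above the 32-bit range yields the largest 32-bit unsigned integer.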
Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a
scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards Plus Infinity
rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 1 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
rounding = FPDecodeRounding(rmode);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;
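To see how Round towards Plus Infinity differs from the other rounding modes used by this family of conversions, the following Python sketch applies three of them to the same inputs. The dictionary and function names are hypothetical, the mapping to Python built-ins is an illustration rather than the architectural definition, and only finite, in-range inputs are considered.

```python
import math

# Rounding mode -> Python function with the same behavior on finite floats
ROUNDERS = {
    "nearest_even": round,        # Round to Nearest, ties to even
    "plus_infinity": math.ceil,   # Round towards Plus Infinity
    "zero": math.trunc,           # Round towards Zero
}

def compare_modes(value):
    """Return the integer each rounding mode produces for a finite float."""
    return {name: fn(value) for name, fn in ROUNDERS.items()}

for v in (2.5, -2.5, 0.25, -0.75):
    print(v, compare_modes(v))
```

For negative inputs, rounding towards Plus Infinity and rounding towards Zero agree, whereas for positive non-integers rounding towards Plus Infinity always produces the larger integer.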
Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards
Plus Infinity rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 1 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
rounding = FPDecodeRounding(rmode);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;
Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element
in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to
Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
Note
This instruction uses the Round to Odd rounding mode which is not defined by the IEEE 754-2008 standard. This
rounding mode ensures that if the result of the conversion is inexact the least significant bit of the mantissa is
forced to 1. This rounding mode enables a floating-point value to be converted to a lower precision format via an
intermediate precision format while avoiding double rounding errors. For example, a 64-bit floating-point value
can be converted to a correctly rounded 16-bit floating-point value by first using this instruction to produce a
32-bit value and then using another instruction with the wanted rounding mode to convert the 32-bit value to the
final 16-bit floating-point value.
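The double rounding problem described in the Note can be reproduced with exact rational arithmetic. The Python sketch below is an illustration only: `round_sig` is a hypothetical helper that rounds a positive value to a given number of significant bits and ignores exponent range limits. The precisions 53, 24, and 11 are the significand widths of the IEEE 754 64-bit, 32-bit, and 16-bit formats.

```python
from fractions import Fraction

def round_sig(x, p, mode):
    """Round a positive Fraction x to p significant bits.

    mode is "even" (nearest, ties to even) or "odd" (Round to Odd).
    Exponent range limits are ignored in this sketch.
    """
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)      # spacing of p-bit values near x
    q = x / ulp
    m = q.numerator // q.denominator      # truncated p-bit significand
    rem = q - m
    if rem == 0:
        return x                          # exactly representable
    if mode == "odd":
        m |= 1                            # inexact: force the low bit to 1
    elif rem > Fraction(1, 2) or (rem == Fraction(1, 2) and m % 2 == 1):
        m += 1                            # nearest, ties to even
    return m * ulp

# A 64-bit value slightly above the midpoint of two adjacent 16-bit values.
x = 1 + Fraction(1, 2**11) + Fraction(1, 2**30)

direct = round_sig(x, 11, "even")                              # 64 -> 16
twice_even = round_sig(round_sig(x, 24, "even"), 11, "even")   # 64 -> 32 -> 16
via_odd = round_sig(round_sig(x, 24, "odd"), 11, "even")       # Round to Odd first

print(direct, twice_even, via_odd)
```

In this example the two-step conversion that rounds to nearest twice lands on a different 16-bit value from the direct conversion, while the conversion that uses Round to Odd for the intermediate step agrees with it.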
The FCVTXN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the FCVTXN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
sz Q <Tb>
0 x RESERVED
1 0 2S
1 1 4S
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
sz <Ta>
0 RESERVED
1 2D
sz <Vb>
0 RESERVED
1 S
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
sz <Va>
0 RESERVED
1 D
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
for e = 0 to elements-1
Elem[result, e, esize] = FPConvert(Elem[operand, e, 2*esize], fpcr, FPRounding_ODD);
if merge then
V[d] = result;
else
Vpart[d, part] = Elem[result, 0, datasize];
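The difference between the FCVTXN and FCVTXN2 destination writes described earlier can be modelled with a 128-bit register held as a Python integer. This is a sketch under stated assumptions: `write_narrowed` and its parameters are hypothetical names, and `result64` stands for the 64 bits of already-narrowed elements.

```python
MASK64 = (1 << 64) - 1

def write_narrowed(vd, result64, upper):
    """Return the new 128-bit value of the destination register."""
    result64 &= MASK64
    if upper:
        # FCVTXN2: replace bits [127:64], leave bits [63:0] unchanged.
        return (result64 << 64) | (vd & MASK64)
    # FCVTXN: write bits [63:0] and clear bits [127:64].
    return result64

old = (0xAAAA << 64) | 0xBBBB
print(hex(write_narrowed(old, 0x1234, upper=False)))
print(hex(write_narrowed(old, 0x1234, upper=True)))
```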
Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit fixed-point signed integer using the Round towards
Zero rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 0 1 1 0 0 0 scale Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00' fltsize = 32;
when '01' fltsize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the double-precision to 32-bit, half-precision to 32-bit and single-precision to 32-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 32, encoded as 64
minus "scale".
For the double-precision to 64-bit, half-precision to 64-bit and single-precision to 64-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 64, encoded as 64
minus "scale".
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, fracbits, FALSE, fpcr, FPRounding_ZERO);
X[d] = intval;
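The relationship between `<fbits>`, the "scale" field, and the fixed-point result can be sketched in Python as follows. The function names are hypothetical. The sketch treats the conversion as multiplying by 2 to the power of fbits and truncating toward zero; the zero result for NaN and the saturation of out-of-range values are assumptions about FPToFixed, and exception flags are not modelled.

```python
import math
from fractions import Fraction

def encode_scale(fbits, width=32):
    """Return the "scale" field value for a given number of fraction bits."""
    if not 1 <= fbits <= width:
        raise ValueError("fbits must be in the range 1 to the destination width")
    return 64 - fbits                 # encoded as 64 minus "scale"

def fcvtzs_fixed(value, fbits, width=32):
    """Sketch: float to signed fixed-point, rounding toward zero."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if math.isnan(value):
        return 0                      # assumption: NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else lo
    scaled = Fraction(value) * (1 << fbits)   # exact value times 2**fbits
    return max(lo, min(hi, math.trunc(scaled)))  # assumption: saturate

print(encode_scale(16), hex(fcvtzs_fixed(1.5, 16)))
```

With 16 fraction bits, the value 1.5 maps to the integer 1.5 × 65536, which is the usual Q16.16 representation.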
Floating-point Convert to Signed integer, rounding toward Zero (scalar). This instruction converts the floating-point
value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards Zero rounding
mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 1 0 0 0 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
rounding = FPDecodeRounding(rmode);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;
Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and
writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:
immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:
immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
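The `<fbits>` tables above all follow one pattern: the position of the highest set bit in "immh" selects the element size, and the fraction-bit count is twice the element size minus the 7-bit value "immh:immb". A minimal Python sketch of that decode is shown below; the function name `decode_fbits` is hypothetical, and both the "0000" case (a different instruction class) and the reserved "0001" case are simply rejected.

```python
def decode_fbits(immh, immb):
    """Return (esize, fbits) for a 4-bit immh and a 3-bit immb."""
    if not (0 <= immh < 16 and 0 <= immb < 8):
        raise ValueError("immh is 4 bits and immb is 3 bits")
    imm = (immh << 3) | immb          # UInt(immh:immb)
    if immh & 0b1000:
        esize = 64                    # 1xxx
    elif immh & 0b0100:
        esize = 32                    # 01xx
    elif immh & 0b0010:
        esize = 16                    # 001x
    else:
        raise ValueError("immh = 0000 or 0001 does not encode fbits here")
    return esize, 2 * esize - imm

print(decode_fbits(0b0010, 0b000))    # largest fbits for 16-bit elements
print(decode_fbits(0b0011, 0b111))    # smallest fbits for 16-bit elements
```

Because "immh:immb" ranges from the element size up to one less than twice the element size within each row, the decoded value always falls in the range 1 to the element width.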
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, fracbits, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode,
and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit fixed-point unsigned integer using the Round towards
Zero rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 0 1 1 0 0 1 scale Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00' fltsize = 32;
when '01' fltsize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the double-precision to 32-bit, half-precision to 32-bit and single-precision to 32-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 32, encoded as 64
minus "scale".
For the double-precision to 64-bit, half-precision to 64-bit and single-precision to 64-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 64, encoded as 64
minus "scale".
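The relationship between the "scale" field and <fbits> described above can be sketched as follows. This is an illustrative Python model, not part of the architecture pseudocode: the function name decode_fbits is hypothetical, and rejecting out-of-range values with an exception is an assumption made here to reflect the documented ranges (1 to 32 for a 32-bit destination, 1 to 64 for a 64-bit destination).

```python
def decode_fbits(sf: int, scale: int) -> int:
    """Return <fbits> for a 6-bit "scale" field.

    sf selects the destination width: 0 for 32-bit (Wd), 1 for 64-bit (Xd).
    Hypothetical helper for illustration only.
    """
    if not 0 <= scale <= 63:
        raise ValueError("scale is a 6-bit field")
    fbits = 64 - scale  # <fbits> is encoded as 64 minus "scale"
    limit = 64 if sf else 32
    if fbits > limit:
        # Assumption: values outside the documented range are rejected here.
        raise ValueError("fbits out of range for the destination width")
    return fbits
```

For example, decode_fbits(0, 63) gives 1 and decode_fbits(1, 0) gives 64.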
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, fracbits, TRUE, fpcr, FPRounding_ZERO);
X[d] = intval;
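As a rough illustration of the value computed by the FPToFixed call above (unsigned, Round towards Zero), the following Python sketch scales the input by 2^fracbits, truncates, and saturates to the destination width. The name fp_to_unsigned_fixed is hypothetical, a Python float stands in for the source operand, and floating-point exception flags are not modelled.

```python
import math
from fractions import Fraction


def fp_to_unsigned_fixed(value: float, fracbits: int, width: int) -> int:
    """Model the unsigned, round-toward-zero fixed-point result.

    Hypothetical helper: NaN converts to 0 and out-of-range values saturate;
    FPSR flag updates and trapped exceptions are deliberately omitted.
    """
    max_result = (1 << width) - 1
    if math.isnan(value):
        return 0
    if math.isinf(value):
        return max_result if value > 0 else 0
    # Fraction keeps the scaling exact, so truncation is the only rounding step.
    scaled = Fraction(value) * (1 << fracbits)
    truncated = math.trunc(scaled)  # round toward zero
    return min(max(truncated, 0), max_result)
```

For example, converting 1.75 with 4 fractional bits gives 28, which is 1.75 multiplied by 16.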
Floating-point Convert to Unsigned integer, rounding toward Zero (scalar). This instruction converts the floating-point
value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards Zero rounding
mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 1 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
rounding = FPDecodeRounding(rmode);
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;
Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or
each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding
mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:
immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:
immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
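The <fbits> tables above can be read as one rule: the position of the highest set bit of immh selects the element size, and <fbits> is twice that size minus UInt(immh:immb). The Python sketch below is illustrative only. The name decode_vector_fbits is hypothetical, and both immh == 0000 and immh == 0001 are rejected with an exception, which is a simplification of the "SEE" and RESERVED rows.

```python
from typing import Tuple


def decode_vector_fbits(immh: int, immb: int) -> Tuple[int, int]:
    """Return (esize, fbits) for a 4-bit immh and a 3-bit immb.

    Hypothetical helper mirroring the <fbits> tables.
    """
    if not (0 <= immh <= 0xF and 0 <= immb <= 0x7):
        raise ValueError("immh is 4 bits and immb is 3 bits")
    if immh in (0b0000, 0b0001):
        # 0000 selects a different instruction group; 0001 is RESERVED.
        raise ValueError("no fixed-point conversion for this immh value")
    combined = (immh << 3) | immb  # UInt(immh:immb)
    if immh & 0b1000:
        esize = 64
    elif immh & 0b0100:
        esize = 32
    else:
        esize = 16
    return esize, 2 * esize - combined
```

With this rule the result always falls in the range 1 to the element width, as the symbol description states.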
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, fracbits, unsigned, fpcr, rounding);
V[d] = result;
Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding
mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);
V[d] = result;
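The element loop and the merge/zero choice in the Operation pseudocode above can be modelled in Python with registers held as plain integers. This is a sketch under stated assumptions: get_elem, set_elem and convert_register are hypothetical names, the merging argument stands in for IsMerging(fpcr), and the convert callback stands in for the FPToFixed call.

```python
MASK128 = (1 << 128) - 1


def get_elem(vector: int, e: int, esize: int) -> int:
    """Read element e of width esize bits (models Elem[vector, e, esize])."""
    return (vector >> (e * esize)) & ((1 << esize) - 1)


def set_elem(vector: int, e: int, esize: int, value: int) -> int:
    """Return vector with element e replaced by the low esize bits of value."""
    mask = ((1 << esize) - 1) << (e * esize)
    return (vector & ~mask) | ((value << (e * esize)) & mask)


def convert_register(vd_old, vn, esize, elements, merging, convert):
    """Apply convert to each source element and build the 128-bit result.

    The result starts from the old destination only for a scalar operation
    with merging enabled; otherwise it starts from zero.
    """
    result = vd_old if (elements == 1 and merging) else 0
    for e in range(elements):
        result = set_elem(result, e, esize, convert(get_elem(vn, e, esize)))
    return result & MASK128
```

With merging enabled, a scalar operation leaves the upper bits of the old destination value in place. In every other case the bits above the written elements are zero.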
Floating-point Divide (scalar). This instruction divides the floating-point value of the first source SIMD&FP register by
the floating-point value of the second source SIMD&FP register, and writes the result to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
V[d] = result;
Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source
SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register,
places the results in a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
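The single-precision and double-precision decode above maps the sz and Q fields to an element size, a vector size and an element count. A minimal Python sketch of the same arithmetic follows. The name decode_arrangement is hypothetical, and raising an exception stands in for the UNDEFINED case.

```python
def decode_arrangement(sz: int, q: int):
    """Return (esize, datasize, elements) for 1-bit sz and Q fields."""
    if sz == 1 and q == 0:
        # sz:Q == '10' has no valid arrangement (a single 64-bit lane).
        raise ValueError("sz:Q == '10' is UNDEFINED")
    esize = 32 << sz                    # 32 for single, 64 for double precision
    datasize = 128 if q == 1 else 64    # full or half vector register
    elements = datasize // esize
    return esize, datasize, elements
```

The three valid combinations correspond to the 2S, 4S and 2D arrangements.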
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
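<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D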
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = FPDiv(element1, element2, FPCR[]);
V[d] = result;
Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero. This instruction converts the double-
precision floating-point value in the SIMD&FP source register to a 32-bit signed integer using the Round towards Zero
rounding mode, and writes the result to the general-purpose destination register. If the result is too large to be
represented as a signed 32-bit integer, then the result is the integer modulo 2^32, as held in a 32-bit signed integer.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Double-precision to 32-bit
(FEAT_JSCVT)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 Rn Rd
sf ftype rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bit Z;
fltval = V[n];
(intval, Z) = FPToFixedJS(fltval, fpcr, TRUE);
PSTATE.<N,Z,C,V> = '0':Z:'00';
X[d] = intval;
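The integer result described for this instruction (truncate toward zero, then reduce modulo 2^32 and interpret as a signed 32-bit value) can be sketched in Python as follows. The name fjcvtzs_value is hypothetical. The sketch covers only the destination register value, with NaN and infinity inputs giving 0. The Z flag written to PSTATE and the floating-point exception flags are not modelled.

```python
import math


def fjcvtzs_value(value: float) -> int:
    """Model the 32-bit signed result for a double-precision input."""
    if math.isnan(value) or math.isinf(value):
        return 0
    truncated = math.trunc(value)       # round toward zero
    wrapped = truncated & 0xFFFFFFFF    # integer modulo 2^32
    # Reinterpret the low 32 bits as a signed integer.
    return wrapped - (1 << 32) if wrapped & 0x80000000 else wrapped
```

For example, an input of 2^32 + 5.5 gives 5, and an input of 2^31 gives -2^31.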
Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source
registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 0 Rm 0 Ra Rn Rd
o1 o0
integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Ha> Is the 16-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
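A fused multiply-add rounds once, after the addition, rather than rounding the product first. The Python sketch below illustrates that difference for finite double-precision inputs by computing the exact result with rationals and rounding a single time. The name fused_multiply_add is hypothetical, and the sketch does not model NaN or infinity inputs, the sign of a zero result, or exception flags.

```python
from fractions import Fraction


def fused_multiply_add(n: float, m: float, a: float) -> float:
    """Return n * m + a with a single rounding (finite inputs only)."""
    # Fraction arithmetic is exact; float() performs the only rounding step.
    return float(Fraction(n) * Fraction(m) + Fraction(a))


# The exact product (1 + 2^-30) * (1 - 2^-30) is 1 - 2^-60, which rounds
# to 1.0 if the multiplication is rounded separately.
n = 1.0 + 2.0**-30
m = 1.0 - 2.0**-30
a = -1.0
unfused = n * m + a                     # product rounded before the add
fused = fused_multiply_add(n, m, a)     # product kept exact
```

Here the unfused expression evaluates to 0.0, while the fused result keeps the small residual of -2^-60.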
Floating-point Maximum (scalar). This instruction compares the two source SIMD&FP registers, and writes the larger
of the two floating-point values to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 0 0 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to
the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
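<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>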
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
if pair then
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
else
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if minimum then
Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
else
Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);
V[d] = result;
Floating-point Maximum Number (scalar). This instruction compares the first and second source SIMD&FP register
values, and writes the larger of the two floating-point values to the destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result that is placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 1 0 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the
destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
if pair then
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
else
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if minimum then
Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
else
Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);
V[d] = result;
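The difference between the Maximum and Maximum Number operations lies in how a quiet NaN operand is treated. The Python sketch below illustrates this per element. The names fp_max and fp_max_num are hypothetical stand-ins for FPMax and FPMaxNum, Python NaNs are assumed to behave as quiet NaNs, and signaling NaNs, FPCR-controlled alternative behaviours and exception flags are not modelled.

```python
import math


def fp_max(a: float, b: float) -> float:
    """Maximum that propagates NaN and orders +0.0 above -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        # The result is -0.0 only when both operands are -0.0.
        both_negative = math.copysign(1.0, a) < 0 and math.copysign(1.0, b) < 0
        return -0.0 if both_negative else 0.0
    return a if a > b else b


def fp_max_num(a: float, b: float) -> float:
    """Maximum Number: a single quiet NaN operand yields the numeric operand."""
    if math.isnan(a) and not math.isnan(b):
        return b
    if math.isnan(b) and not math.isnan(a):
        return a
    return fp_max(a, b)
```

With one NaN operand, fp_max returns NaN whereas fp_max_num returns the other operand. In all other cases the two functions agree.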
Floating-point Maximum Number of Pair of elements (scalar). This instruction compares two vector elements in the
source SIMD&FP register and writes the largest of the floating-point values as a scalar to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 H
1 RESERVED
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:
sz <T>
0 2H
1 RESERVED
For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:
sz <T>
0 2S
1 2D
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMAXNUM, operand, esize, FALSE);
Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector
elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register,
reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of
values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are
floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result is the numerical value, otherwise the result is identical to FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
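As an illustration of the decode above, the following Python sketch derives esize, datasize, elements, and the arrangement name from the sz and Q fields. The function name decode_arrangement and the use of ValueError for the UNDEFINED case are assumptions made for this sketch.

```python
def decode_arrangement(sz: int, q: int):
    """Mirror the single/double-precision decode for the sz and Q fields."""
    if sz not in (0, 1) or q not in (0, 1):
        raise ValueError("sz and Q are single-bit fields")
    if (sz, q) == (1, 0):
        # sz:Q == '10' is the RESERVED arrangement.
        raise ValueError("UNDEFINED")
    esize = 32 << sz                      # 32 for sz == 0, 64 for sz == 1
    datasize = 128 if q == 1 else 64
    elements = datasize // esize
    suffix = "S" if esize == 32 else "D"
    return esize, datasize, elements, f"{elements}{suffix}"
```

The three valid combinations yield the arrangements 2S, 4S, and 2D, which matches the “sz:Q” table in Assembler Symbols.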
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
if pair then
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
else
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if minimum then
Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
else
Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);
V[d] = result;
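The pairwise path of the operation above can be sketched in Python by modelling each register as a list of elements, with element 0 as the lowest. This is a simplified model under stated assumptions: fp_max_num is a hypothetical stand-in for FPMaxNum that handles only quiet NaNs, and FPCR is not modeled.

```python
import math

def fp_max_num(a: float, b: float) -> float:
    """Simplified maximum-number: a single quiet NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b

def pairwise(op, vn, vm):
    """Apply op to each adjacent pair of the concatenation operand2:operand1."""
    if len(vn) != len(vm):
        raise ValueError("source vectors must have the same element count")
    # operand1 (vn) occupies the low half of the concatenated vector.
    concat = list(vn) + list(vm)
    return [op(concat[2 * e], concat[2 * e + 1]) for e in range(len(vn))]
```

For example, pairwise(fp_max_num, [1.0, 4.0, math.nan, 2.0], [5.0, 3.0, 7.0, 8.0]) gives [4.0, 2.0, 5.0, 8.0]: the low half of the result comes from pairs in the first source and the high half from pairs in the second.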
Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source
SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values
in this instruction are floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result of the comparison is the numerical value, otherwise the result is identical to FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMAXNUM, operand, esize, FALSE);
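The text here shows only the call to Reduce, not its body. The sketch below assumes a recursive halving order (reduce the low half, reduce the high half, then combine), which is an assumption of this example. The names reduce_halving and fp_max_num are hypothetical, and only quiet NaNs are modeled.

```python
import math

def fp_max_num(a: float, b: float) -> float:
    """Simplified maximum-number: a single quiet NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b

def reduce_halving(op, elems):
    """Reduce a power-of-two-length list by combining its two halves."""
    if len(elems) == 1:
        return elems[0]
    half = len(elems) // 2
    lo = reduce_halving(op, elems[:half])
    hi = reduce_halving(op, elems[half:])
    return op(lo, hi)
```

For a 4S source holding [1.0, NaN, 3.0, 2.0], the low half reduces to 1.0, the high half to 3.0, and the final result is 3.0.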
Floating-point Maximum of Pair of elements (scalar). This instruction compares two vector elements in the source
SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
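The two outcomes described above can be sketched for one exception type, Invalid Operation. The bit positions used here (FPCR.IOE at bit 8 and FPSR.IOC at bit 0) are not stated in this section and are assumptions taken from the general register layout. The FPTrap class and the function name are hypothetical, and whether trapping is supported at all depends on the implementation.

```python
FPCR_IOE = 1 << 8   # assumed position of the Invalid Operation trap enable
FPSR_IOC = 1 << 0   # assumed position of the Invalid Operation cumulative flag

class FPTrap(Exception):
    """Stands in for a synchronous floating-point exception."""

def signal_invalid_operation(fpcr: int, fpsr: int) -> int:
    """Return the updated FPSR value, or raise FPTrap if the trap is enabled."""
    if fpcr & FPCR_IOE:
        raise FPTrap("Invalid Operation trapped")
    # Untrapped case: record the exception in the cumulative flag.
    return fpsr | FPSR_IOC
```

With the trap enable clear, the call only sets the cumulative flag in the returned FPSR value; with it set, the sketch raises instead of updating the flag.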
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 H
1 RESERVED
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:
sz <T>
0 2H
1 RESERVED
For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:
sz <T>
0 2S
1 2D
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMAX, operand, esize, FALSE);
Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of
the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair
of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and
writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
if pair then
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
else
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if minimum then
Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
else
Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);
V[d] = result;
Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP
register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this
instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMAX, operand, esize, FALSE);
Floating-point Minimum (scalar). This instruction compares the first and second source SIMD&FP register values, and
writes the smaller of the two floating-point values to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 0 1 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
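The ftype case statement above maps directly onto a small lookup. In this Python sketch, the have_fp16 parameter stands in for HaveFP16Ext(), and raising ValueError stands in for UNDEFINED; both are choices made for this example.

```python
def decode_ftype(ftype: int, have_fp16: bool) -> int:
    """Return esize in bits for a 2-bit ftype field, or raise if UNDEFINED."""
    if ftype == 0b00:
        return 32                 # single-precision
    if ftype == 0b01:
        return 64                 # double-precision
    if ftype == 0b11 and have_fp16:
        return 16                 # half-precision, only with the FP16 extension
    # ftype == '10', or '11' without the FP16 extension.
    raise ValueError("UNDEFINED")
```

Note that the half-precision encoding is only valid when the half-precision extension is implemented, so the same instruction word can decode differently on different implementations.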
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
Floating-point Minimum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to
the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
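For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>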
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
if pair then
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
else
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if minimum then
Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
else
Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);
V[d] = result;
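For the element-wise (non-pairwise) path of the operation above, element e of the first source is compared with element e of the second. A minimal Python sketch follows. It assumes the FMIN-style rule that any NaN operand yields NaN, does not model signed zeros or FPCR, and uses the hypothetical names fp_min and fmin_vector.

```python
import math

def fp_min(a: float, b: float) -> float:
    """FMIN-style minimum (simplified): any NaN operand yields NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b

def fmin_vector(vn, vm):
    """Element-wise minimum of two equally sized vectors."""
    if len(vn) != len(vm):
        raise ValueError("source vectors must have the same element count")
    return [fp_min(x, y) for x, y in zip(vn, vm)]
```

Each result element depends only on the two source elements at the same index, so a NaN in one element does not affect the others.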
Floating-point Minimum Number (scalar). This instruction compares the first and second source SIMD&FP register
values, and writes the smaller of the two floating-point values to the destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one source value is numeric and the other is a quiet
NaN, the result is the numerical value; otherwise the result is identical to FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 1 1 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
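The encoding diagram above fixes bits 31:24 to 00011110, bit 21 to 1, and bits 15:10 to 011110, leaving the ftype, Rm, Rn, and Rd fields variable. The Python sketch below assembles an instruction word from those fields. The function name encode_fminnm_scalar is hypothetical, and the sketch does not check whether the chosen ftype value is UNDEFINED.

```python
def encode_fminnm_scalar(ftype: int, rd: int, rn: int, rm: int) -> int:
    """Pack the fields of the diagram above into a 32-bit instruction word."""
    if not 0 <= ftype <= 0b11:
        raise ValueError("ftype is a 2-bit field")
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers are 5-bit fields")
    return (
        (0b00011110 << 24)   # bits 31:24, fixed
        | (ftype << 22)      # bits 23:22
        | (1 << 21)          # bit 21, fixed
        | (rm << 16)         # bits 20:16
        | (0b011110 << 10)   # bits 15:10, fixed
        | (rn << 5)          # bits 9:5
        | rd                 # bits 4:0
    )
```

With ftype 00 (single-precision), Rd = 0, Rn = 1, and Rm = 2, the packed word is 0x1E227820.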
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
V[d] = result;
Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the
destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result placed in the vector is the numerical value, otherwise the result is identical to FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
if pair then
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
else
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if minimum then
Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
else
Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);
V[d] = result;
Floating-point Minimum Number of Pair of elements (scalar). This instruction compares two vector elements in the
source SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 H
1 RESERVED
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:
sz <T>
0 2H
1 RESERVED
For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:
sz <T>
0 2S
1 2D
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMINNUM, operand, esize, FALSE);
Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector
elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register,
reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of
floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this
instruction are floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result is the numerical value, otherwise the result is identical to FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
if pair then
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
else
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if minimum then
Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
else
Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);
V[d] = result;
Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source
SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the
values in this instruction are floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result of the comparison is the numerical value, otherwise the result is identical to FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMINNUM, operand, esize, FALSE);
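For the half-precision 4H arrangement, the reduction can be tried on raw register bytes by unpacking four 16-bit floating-point elements. The Python sketch below uses the standard struct module's "e" (binary16) format and a simple left-to-right fold. The fold order, the little-endian element layout, and the names fp_min_num and fminnmv_4h are assumptions of this example, and only quiet NaNs are modeled.

```python
import math
import struct

def fp_min_num(a: float, b: float) -> float:
    """Simplified minimum-number: a single quiet NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b

def fminnmv_4h(reg_bytes: bytes) -> float:
    """Reduce four half-precision elements held in 8 little-endian bytes."""
    lanes = struct.unpack("<4e", reg_bytes)   # element 0 is the lowest 16 bits
    result = lanes[0]
    for lane in lanes[1:]:
        result = fp_min_num(result, lane)
    return result
```

A register image can be built for experiments with struct.pack("<4e", 1.5, -2.0, 0.25, 8.0), for which the sketch returns -2.0.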
Floating-point Minimum of Pair of elements (scalar). This instruction compares two vector elements in the source
SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 H
1 RESERVED
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:
sz <T>
0 2H
1 RESERVED
For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:
sz <T>
0 2S
1 2D
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMIN, operand, esize, FALSE);
Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of
the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair
of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and
writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
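The single-precision and double-precision decode above can be mirrored in a short sketch. The Python below is illustrative only: the function name `decode_arrangement` is hypothetical and not part of the architecture pseudocode, `sz` and `Q` are assumed to be passed as the integers 0 or 1, and `None` stands in for the encoding that the pseudocode treats as UNDEFINED.

```python
def decode_arrangement(sz, q):
    """Return (esize, datasize, elements, specifier), or None when sz:Q == '10'."""
    if (sz, q) == (1, 0):
        return None  # reserved encoding (UNDEFINED in the decode pseudocode)
    esize = 32 << sz                      # 32-bit or 64-bit elements
    datasize = 128 if q == 1 else 64      # full or half vector register
    elements = datasize // esize
    specifier = f"{elements}{'S' if esize == 32 else 'D'}"
    return esize, datasize, elements, specifier
```

For example, `decode_arrangement(0, 1)` yields a 4S arrangement, which matches the `sz:Q` arrangement table in the Assembler Symbols.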
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];
    if minimum then
        Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);
V[d] = result;
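As a rough illustration of the pairwise case in the pseudocode above, the following Python sketch models registers as lists of floats, with index 0 as the lowest element. The name `fminp` is used here only for illustration. Python's built-in `min` is an assumption standing in for FPMin, so NaN handling, signed zeros, and FPCR settings are not modeled.

```python
def fminp(vn, vm):
    """Pairwise minimum over the concatenation of vn (low elements) and vm (high elements)."""
    assert len(vn) == len(vm)
    # operand2:operand1 places the first source register in the low elements.
    concat = list(vn) + list(vm)
    # Each result element is the smaller value of one adjacent pair.
    return [min(concat[2 * e], concat[2 * e + 1]) for e in range(len(vn))]
```

With `vn = [1.0, 2.0, 3.0, 4.0]` and `vm = [5.0, 0.5, 7.0, 6.0]`, the lower half of the result comes from pairs within `vn` and the upper half from pairs within `vm`.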
Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP
register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this
instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:
sz <V>
0 S
1 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
Floating-point fused Multiply-Add to accumulator (by element). This instruction multiplies the vector elements in the
first source SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the
results in the vector elements of the destination SIMD&FP register. All the values in this instruction are floating-point
values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector, half-precision; and Vector, single-precision and double-precision.
Scalar, half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 0 0 0 1 H 0 Rn Rd
o2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 0 0 0 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 0 0 0 1 H 0 Rn Rd
o2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 0 0 0 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> For the half-precision variant: is the name of the second SIMD&FP source register, in the range V0 to
V15, encoded in the "Rm" field.
For the single-precision and double-precision variant: is the name of the second SIMD&FP source
register, encoded in the "M:Rm" fields.
sz <Ts>
0 S
1 D
<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:
sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
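The register and index encodings described above can be summarized in a small sketch. The Python below is illustrative, not architecture pseudocode: `decode_index` is a hypothetical helper, all fields are assumed to be passed as unsigned integers, and `None` stands in for the RESERVED row of the table.

```python
def decode_index(half_precision, sz, h, l, m, rm):
    """Return (register number, element index), or None for the reserved sz:L == '11'."""
    if half_precision:
        # Register is restricted to V0-V15 (4-bit Rm); the index is H:L:M.
        return rm & 0xF, (h << 2) | (l << 1) | m
    reg = (m << 4) | (rm & 0xF)      # register number is M:Rm
    if sz == 0:
        return reg, (h << 1) | l     # single-precision: index is H:L
    if l == 0:
        return reg, h                # double-precision: index is H
    return None                      # sz == 1 with L == 1 is reserved
```

In the half-precision case the M bit is consumed by the index, which is why the second source register is limited to V0 to V15.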
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(esize) element1;
bits(esize) element2 = Elem[operand2, index, esize];
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, fpcr);
V[d] = result;
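The loop above can be sketched in Python to show why the fused form matters. This is a minimal model under stated assumptions: registers are lists of double-precision floats, the helper names are hypothetical, and the single rounding of FPMulAdd is emulated with exact rational arithmetic, relying on CPython rounding the final `Fraction` to `float` conversion correctly. NaNs, infinities, signed zeros, merging, and FPCR settings are not modeled.

```python
from fractions import Fraction

def fused_mul_add(addend, a, b):
    """addend + a*b with one rounding (finite double-precision inputs only)."""
    exact = Fraction(addend) + Fraction(a) * Fraction(b)
    return float(exact)  # the exact sum is rounded once, at this conversion

def fmla_by_element(vd, vn, vm, index, sub_op=False):
    """Accumulate vn[e] * vm[index] into vd[e] for every element e."""
    element2 = vm[index]             # one element of the second source, reused for all e
    result = []
    for acc, element1 in zip(vd, vn):
        if sub_op:
            element1 = -element1     # the multiply-subtract form negates element1
        result.append(fused_mul_add(acc, element1, element2))
    return result
```

With `x = 2.0 ** -27`, accumulating `(1.0 + x) * (1.0 - x)` into `-1.0` gives `-(2.0 ** -54)` in this fused model, whereas a separately rounded multiply followed by an add gives `0.0`.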
Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point
values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of
the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 0 1 1 Rn Rd
a
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 0 1 1 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
Floating-point fused Multiply-Add Long to accumulator (by element). This instruction multiplies the vector elements in
the first source SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the
product to the corresponding vector element of the destination SIMD&FP register. The instruction does not round the
result of the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Note
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
FMLAL
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 0 0 0 0 H 0 Rn Rd
sz S
FMLAL2
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 0 L M Rm 1 0 0 0 H 0 Rn Rd
sz S
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 2H
1 4H
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<index> Is the element index, encoded in the "H:L:M" fields.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize DIV 2) operand1 = Vpart[n, part];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize DIV 2) element1;
bits(esize DIV 2) element2 = Elem[operand2, index, esize DIV 2];
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize DIV 2];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
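The widening behavior in the pseudocode above can be sketched as follows. This Python model is illustrative and makes several assumptions: half-precision sources are given as lists of 16-bit patterns, the accumulator is a list of floats holding single-precision values, `part` selects the lower (0) or upper (1) half of the first source's elements as a stand-in for Vpart, and all helper names are hypothetical. The product of two half-precision values is exact in Python's double arithmetic, but the final sum is rounded to double and then to single, so in rare cases it may differ from the single rounding that FPMulAddH performs.

```python
import struct

def half_bits_to_float(bits):
    """Interpret a 16-bit pattern as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def round_to_single(x):
    """Round a finite Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fmlal_by_element(vd, vn_bits, vm_bits, index, part, sub_op=False):
    elements = len(vd)
    # Select the lower or upper half of the half-precision source elements.
    operand1 = vn_bits[part * elements:(part + 1) * elements]
    element2 = half_bits_to_float(vm_bits[index])
    result = []
    for e in range(elements):
        element1 = half_bits_to_float(operand1[e])
        if sub_op:
            element1 = -element1
        # The product is not rounded before it is added to the accumulator.
        result.append(round_to_single(vd[e] + element1 * element2))
    return result
```

Calling the sketch with `part=0` and `part=1` on the same sources corresponds to the two encodings of the instruction, each consuming a different half of the first source register.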
Floating-point fused Multiply-Add Long to accumulator (vector). This instruction multiplies corresponding half-
precision floating-point values in the vectors in the two source SIMD&FP registers, and accumulates the product to
the corresponding vector element of the destination SIMD&FP register. The instruction does not round the result of
the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Note
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
FMLAL
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 Rm 1 1 1 0 1 1 Rn Rd
S sz
FMLAL2
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 Rm 1 1 0 0 1 1 Rn Rd
S sz
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 2H
1 4H
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize DIV 2) operand1 = Vpart[n, part];
bits(datasize DIV 2) operand2 = Vpart[m, part];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize DIV 2) element1;
bits(esize DIV 2) element2;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize DIV 2];
    element2 = Elem[operand2, e, esize DIV 2];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
Floating-point fused Multiply-Subtract from accumulator (by element). This instruction multiplies the vector elements
in the first source SIMD&FP register by the specified value in the second source SIMD&FP register, and subtracts the
results from the vector elements of the destination SIMD&FP register. All the values in this instruction are floating-
point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector, half-precision; and Vector, single-precision and double-precision.
Scalar, half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 0 1 0 1 H 0 Rn Rd
o2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 0 1 0 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 0 1 0 1 H 0 Rn Rd
o2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 0 1 0 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> For the half-precision variant: is the name of the second SIMD&FP source register, in the range V0 to
V15, encoded in the "Rm" field.
For the single-precision and double-precision variant: is the name of the second SIMD&FP source
register, encoded in the "M:Rm" fields.
sz <Ts>
0 S
1 D
<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:
sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(esize) element1;
bits(esize) element2 = Elem[operand2, index, esize];
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, fpcr);
V[d] = result;
Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-
point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the
corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP
register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 0 1 1 Rn Rd
a
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 1 1 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
Floating-point fused Multiply-Subtract Long from accumulator (by element). This instruction multiplies the negated
vector elements in the first source SIMD&FP register by the specified value in the second source SIMD&FP register,
and accumulates the product to the corresponding vector element of the destination SIMD&FP register. The
instruction does not round the result of the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Note
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
FMLSL
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 0 1 0 0 H 0 Rn Rd
sz S
FMLSL2
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 0 L M Rm 1 1 0 0 H 0 Rn Rd
sz S
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 2H
1 4H
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<index> Is the element index, encoded in the "H:L:M" fields.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize DIV 2) operand1 = Vpart[n, part];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize DIV 2) element1;
bits(esize DIV 2) element2 = Elem[operand2, index, esize DIV 2];
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize DIV 2];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
Floating-point fused Multiply-Subtract Long from accumulator (vector). This instruction negates the values in the
vector of one SIMD&FP register, multiplies these with the corresponding values in another vector, and accumulates
the product to the corresponding vector element of the destination SIMD&FP register. The instruction does not round
the result of the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Note
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
FMLSL
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 1 1 1 0 1 1 Rn Rd
S sz
FMLSL2
(FEAT_FHM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 0 1 Rm 1 1 0 0 1 1 Rn Rd
S sz
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 2H
1 4H
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize DIV 2) operand1 = Vpart[n, part];
bits(datasize DIV 2) operand2 = Vpart[m, part];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize DIV 2) element1;
bits(esize DIV 2) element2;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize DIV 2];
    element2 = Elem[operand2, e, esize DIV 2];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
Floating-point Move to or from general-purpose register without conversion. This instruction transfers the contents of
a SIMD&FP register to a general-purpose register, or the contents of a general-purpose register to a SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 x 1 1 x 0 0 0 0 0 0 Rn Rd
rmode opcode
Half-precision to 64-bit (sf == 1 && ftype == 11 && rmode == 00 && opcode == 110)
(FEAT_FP16)
32-bit to half-precision (sf == 0 && ftype == 11 && rmode == 00 && opcode == 111)
(FEAT_FP16)
32-bit to single-precision (sf == 0 && ftype == 00 && rmode == 00 && opcode == 111)
Single-precision to 32-bit (sf == 0 && ftype == 00 && rmode == 00 && opcode == 110)
64-bit to half-precision (sf == 1 && ftype == 11 && rmode == 00 && opcode == 111)
(FEAT_FP16)
64-bit to double-precision (sf == 1 && ftype == 01 && rmode == 00 && opcode == 111)
64-bit to top half of 128-bit (sf == 1 && ftype == 10 && rmode == 01 && opcode == 111)
Double-precision to 64-bit (sf == 1 && ftype == 01 && rmode == 00 && opcode == 110)
Top half of 128-bit to 64-bit (sf == 1 && ftype == 10 && rmode == 01 && opcode == 110)
case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        if opcode<2:1>:rmode != '11 01' then UNDEFINED;
        fltsize = 128;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
case opcode<2:1>:rmode of
    when '00 xx' // FCVT[NPMZ][US]
        rounding = FPDecodeRounding(rmode);
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_FtoI;
    when '01 00' // [US]CVTF
        rounding = FPRoundingMode(FPCR[]);
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_ItoF;
    when '10 00' // FCVTA[US]
        rounding = FPRounding_TIEAWAY;
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_FtoI;
    when '11 00' // FMOV
        if fltsize != 16 && fltsize != intsize then UNDEFINED;
        op = if opcode<0> == '1' then FPConvOp_MOV_ItoF else FPConvOp_MOV_FtoI;
        part = 0;
    when '11 01' // FMOV D[1]
        if intsize != 64 || fltsize != 128 then UNDEFINED;
        op = if opcode<0> == '1' then FPConvOp_MOV_ItoF else FPConvOp_MOV_FtoI;
        part = 1;
        fltsize = 64; // size of D[1] is 64
    when '11 11' // FJCVTZS
        if !HaveFJCVTZSExt() then UNDEFINED;
        rounding = FPRounding_ZERO;
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_FtoI_JS;
    otherwise
        UNDEFINED;
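The FMOV rows of the decode above can be traced with a short sketch. The Python below is illustrative only: `decode_fmov` is a hypothetical helper, fields are assumed to be passed as unsigned integers, `None` stands in for UNDEFINED and for encodings that are not FMOV, and the mapping of `sf` to a 32-bit or 64-bit general-purpose width is an assumption inferred from the variant names, since that assignment does not appear in the decode text here.

```python
def decode_fmov(sf, ftype, rmode, opcode, have_fp16=True):
    """Return (op, intsize, fltsize, part) for FMOV encodings, else None."""
    intsize = 64 if sf == 1 else 32      # assumption: sf selects the general-purpose width
    key = (opcode >> 1, rmode)           # opcode<2:1> : rmode
    if ftype == 0b00:
        fltsize = 32
    elif ftype == 0b01:
        fltsize = 64
    elif ftype == 0b10:
        if key != (0b11, 0b01):
            return None
        fltsize = 128
    else:
        if not have_fp16:
            return None
        fltsize = 16
    op = "MOV_ItoF" if opcode & 1 else "MOV_FtoI"
    if key == (0b11, 0b00):              # FMOV
        if fltsize != 16 and fltsize != intsize:
            return None
        return op, intsize, fltsize, 0
    if key == (0b11, 0b01):              # FMOV D[1]
        if intsize != 64 or fltsize != 128:
            return None
        return op, intsize, 64, 1        # D[1] is 64 bits wide
    return None  # conversions (FCVT*, [US]CVTF, FJCVTZS) are outside this sketch
```

For instance, `decode_fmov(1, 0b10, 0b01, 0b110)` corresponds to the "Top half of 128-bit to 64-bit" variant and returns a move from part 1 of the SIMD&FP register.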
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
case op of
    when FPConvOp_CVT_FtoI
        fltval = V[n];
        intval = FPToFixed(fltval, 0, unsigned, fpcr, rounding);
        X[d] = intval;
    when FPConvOp_CVT_ItoF
        intval = X[n];
        fltval = if merge then V[d] else Zeros();
        Elem[fltval, 0, fltsize] = FixedToFP(intval, 0, unsigned, fpcr, rounding);
        V[d] = fltval;
    when FPConvOp_MOV_FtoI
        fltval = Vpart[n, part];
        intval = ZeroExtend(fltval, intsize);
        X[d] = intval;
    when FPConvOp_MOV_ItoF
        intval = X[n];
        fltval = intval<fsize-1:0>;
        Vpart[d, part] = fltval;
    when FPConvOp_CVT_FtoI_JS
        bit Z;
        fltval = V[n];
        (intval, Z) = FPToFixedJS(fltval, fpcr, TRUE);
        PSTATE.<N,Z,C,V> = '0':Z:'00';
        X[d] = intval;
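The two move cases above copy bit patterns without any numeric conversion. A minimal Python sketch of that idea for the 32-bit and single-precision variants follows. The function names are hypothetical, and `struct` is used only to reinterpret bits, as an assumption about how to model the register contents in Python.

```python
import struct

def fmov_single_to_w(value):
    """Return the 32-bit pattern of a single-precision value (the FtoI move)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def fmov_w_to_single(bits):
    """Reinterpret the low 32 bits of an integer as a single-precision value (the ItoF move)."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]
```

Moving `1.0` to a general-purpose register in this model produces `0x3F800000`, not the integer 1, which is the difference between a move and a conversion.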
Floating-point Move register without conversion. This instruction copies the floating-point value in the SIMD&FP
source register to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 0 1 0 0 0 0 Rn Rd
opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
Floating-point move immediate (scalar). This instruction copies a floating-point immediate constant into the SIMD&FP
destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 imm8 1 0 0 0 0 0 0 0 Rd
integer d = UInt(Rd);
integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<imm> Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision, encoded in
the "imm8" field. For details of the range of constants available and the encoding of <imm>, see
Modified immediate constants in A64 floating-point instructions.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = imm;
Floating-point move immediate (vector). This instruction copies an immediate floating-point constant into every
element of the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 0 0 0 a b c 1 1 1 1 1 1 d e f g h Rd
integer rd = UInt(Rd);
imm8 = a:b:c:d:e:f:g:h;
imm16 = imm8<7>:NOT(imm8<6>):Replicate(imm8<6>, 2):imm8<5:0>:Zeros(6);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q op 0 1 1 1 1 0 0 0 0 0 a b c 1 1 1 1 0 1 d e f g h Rd
cmode
Single-precision (op == 0)
Double-precision (Q == 1 && op == 1)
integer rd = UInt(Rd);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 2S
1 4S
<imm> Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision, encoded in
"a:b:c:d:e:f:g:h". For details of the range of constants available and the encoding of <imm>, see
Modified immediate constants in A64 floating-point instructions.
Operation
CheckFPAdvSIMDEnabled64();
V[rd] = imm;
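The half-precision decode above builds imm16 by concatenating bit fields of imm8. The following Python is a sketch of that one expression, followed by replication into every 16-bit element as the description states. The function names and the use of `struct` to interpret the bits are illustrative assumptions, and the lane counts (4 for Q == 0, 8 for Q == 1) assume 64-bit and 128-bit vectors.

```python
import struct


def expand_imm8_half(imm8: int) -> int:
    """imm8<7> : NOT(imm8<6>) : Replicate(imm8<6>, 2) : imm8<5:0> : Zeros(6)."""
    a = (imm8 >> 7) & 1
    b = (imm8 >> 6) & 1
    low6 = imm8 & 0x3F
    return (a << 15) | ((b ^ 1) << 14) | (b << 13) | (b << 12) | (low6 << 6)


def half_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def splat_half(imm8: int, q: int) -> int:
    """Copy the expanded constant into every 16-bit element of the result."""
    imm16 = expand_imm8_half(imm8)
    lanes = 8 if q == 1 else 4
    result = 0
    for e in range(lanes):
        result |= imm16 << (16 * e)
    return result
```

For example, imm8 = 0x70 expands to 0x3C00, which is 1.0 in half precision.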
Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source
registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to
the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 0 Rm 1 Ra Rn Rd
o1 o0
integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
Operation
CheckFPAdvSIMDEnabled64();
operand1 = FPNeg(operand1);
Elem[result, 0, esize] = FPMulAdd(operanda, operand1, operand2, fpcr);
V[d] = result;
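The Operation above negates operand1 and then performs a single fused multiply-add, so the product is not rounded before the addition. The following Python sketch models that for finite double-precision inputs only, using exact rational arithmetic followed by one rounding. The helper `fp_mul_add` is a hypothetical stand-in for FPMulAdd, and FPCR rounding modes, NaNs, infinities, and the sign of zero results are not modeled.

```python
from fractions import Fraction


def fp_mul_add(addend: float, op1: float, op2: float) -> float:
    """addend + op1 * op2 computed exactly, then rounded once to double."""
    return float(Fraction(addend) + Fraction(op1) * Fraction(op2))


def fmsub(n: float, m: float, a: float) -> float:
    """Mirror of the Operation: operand1 is negated, then fused multiply-add."""
    return fp_mul_add(a, -n, m)


def unfused_msub(n: float, m: float, a: float) -> float:
    """For comparison: the product is rounded before the subtraction."""
    return a - n * m
```

With n = 1 + 2^-30, m = 1 - 2^-30, and a = 1.0, the exact product is 1 - 2^-60. The unfused form rounds that product to 1.0 and returns 0.0, whereas the fused form keeps the 2^-60 difference.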
Floating-point Multiply (by element). This instruction multiplies the vector elements in the first source SIMD&FP
register by the specified value in the second source SIMD&FP register, places the results in a vector, and writes the
vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector, half-precision; and Vector, single-precision and double-precision.
Scalar, half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> For the half-precision variant: is the name of the second SIMD&FP source register, in the range V0 to
V15, encoded in the "Rm" field.
For the single-precision and double-precision variant: is the name of the second SIMD&FP source
register, encoded in the "M:Rm" fields.
sz <Ts>
0 S
1 D
<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:
sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2 = Elem[operand2, index, esize];
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
if mulx_op then
Elem[result, e, esize] = FPMulX(element1, element2, fpcr);
else
Elem[result, e, esize] = FPMul(element1, element2, fpcr);
V[d] = result;
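The <Vm> and <index> descriptions above can be summarized in a short sketch. The Python below decodes the second source register number and the element index from the H, L, M, and Rm fields, then applies the multiply to plain lists of Python floats. The function names and the "H"/"S"/"D" precision selector are hypothetical, Rm is taken to be a 4-bit field, and rounding to the element size is not modeled.

```python
def decode_by_element(precision: str, H: int, L: int, M: int, Rm: int):
    """Return (m, index) for the by-element forms.

    precision: "H" (half), "S" (sz == 0), or "D" (sz == 1).
    """
    if precision == "H":
        # Register restricted to V0..V15; index is H:L:M (0 to 7).
        return Rm, (H << 2) | (L << 1) | M
    if precision == "S":
        # Register is M:Rm; index is H:L.
        return (M << 4) | Rm, (H << 1) | L
    if precision == "D":
        if L == 1:
            raise ValueError("RESERVED: sz == 1 with L == 1")
        # Register is M:Rm; index is H.
        return (M << 4) | Rm, H
    raise ValueError("unknown precision selector")


def fmul_by_element(operand1, operand2, index):
    """Multiply every element of operand1 by one selected element of operand2."""
    element2 = operand2[index]
    return [element1 * element2 for element1 in operand1]
```

The half-precision form spends the M bit on the index rather than on the register number, which is why its second source register is limited to V0 to V15.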
Floating-point Multiply (scalar). This instruction multiplies the floating-point values of the two source SIMD&FP
registers, and writes the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 0 0 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
V[d] = result;
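To make the encoding diagram for this instruction concrete, the following Python sketch assembles the 32-bit instruction word from the ftype, Rm, Rn, and Rd fields, with the op bit (bit 15) fixed at 0 as in the diagram. The function name, the "S"/"D"/"H" selector, and the `have_fp16` flag standing in for HaveFP16Ext() are illustrative assumptions.

```python
def encode_fmul_scalar(precision: str, rd: int, rn: int, rm: int,
                       have_fp16: bool = True) -> int:
    """Build the FMUL (scalar) instruction word from its fields."""
    ftype = {"S": 0b00, "D": 0b01, "H": 0b11}[precision]
    if precision == "H" and not have_fp16:
        # Mirrors the decode: ftype == '11' is UNDEFINED without FEAT_FP16.
        raise ValueError("UNDEFINED: half precision requires FEAT_FP16")
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register number out of range")
    return ((0b00011110 << 24)   # bits 31..24
            | (ftype << 22)      # bits 23..22
            | (1 << 21)          # bit 21
            | (rm << 16)         # bits 20..16
            | (0b000010 << 10)   # bits 15..10, with op (bit 15) == 0
            | (rn << 5)          # bits 9..5
            | rd)                # bits 4..0
```

For instance, `encode_fmul_scalar("S", 0, 1, 2)` corresponds to FMUL S0, S1, S2 and yields 0x1E220820.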
Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the
two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);
V[d] = result;
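The single-precision and double-precision decode and the element loop above can be exercised with a small model. The Python sketch below treats a vector register as an integer, extracts elements as Elem[] does, and multiplies lane by lane. All names are hypothetical. It handles only finite, in-range values under round-to-nearest, because `struct.pack` raises OverflowError when a single-precision result is out of range, and it does not model FPCR settings or exception flags.

```python
import struct

_FORMATS = {32: ("<I", "<f"), 64: ("<Q", "<d")}


def decode_arrangement(sz: int, Q: int):
    """Return (esize, datasize, elements) as in the decode pseudocode."""
    if sz == 1 and Q == 0:
        raise ValueError("UNDEFINED: sz:Q == '10'")
    esize = 32 << sz
    datasize = 128 if Q == 1 else 64
    return esize, datasize, datasize // esize


def elem(vector: int, e: int, esize: int) -> int:
    """Extract element e of width esize from an integer-modeled register."""
    return (vector >> (e * esize)) & ((1 << esize) - 1)


def bits_to_float(bits: int, esize: int) -> float:
    int_fmt, fp_fmt = _FORMATS[esize]
    return struct.unpack(fp_fmt, struct.pack(int_fmt, bits))[0]


def float_to_bits(value: float, esize: int) -> int:
    int_fmt, fp_fmt = _FORMATS[esize]
    return struct.unpack(int_fmt, struct.pack(fp_fmt, value))[0]


def fmul_vector(operand1: int, operand2: int, sz: int, Q: int) -> int:
    """Lane-wise product of two integer-modeled vector registers."""
    esize, _, elements = decode_arrangement(sz, Q)
    result = 0
    for e in range(elements):
        element1 = bits_to_float(elem(operand1, e, esize), esize)
        element2 = bits_to_float(elem(operand2, e, esize), esize)
        result |= float_to_bits(element1 * element2, esize) << (e * esize)
    return result
```

The product of two single-precision values is exact in double precision, so packing it back to 32 bits rounds only once for results in the normal range.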
Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the
two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the
destination SIMD&FP register.
If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the
values is negative, otherwise the result is positive.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision; Scalar single-precision and double-precision; Vector half precision; and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = FPMulX(element1, element2, fpcr);
V[d] = result;
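The one case in which this instruction differs from an ordinary multiply is the zero-times-infinity rule stated in the description. The following Python sketch illustrates only that rule on double-precision values. The name `fmulx` is a hypothetical stand-in for FPMulX, and NaN handling, rounding modes, and exception flags are left to Python's default float arithmetic.

```python
import math


def fmulx(x: float, y: float) -> float:
    """Multiply, except that zero times infinity gives 2.0 with the product's sign."""
    zero_times_inf = ((x == 0.0 and math.isinf(y))
                      or (math.isinf(x) and y == 0.0))
    if zero_times_inf:
        # Negative only if exactly one operand is negative (signed zeros count).
        x_negative = math.copysign(1.0, x) < 0
        y_negative = math.copysign(1.0, y) < 0
        return -2.0 if x_negative != y_negative else 2.0
    return x * y
```

An ordinary multiply of the same operands produces a NaN, which is the behavioral difference the description calls out.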
Floating-point Multiply extended (by element). This instruction multiplies the floating-point values in the vector
elements in the first source SIMD&FP register by the specified floating-point value in the second source SIMD&FP
register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the
values is negative, otherwise the result is positive.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector, half-precision; and Vector, single-precision and double-precision.
Scalar, half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:
Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> For the half-precision variant: is the name of the second SIMD&FP source register, in the range V0 to
V15, encoded in the "Rm" field.
For the single-precision and double-precision variant: is the name of the second SIMD&FP source
register, encoded in the "M:Rm" fields.
sz <Ts>
0 S
1 D
<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.
For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:
sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2 = Elem[operand2, index, esize];
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
if mulx_op then
Elem[result, e, esize] = FPMulX(element1, element2, fpcr);
else
Elem[result, e, esize] = FPMul(element1, element2, fpcr);
V[d] = result;
Floating-point Negate (scalar). This instruction negates the value in the SIMD&FP source register and writes the
result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 0 1 0 0 0 0 Rn Rd
opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP
register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
if neg then
element = FPNeg(element);
else
element = FPAbs(element);
Elem[result, e, esize] = element;
V[d] = result;
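The element loop above can be illustrated at the bit level. The sketch below assumes that FPNeg only inverts the sign bit of each element, which is why zeros, infinities, and NaN patterns pass through with just their sign changed. It ignores any FPCR-dependent behavior, and the function name and the integer model of a register are hypothetical.

```python
def fneg_lanes(operand: int, esize: int, elements: int) -> int:
    """Invert the sign bit of every esize-bit element of operand."""
    sign_bit = 1 << (esize - 1)
    mask = (1 << esize) - 1
    result = 0
    for e in range(elements):
        element = (operand >> (e * esize)) & mask
        result |= (element ^ sign_bit) << (e * esize)
    return result
```

For a 4H arrangement holding 1.0, +0.0, -2.0, and a NaN pattern (0x3C00, 0x0000, 0xC000, 0x7E00), each lane comes back with only bit 15 changed.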
Floating-point Negated fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP
source registers, negates the product, subtracts the value of the third SIMD&FP source register, and writes the result
to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 1 Rm 0 Ra Rn Rd
o1 o0
integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Ha> Is the 16-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
Operation
CheckFPAdvSIMDEnabled64();
operanda = FPNeg(operanda);
operand1 = FPNeg(operand1);
Elem[result, 0, esize] = FPMulAdd(operanda, operand1, operand2, fpcr);
V[d] = result;
Floating-point Negated fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two
SIMD&FP source registers, subtracts the value of the third SIMD&FP source register, and writes the result to the
destination SIMD&FP register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 1 Rm 1 Ra Rn Rd
o1 o0
integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the subtrahend, encoded in the "Ra"
field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
Operation
CheckFPAdvSIMDEnabled64();
operanda = FPNeg(operanda);
Elem[result, 0, esize] = FPMulAdd(operanda, operand1, operand2, fpcr);
V[d] = result;
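The scalar fused multiply instructions in this document differ only in which operands are negated before the shared FPMulAdd call. The Python sketch below tabulates those negations for FMSUB, FNMADD, and FNMSUB as their Operation sections show them. The table layout and helper names are hypothetical, and the exact-rational model handles finite double-precision inputs only, without signed-zero or NaN behavior.

```python
from fractions import Fraction

# (negate operanda, negate operand1), taken from each Operation section.
NEGATIONS = {
    "FMSUB": (False, True),    # a - n*m
    "FNMADD": (True, True),    # -a - n*m
    "FNMSUB": (True, False),   # n*m - a
}


def fp_mul_add(addend: float, op1: float, op2: float) -> float:
    """addend + op1 * op2 computed exactly, then rounded once to double."""
    return float(Fraction(addend) + Fraction(op1) * Fraction(op2))


def fused_scalar(mnemonic: str, n: float, m: float, a: float) -> float:
    """Apply the per-instruction negations, then one fused multiply-add."""
    negate_a, negate_n = NEGATIONS[mnemonic]
    operanda = -a if negate_a else a
    operand1 = -n if negate_n else n
    return fp_mul_add(operanda, operand1, m)
```

With n = 2.0, m = 3.0, and a = 10.0, the three instructions give 4.0, -16.0, and -4.0 respectively.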
Floating-point Multiply-Negate (scalar). This instruction multiplies the floating-point values of the two source
SIMD&FP registers, and writes the negation of the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 1 0 0 0 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
V[d] = result;
Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element
in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision; Scalar single-precision and double-precision; Vector half precision; and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRecipEstimate(element, FPCR[]);
V[d] = result;
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the
two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a
vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
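The decode of the vector single-precision and double-precision class can be sketched in Python as follows. This is an illustrative model only: the function name `decode_arrangement` and the use of `ValueError` to stand in for UNDEFINED are assumptions made for the example, not part of the architecture pseudocode.

```python
def decode_arrangement(sz: int, q: int):
    """Return (arrangement, esize, elements) for the single/double-precision vector class.

    Hypothetical helper: models the decode pseudocode, with ValueError
    standing in for an UNDEFINED encoding.
    """
    if sz not in (0, 1) or q not in (0, 1):
        raise ValueError("sz and Q are single-bit fields")
    if (sz, q) == (1, 0):
        # Mirrors `if sz:Q == '10' then UNDEFINED;`
        raise ValueError("sz:Q == '10' is reserved (UNDEFINED)")
    esize = 32 << sz                    # 32-bit S lanes or 64-bit D lanes
    datasize = 128 if q == 1 else 64    # full or half vector register
    elements = datasize // esize
    suffix = "S" if sz == 0 else "D"
    return f"{elements}{suffix}", esize, elements
```

For example, `decode_arrangement(0, 1)` yields the `4S` arrangement with four 32-bit elements, which matches the `sz Q <T>` table for this encoding class.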
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<V> Is a width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = FPRecipStepFused(element1, element2);
V[d] = result;
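The reciprocal step is commonly used to refine a reciprocal estimate by Newton-Raphson iteration, x' = x * (2.0 - d * x). The Python sketch below models only the arithmetic described above (multiply, then subtract from 2.0). The function names are hypothetical, the starting estimate is supplied by the caller rather than produced by a model of the estimate instruction, and fused rounding, NaN handling, and other special cases are deliberately ignored.

```python
def frecps(a: float, b: float) -> float:
    """Simplified model of the reciprocal step: 2.0 - a * b.

    Assumption: plain double arithmetic, no fused rounding or special cases.
    """
    return 2.0 - a * b


def refine_reciprocal(d: float, estimate: float, steps: int) -> float:
    """Refine an approximation of 1/d using `steps` Newton-Raphson iterations."""
    x = estimate
    for _ in range(steps):
        x = x * frecps(d, x)    # x_{n+1} = x_n * (2 - d * x_n)
    return x


# A rough starting guess of 0.3 for 1/3 converges quadratically.
approx = refine_reciprocal(3.0, 0.3, 4)
```

Each iteration roughly squares the relative error, so the number of steps needed depends on how accurate the starting estimate is.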
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for the source
SIMD&FP register and writes the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(esize) operand = V[n];
V[d] = result;
Floating-point Round to 32-bit Integer, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value that fits into a 32-bit integer size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the input
value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the instruction returns the
most negative integer representable in the destination size, and an Invalid Operation
floating-point exception is raised.
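The behaviour described above can be modelled in Python roughly as follows. This is a sketch under stated assumptions: the function `frint_n`, the mode names in `ROUNDERS`, and the flag strings are hypothetical names chosen for the example; the input is treated as a double-precision value; and preserving the input sign when a non-zero input rounds to zero is an assumption that the text above states only for zero inputs.

```python
import math

# Hypothetical mode names standing in for the rounding mode selected by FPCR.
ROUNDERS = {
    "RN": round,        # to nearest, ties to even (Python's round on a float)
    "RP": math.ceil,    # toward plus infinity
    "RM": math.floor,   # toward minus infinity
    "RZ": math.trunc,   # toward zero
}


def frint_n(x: float, intsize: int = 32, mode: str = "RN"):
    """Return (result, flags) for rounding x to an integral value fitting intsize bits."""
    flags = set()
    lowest = -(2 ** (intsize - 1))
    highest = 2 ** (intsize - 1) - 1
    if math.isnan(x) or math.isinf(x):
        flags.add("InvalidOp")
        return float(lowest), flags
    r = ROUNDERS[mode](x)               # exact Python int
    if not lowest <= r <= highest:
        flags.add("InvalidOp")          # out of range for the integer size
        return float(lowest), flags
    result = math.copysign(float(r), x)  # keeps the sign of a zero result
    if result != x:
        flags.add("Inexact")
    return result, flags
```

With the defaults, `frint_n(2.5)` rounds to `2.0` and reports Inexact, while `frint_n(2.0 ** 31)` is out of range and returns `-2147483648.0` with the invalid flag set.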
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Floating-point
(FEAT_FRINTTS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 0 1 1 0 0 0 0 Rn Rd
ftype op
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '1x' UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size
using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 0 1 0 Rn Rd
U op
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);
V[d] = result;
Floating-point Round to 32-bit Integer toward Zero (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value that fits into a 32-bit integer size using the Round towards
Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the
input value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the
instruction returns the most negative integer representable in the destination
size, and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Floating-point
(FEAT_FRINTTS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 0 0 1 0 0 0 0 Rn Rd
ftype op
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '1x' UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in
the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round
towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 0 1 0 Rn Rd
U op
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);
V[d] = result;
Floating-point Round to 64-bit Integer, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value that fits into a 64-bit integer size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the input
value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the instruction returns the
most negative integer representable in the destination size, and an Invalid Operation
floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Floating-point
(FEAT_FRINTTS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 1 1 1 0 0 0 0 Rn Rd
ftype op
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '1x' UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size
using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
U op
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);
V[d] = result;
Floating-point Round to 64-bit Integer toward Zero (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value that fits into a 64-bit integer size using the Round towards
Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the
input value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the
instruction returns the most negative integer representable in the destination
size, and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Floating-point
(FEAT_FRINTTS)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 1 0 1 0 0 0 0 Rn Rd
ftype op
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '1x' UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in
the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round
towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
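For the vector form, each lane is rounded independently and the exception conditions from all lanes accumulate. The Python sketch below illustrates this for double-precision lanes and a 64-bit integer size. The function names and the boolean flag outputs are hypothetical, and keeping the input sign on a zero result for small non-zero inputs is an assumption that goes beyond what the text above states.

```python
import math


def frint64z_lane(x: float):
    """Round one lane toward zero; return (value, invalid, inexact)."""
    lowest, highest = -(2 ** 63), 2 ** 63 - 1
    if math.isnan(x) or math.isinf(x):
        return float(lowest), True, False
    r = math.trunc(x)                       # exact Python int, rounded toward zero
    if not lowest <= r <= highest:
        return float(lowest), True, False   # e.g. 2**63 itself does not fit
    value = math.copysign(float(r), x)      # keeps the sign of a zero result
    return value, False, value != x


def frint64z_vector(lanes):
    """Apply the lane operation to every element and accumulate the flags."""
    results, invalid, inexact = [], False, False
    for x in lanes:
        value, inv, inx = frint64z_lane(x)
        results.append(value)
        invalid |= inv
        inexact |= inx
    return results, invalid, inexact
```

Note that 2.0 ** 63 is representable as a double but exceeds the largest signed 64-bit integer, so that lane takes the out-of-range path while the other lanes are still rounded normally.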
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
U op
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);
V[d] = result;
Floating-point Round to Integral, to nearest with ties to Away (scalar). This instruction rounds a floating-point value in
the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest with Ties
to Away rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
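As an illustration of Round to Nearest with Ties to Away, the following Python sketch rounds a double-precision value to an integral value of the same type. The function name `frinta` is hypothetical, NaN handling is simplified to returning the input unchanged, and keeping the input sign when a small non-zero input rounds to zero is an assumption.

```python
import math


def frinta(x: float) -> float:
    """Round to the nearest integral value, with ties going away from zero."""
    if math.isnan(x) or math.isinf(x):
        return x                    # infinities keep their sign; NaN handling simplified
    ax = abs(x)
    whole = math.floor(ax)          # exact Python int
    if ax - whole >= 0.5:           # fractional part, computed without rounding error
        whole += 1                  # ties (exactly 0.5) move away from zero
    return math.copysign(float(whole), x)
```

This differs from Python's built-in `round`, which resolves ties to even: `frinta(2.5)` gives `3.0`, whereas `round(2.5)` gives `2`.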
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 1 0 0 1 0 0 0 0 Rn Rd
rmode
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
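The `ftype` decode above can be expressed as a small Python function. This is a sketch only: `decode_ftype` is a hypothetical name, the `have_fp16` parameter stands in for `HaveFP16Ext()`, and `ValueError` is used to represent an UNDEFINED encoding.

```python
def decode_ftype(ftype: str, have_fp16: bool) -> int:
    """Return esize for a 2-bit ftype string, raising ValueError for UNDEFINED."""
    if ftype == "00":
        return 32                   # single-precision
    if ftype == "01":
        return 64                   # double-precision
    if ftype == "11" and have_fp16:
        return 16                   # half-precision, only with the FP16 extension
    if ftype in ("10", "11"):
        raise ValueError(f"ftype {ftype!r} is UNDEFINED on this implementation")
    raise ValueError("ftype must be a 2-bit string")
```

For example, `decode_ftype("11", have_fp16=True)` returns 16, while the same encoding raises when half-precision support is absent.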
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the
Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
V[d] = result;
Floating-point Round to Integral, using current rounding mode (scalar). This instruction rounds a floating-point value
in the SIMD&FP source register to an integral floating-point value of the same size using the rounding mode that is
determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 1 1 1 1 0 0 0 0 Rn Rd
rmode
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
FPRounding rounding;
rounding = FPRoundingMode(FPCR[]);
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
V[d] = result;
Floating-point Round to Integral, toward Minus infinity (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value of the same size using the Round towards Minus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 1 0 1 0 0 0 0 Rn Rd
rmode
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
FPRounding rounding;
rounding = FPDecodeRounding('10');
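The decode above selects Round towards Minus Infinity through the two-bit value '10'. The Python sketch below shows how such a two-bit selector could drive a round-to-integral operation. Only the '10' pairing is taken from this page; the other three table rows follow the usual FPCR.RMode encoding and are labelled as assumptions, as are the names `ROUNDING_BY_FIELD` and `round_to_integral`. NaN handling is simplified to returning the input.

```python
import math

# Only '10' -> minus infinity comes from the decode above; the remaining
# rows are assumptions based on the usual FPCR.RMode encoding.
ROUNDING_BY_FIELD = {
    "00": ("nearest, ties to even", round),
    "01": ("plus infinity", math.ceil),
    "10": ("minus infinity", math.floor),
    "11": ("zero", math.trunc),
}


def round_to_integral(x: float, rmode: str) -> float:
    """Round x to an integral floating-point value using the selected mode."""
    if math.isnan(x) or math.isinf(x):
        return x                            # infinities keep their sign
    _, rounder = ROUNDING_BY_FIELD[rmode]
    # copysign keeps the sign of a zero result, for example for -0.0
    return math.copysign(float(rounder(x)), x)
```

With `rmode="10"`, `round_to_integral(-1.2, "10")` gives `-2.0` and `round_to_integral(1.7, "10")` gives `1.0`, which is the floor behaviour this instruction describes.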
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point
values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards
Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
V[d] = result;
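The element loop above can be sketched in Python for the Round towards Minus Infinity case. This is an illustrative model under stated assumptions: `round_toward_minus_inf` and `frintm_vector` are hypothetical names, the register is represented as little-endian bytes, lanes are described by a `struct` format string (for example `"<4f"` for 4S or `"<2d"` for 2D), and signalling-NaN handling and exception flags are not modelled.

```python
import math
import struct


def round_toward_minus_inf(x):
    """Round one value to an integral float, toward minus infinity."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        # NaNs, infinities, and signed zeros pass through unchanged.
        return x
    return float(math.floor(x))


def frintm_vector(register_bytes, lane_format):
    """Apply the rounding to every lane, mirroring the 'for e' loop."""
    lanes = struct.unpack(lane_format, register_bytes)
    rounded = [round_toward_minus_inf(v) for v in lanes]
    return struct.pack(lane_format, *rounded)
```

A caller would pack the source lanes, call `frintm_vector`, and unpack the result with the same format string.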
Floating-point Round to Integral, to nearest with ties to even (scalar). This instruction rounds a floating-point value in
the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest rounding
mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 0 0 1 0 0 0 0 Rn Rd
rmode
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
FPRounding rounding;
rounding = FPDecodeRounding('00');
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
V[d] = result;
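The Round to Nearest behaviour described above, including ties going to the even integer, can be illustrated with a short Python sketch. The function name `round_ties_to_even` is hypothetical, and the sketch relies on Python's built-in `round()`, which rounds halfway cases to the even integer when called on a float with no second argument. Exception flags and signalling NaNs are not modelled.

```python
import math


def round_ties_to_even(x):
    """Round to an integral float, to nearest with ties to even."""
    if math.isnan(x) or math.isinf(x):
        return x  # NaNs and infinities pass through unchanged
    # round() returns an int; copysign restores the sign of a zero result
    # so that a negative-zero input stays negative zero.
    return math.copysign(float(round(x)), x)
```

The halfway cases are the interesting ones: 2.5 goes down to 2.0 while 3.5 goes up to 4.0, because in both cases the even neighbour is chosen.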
Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point
values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest
rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
V[d] = result;
Floating-point Round to Integral, toward Plus infinity (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value of the same size using the Round towards Plus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 0 1 1 0 0 0 0 Rn Rd
rmode
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
FPRounding rounding;
rounding = FPDecodeRounding('01');
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values
in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus
Infinity rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
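Across the vector round-to-integral encodings in this section, the U, o2, and o1 bits are what distinguish one variant from another. The following Python sketch reads those three bits from a 32-bit instruction word. It is an illustration only: `VARIANT_BY_BITS` and `vector_round_variant` are hypothetical names, the bit positions (29, 23, and 12) are taken from the encoding diagrams, and the table lists only the combinations that appear in this section.

```python
# (U, o2, o1) values read from the vector encoding diagrams in this section.
VARIANT_BY_BITS = {
    (0, 0, 0): "to nearest, ties to even",
    (0, 0, 1): "toward minus infinity",
    (0, 1, 0): "toward plus infinity",
    (0, 1, 1): "toward zero",
    (1, 0, 1): "exact, current rounding mode",
}


def vector_round_variant(insn):
    """Classify a vector round-to-integral word by its U, o2, o1 bits."""
    u = (insn >> 29) & 1
    o2 = (insn >> 23) & 1
    o1 = (insn >> 12) & 1
    # Combinations not shown in this section are reported, not guessed.
    return VARIANT_BY_BITS.get((u, o2, o1), "not described in this section")
```

The same three bit positions apply to both the half-precision class and the single-precision and double-precision class, so the sketch does not need to know which class the word belongs to.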
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
V[d] = result;
Floating-point Round to Integral exact, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value of the same size using the rounding mode
that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
When the result value is not numerically equal to the input value, an Inexact exception is raised. A zero input gives a
zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as
for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 1 1 0 1 0 0 0 0 Rn Rd
rmode
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
FPRounding rounding;
rounding = FPRoundingMode(FPCR[]);
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
V[d] = result;
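The relationship between the rounded result and the Inexact condition described above can be sketched in Python. This is a hedged model, not the architectural pseudocode: `round_int_exact` is a hypothetical name, the rounding mode is passed as a string parameter because Python cannot read FPCR, and the second element of the returned pair merely indicates that the result differs numerically from the input. Signalling NaNs and trapping are not modelled.

```python
import math


def round_int_exact(x, mode="TIEEVEN"):
    """Return (result, inexact) for rounding x to an integral float."""
    if math.isnan(x) or math.isinf(x):
        return x, False  # passed through; nothing was rounded
    if mode == "TIEEVEN":
        r = round(x)        # ties go to the even integer
    elif mode == "POSINF":
        r = math.ceil(x)
    elif mode == "NEGINF":
        r = math.floor(x)
    elif mode == "ZERO":
        r = math.trunc(x)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    result = math.copysign(float(r), x)  # keep the sign of a zero result
    return result, result != x
```

An input that is already integral, such as 2.0, comes back unchanged with the flag clear, while 2.25 sets the flag in every mode.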
Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
When a result value is not numerically equal to the corresponding input value, an Inexact exception is raised. A zero
input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is
propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
V[d] = result;
Floating-point Round to Integral, toward Zero (scalar). This instruction rounds a floating-point value in the SIMD&FP
source register to an integral floating-point value of the same size using the Round towards Zero rounding mode, and
writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 1 1 1 0 0 0 0 Rn Rd
rmode
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
FPRounding rounding;
rounding = FPDecodeRounding('11');
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the
SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding
mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
V[d] = result;
Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for
each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the
destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision
Scalar half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Scalar single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<V> Is a width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPRSqrtEstimate(element, fpcr);
V[d] = result;
Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the
vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0,
places the results into a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision
Scalar half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 0 Rm 0 0 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
Scalar single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 Rm 1 1 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
Vector half precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
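The vector decode blocks above derive the element size, register width, and element count from the sz and Q bits. A minimal Python sketch of that arithmetic follows. The function names `lane_geometry` and `arrangement` are hypothetical, the `half_precision` flag stands in for choosing the half-precision encoding class, and `ValueError` stands in for the pseudocode's UNDEFINED.

```python
def lane_geometry(sz, q, half_precision=False):
    """Return (esize, datasize, elements) for a vector encoding."""
    if not half_precision and (sz, q) == (1, 0):
        # Mirrors: if sz:Q == '10' then UNDEFINED;
        raise ValueError("sz:Q == '10' is UNDEFINED")
    esize = 16 if half_precision else 32 << sz
    datasize = 128 if q == 1 else 64
    return esize, datasize, datasize // esize


def arrangement(sz, q, half_precision=False):
    """Return the arrangement specifier string, such as '4S' or '2D'."""
    esize, _, elements = lane_geometry(sz, q, half_precision)
    suffix = {16: "H", 32: "S", 64: "D"}[esize]
    return f"{elements}{suffix}"
```

The results line up with the arrangement tables under Assembler Symbols: the one combination that has no valid element count for a 64-bit register, sz = 1 with Q = 0, is the RESERVED row.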
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<V> Is a width specifier, encoded in “sz”:
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = FPRSqrtStepFused(element1, element2);
V[d] = result;
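The step value described above, (3.0 - a * b) / 2.0, is the factor used by a Newton-Raphson iteration for the reciprocal square root. The Python sketch below shows that use. It is an illustration under assumptions: pairing the step with an initial estimate is the conventional pattern rather than something this section states, the names `frsqrts`, `refine_rsqrt`, and `rsqrt` are hypothetical, the arithmetic is ordinary double-precision Python arithmetic rather than the fused operation in the pseudocode, and special values are not modelled.

```python
def frsqrts(a, b):
    """Model of the step value: (3.0 - a * b) / 2.0 (not fused here)."""
    return (3.0 - a * b) / 2.0


def refine_rsqrt(d, x):
    """One iteration improving x as an estimate of 1/sqrt(d)."""
    # x_next = x * (3 - d * x * x) / 2
    return x * frsqrts(d * x, x)


def rsqrt(d, estimate, iterations=3):
    """Refine a starting estimate a fixed number of times."""
    x = estimate
    for _ in range(iterations):
        x = refine_rsqrt(d, x)
    return x
```

Starting from a rough guess such as 0.4 for d = 4.0, each iteration roughly squares the relative error, so a handful of steps reaches the true value 0.5 to within double-precision accuracy. The starting guess here is arbitrary and only stands in for whatever an estimate instruction would return.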
Floating-point Square Root (scalar). This instruction calculates the square root of the value in the SIMD&FP source
register and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 1 1 0 0 0 0 Rn Rd
opc
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
V[d] = result;
Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source
SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPSqrt(element, FPCR[]);
V[d] = result;
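The per-element square root loop above can be sketched in Python as follows. This is a rough model under assumptions: `fsqrt_lane` and `fsqrt_vector` are hypothetical names, the register is represented as little-endian bytes described by a `struct` format string, and returning a NaN for a negative input follows IEEE 754 behaviour, which this section does not spell out. Exception flags are not modelled.

```python
import math
import struct


def fsqrt_lane(x):
    """Square root of one lane; negative inputs give NaN in this model."""
    if math.isnan(x):
        return x
    if x < 0.0:
        # math.sqrt would raise ValueError here; model a NaN result instead.
        return math.nan
    return math.sqrt(x)


def fsqrt_vector(register_bytes, lane_format="<2d"):
    """Apply fsqrt_lane to every lane, mirroring the 'for e' loop."""
    lanes = struct.unpack(lane_format, register_bytes)
    return struct.pack(lane_format, *(fsqrt_lane(v) for v in lanes))
```

For single-precision lanes the caller would pass `"<4f"` or `"<2f"`; the value is computed in double precision and then narrowed when it is packed.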
Floating-point Subtract (scalar). This instruction subtracts the floating-point value of the second source SIMD&FP
register from the floating-point value of the first source SIMD&FP register, and writes the result to the destination
SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 1 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];
Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP
register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into
elements of a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision
Half-precision
(FEAT_FP16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean abs = (U == '1');
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean abs = (U == '1');
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
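<T> For the half-precision variant: is an arrangement specifier, encoded in "Q":
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D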
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2;
bits(esize) diff;
FPCRType fpcr = FPCR[];
bits(datasize) result;
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
diff = FPSub(element1, element2, fpcr);
Elem[result, e, esize] = if abs then FPAbs(diff) else diff;
V[d] = result;
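The element loop above can be modelled roughly in Python for the single-precision arrangements. This is a sketch under stated assumptions: registers are passed as 8 or 16 little-endian bytes, the function name is hypothetical, and FPCR rounding modes, exception flags and NaN processing are not modelled. Where the hardware would return infinity on overflow, struct.pack raises OverflowError instead.

```python
import struct


def fsub_vector_s(operand1: bytes, operand2: bytes, abs_result: bool = False) -> bytes:
    """Lane-wise operand1 - operand2 on packed single-precision values.

    abs_result mirrors the 'abs' flag in the decode (U == '1').
    """
    if len(operand1) != len(operand2) or len(operand1) not in (8, 16):
        raise ValueError("operands must both be 8 or 16 bytes")
    elements = len(operand1) // 4          # datasize DIV esize, with esize = 32
    fmt = "<%df" % elements
    lanes1 = struct.unpack(fmt, operand1)
    lanes2 = struct.unpack(fmt, operand2)
    result = []
    for element1, element2 in zip(lanes1, lanes2):
        diff = element1 - element2
        result.append(abs(diff) if abs_result else diff)
    return struct.pack(fmt, *result)       # rounds each lane back to single precision
```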
Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP
register to the specified vector element of the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (element).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 0 0 imm5 0 imm4 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ts> Is an element size specifier, encoded in "imm5":
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index1> Is the destination element index, encoded in "imm5":
imm5 <index1>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<index2> Is the source element index, encoded in "imm5:imm4":
imm5 <index2>
x0000 RESERVED
xxxx1 imm4<3:0>
xxx10 imm4<3:1>
xx100 imm4<3:2>
x1000 imm4<3>
Unspecified bits in "imm4" are ignored but should be set to zero by an assembler.
CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];
bits(128) result;
result = V[d];
Elem[result, dst_index, esize] = Elem[operand, src_index, esize];
V[d] = result;
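The imm5 and imm4 tables and the element copy above can be illustrated with a short Python sketch. Registers are modelled as 128-bit Python integers, and both function names are hypothetical. The lowest set bit of imm5 selects the element size, as in the tables.

```python
def decode_ins_element(imm5: int, imm4: int):
    """Return (esize, dst_index, src_index) from the imm5 and imm4 fields."""
    if imm5 & 0b01111 == 0:
        raise ValueError("RESERVED")                  # imm5 == x0000
    size = (imm5 & -imm5).bit_length() - 1            # 0 = B, 1 = H, 2 = S, 3 = D
    esize = 8 << size
    dst_index = imm5 >> (size + 1)                    # imm5<4:size+1>
    src_index = imm4 >> size                          # imm4<3:size>, low bits ignored
    return esize, dst_index, src_index


def ins_element(vd: int, vn: int, imm5: int, imm4: int) -> int:
    """Copy one element of vn into vd and leave the other bits of vd unchanged."""
    esize, dst_index, src_index = decode_ins_element(imm5, imm4)
    mask = (1 << esize) - 1
    element = (vn >> (src_index * esize)) & mask
    cleared = vd & ~(mask << (dst_index * esize))
    return cleared | (element << (dst_index * esize))
```

With imm5 = 0b01100 and imm4 = 0b1100, the sketch copies S[3] of the source into S[1] of the destination.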
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Insert vector element from general-purpose register. This instruction copies the contents of the source general-
purpose register to the specified vector element in the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (from general).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 0 imm5 0 0 0 1 1 1 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ts> Is an element size specifier, encoded in "imm5":
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index> Is the element index, encoded in "imm5":
imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
<R> Is the width specifier for the general-purpose source register, encoded in “imm5”:
imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X
<n> Is the number [0-30] of the general-purpose source register or ZR (31), encoded in the "Rn" field.
CheckFPAdvSIMDEnabled64();
bits(esize) element = X[n];
bits(128) result;
result = V[d];
Elem[result, index, esize] = element;
V[d] = result;
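As an illustration of how the <Ts>, <index> and <R> tables combine, the following Python sketch formats the operands of this instruction from its fields. The function name is hypothetical, and the operand layout "INS <Vd>.<Ts>[<index>], <R><n>" is assumed, because the syntax line is not shown in this section.

```python
def format_ins_general(imm5: int, rn: int, rd: int) -> str:
    """Render the assembler operands for INS (general) from imm5, Rn and Rd."""
    if imm5 & 0b01111 == 0:
        raise ValueError("RESERVED")              # imm5 == x0000
    size = (imm5 & -imm5).bit_length() - 1        # position of the lowest set bit
    ts = "BHSD"[size]
    index = imm5 >> (size + 1)
    width = "X" if size == 3 else "W"             # only the D form reads an X register
    source = f"{width}ZR" if rn == 31 else f"{width}{rn}"
    return f"INS V{rd}.{ts}[{index}], {source}"
```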
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Load multiple single-element structures to one, two, three, or four registers. This instruction loads multiple single-
element structures from memory and writes the result to one, two, three, or four SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 x x 1 x size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm x x 1 x size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in "size:Q":
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
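<imm> For the one register, immediate offset variant: is the post-index immediate offset, encoded in "Q":
Q <imm>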
0 #8
1 #16
For the two registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #16
1 #32
For the three registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #24
1 #48
For the four registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #32
1 #64
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case opcode of
when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
otherwise UNDEFINED;
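The opcode case above, together with the <imm> tables, can be summarised in a small Python sketch. The names are hypothetical. The immediate is computed as rpt * selem * datasize / 8, which is the total number of bytes that the operation loop transfers and which agrees with the tabulated values.

```python
# opcode -> (rpt, selem), copied from the case statement
LDST_MULTIPLE = {
    0b0000: (1, 4),  # LD/ST4 (4 registers)
    0b0010: (4, 1),  # LD/ST1 (4 registers)
    0b0100: (1, 3),  # LD/ST3 (3 registers)
    0b0110: (3, 1),  # LD/ST1 (3 registers)
    0b0111: (1, 1),  # LD/ST1 (1 register)
    0b1000: (1, 2),  # LD/ST2 (2 registers)
    0b1010: (2, 1),  # LD/ST1 (2 registers)
}


def decode_multiple(opcode: int, q: int):
    """Return (rpt, selem, post_index_immediate) for a multiple-structure access."""
    if opcode not in LDST_MULTIPLE:
        raise ValueError("UNDEFINED")
    rpt, selem = LDST_MULTIPLE[opcode]
    datasize = 128 if q else 64
    return rpt, selem, rpt * selem * datasize // 8
```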
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
for e = 0 to elements-1
tt = (t + r) MOD 32;
for s = 0 to selem-1
rval = V[tt];
if memop == MemOp_LOAD then
Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[tt] = rval;
else // memop == MemOp_STORE
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
offs = offs + ebytes;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
SP[] = address + offs;
else
X[n] = address + offs;
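The load path of the loop nest above can be modelled in Python as follows. This is a sketch under assumptions: memory is a bytes object read little-endian, registers live in a dict of datasize-bit integers keyed by register number, and the function name is hypothetical. Tag checking, alignment checks and the store path are omitted.

```python
def load_multiple(memory: bytes, address: int, t: int, rpt: int, selem: int,
                  esize: int, datasize: int, regs: dict) -> int:
    """Run the rpt/elements/selem loops for a load and return the final offs."""
    ebytes = esize // 8
    elements = datasize // esize
    lane_mask = (1 << esize) - 1
    offs = 0
    for r in range(rpt):
        for e in range(elements):
            tt = (t + r) % 32
            for _ in range(selem):
                rval = regs.get(tt, 0) & ((1 << datasize) - 1)
                start = address + offs
                lane = int.from_bytes(memory[start:start + ebytes], "little")
                rval = (rval & ~(lane_mask << (e * esize))) | (lane << (e * esize))
                regs[tt] = rval
                offs += ebytes
                tt = (tt + 1) % 32    # register numbers wrap from 31 to 0
    return offs
```

With rpt = 2 and selem = 1 (two registers), the first 8 bytes fill register t and the next 8 bytes fill register (t + 1) MOD 32 for a 64-bit arrangement.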
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load one single-element structure to one lane of one register. This instruction loads a single-element structure from
memory and writes the result to the specified lane of the SIMD&FP register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm x x 0 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
case scale of
when 3
// load and replicate
if L == '0' || S == '1' then UNDEFINED;
scale = UInt(size);
replicate = TRUE;
when 0
index = UInt(Q:S:size); // B[0-15]
when 1
if size<0> == '1' then UNDEFINED;
index = UInt(Q:S:size<1>); // H[0-7]
when 2
if size<1> == '1' then UNDEFINED;
if size<0> == '0' then
index = UInt(Q:S); // S[0-3]
else
if S == '1' then UNDEFINED;
index = UInt(Q); // D[0-1]
scale = 3;
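The case statement above can be transcribed into Python as a sketch. The function name is hypothetical and scale is taken as an input, because the lines that derive it from the opcode field are not shown in this section. The returned element size is 8 << scale, which is an assumption consistent with the B, H, S and D comments in the pseudocode.

```python
def decode_single(scale: int, q: int, s: int, size: int, l: int = 1):
    """Return (esize, index, replicate); index is None for load-and-replicate."""
    replicate = False
    index = None
    if scale == 3:
        # load and replicate
        if l == 0 or s == 1:
            raise ValueError("UNDEFINED")
        scale = size
        replicate = True
    elif scale == 0:
        index = (q << 3) | (s << 2) | size            # B[0-15]
    elif scale == 1:
        if size & 0b01:
            raise ValueError("UNDEFINED")
        index = (q << 2) | (s << 1) | (size >> 1)     # H[0-7]
    elif scale == 2:
        if size & 0b10:
            raise ValueError("UNDEFINED")
        if size & 0b01 == 0:
            index = (q << 1) | s                      # S[0-3]
        else:
            if s == 1:
                raise ValueError("UNDEFINED")
            index = q                                 # D[0-1]
            scale = 3
    else:
        raise ValueError("scale must be in the range 0 to 3")
    return 8 << scale, index, replicate
```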
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
// load and replicate to all elements
for s = 0 to selem-1
element = Mem[address+offs, ebytes, AccType_VEC];
// replicate to fill 128- or 64-bit register
V[t] = Replicate(element, datasize DIV esize);
offs = offs + ebytes;
t = (t + 1) MOD 32;
else
// load/store one element per register
for s = 0 to selem-1
rval = V[t];
if memop == MemOp_LOAD then
// insert into one lane of 128-bit register
Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[t] = rval;
else // memop == MemOp_STORE
// extract from one lane of 128-bit register
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
offs = offs + ebytes;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
SP[] = address + offs;
else
X[n] = address + offs;
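For the single-register case, the lane insert and the write-back step above reduce to the following Python sketch. The function name is hypothetical. Passing xm=None models Rm == 31, where the base advances by the number of bytes transferred, and any other value models the register offset form. Memory is assumed to be little-endian.

```python
def ld1_lane_post_index(memory: bytes, base: int, vt: int, index: int,
                        esize: int, xm=None):
    """Load one element into lane 'index' of vt and return (new_vt, new_base)."""
    ebytes = esize // 8
    element = int.from_bytes(memory[base:base + ebytes], "little")
    shift = index * esize
    mask = ((1 << esize) - 1) << shift
    new_vt = (vt & ~mask) | (element << shift)   # other lanes are preserved
    offs = ebytes if xm is None else xm          # immediate or register offset
    new_base = (base + offs) % (1 << 64)         # 64-bit wrap-around
    return new_vt, new_base
```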
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load one single-element structure and Replicate to all lanes (of one register). This instruction loads a single-element
structure from memory and replicates the structure to all the lanes of the SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm 1 1 0 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in "size:Q":
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the post-index immediate offset, encoded in "size":
size <imm>
00 #1
01 #2
10 #4
11 #8
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case scale of
when 3
// load and replicate
if L == '0' || S == '1' then UNDEFINED;
scale = UInt(size);
replicate = TRUE;
when 0
index = UInt(Q:S:size); // B[0-15]
when 1
if size<0> == '1' then UNDEFINED;
index = UInt(Q:S:size<1>); // H[0-7]
when 2
if size<1> == '1' then UNDEFINED;
if size<0> == '0' then
index = UInt(Q:S); // S[0-3]
else
if S == '1' then UNDEFINED;
index = UInt(Q); // D[0-1]
scale = 3;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
// load and replicate to all elements
for s = 0 to selem-1
element = Mem[address+offs, ebytes, AccType_VEC];
// replicate to fill 128- or 64-bit register
V[t] = Replicate(element, datasize DIV esize);
offs = offs + ebytes;
t = (t + 1) MOD 32;
else
// load/store one element per register
for s = 0 to selem-1
rval = V[t];
if memop == MemOp_LOAD then
// insert into one lane of 128-bit register
Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[t] = rval;
else // memop == MemOp_STORE
// extract from one lane of 128-bit register
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
offs = offs + ebytes;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
SP[] = address + offs;
else
X[n] = address + offs;
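The replicate branch above, for one register, amounts to repeating a single element across the register. A minimal Python sketch, assuming little-endian memory and a hypothetical function name:

```python
def ld1r(memory: bytes, address: int, esize: int, datasize: int) -> int:
    """Return the register value after loading one element and replicating it."""
    ebytes = esize // 8
    element = memory[address:address + ebytes]
    # Replicate(element, datasize DIV esize)
    return int.from_bytes(element * (datasize // esize), "little")
```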
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load multiple 2-element structures to two registers. This instruction loads multiple 2-element structures from memory
and writes the result to the two SIMD&FP registers, with de-interleaving.
For an example of de-interleaving, see LD3 (multiple structures).
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 1 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in "size:Q":
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<imm> Is the post-index immediate offset, encoded in "Q":
Q <imm>
0 #16
1 #32
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case opcode of
when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
otherwise UNDEFINED;
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
for r = 0 to rpt-1
for e = 0 to elements-1
tt = (t + r) MOD 32;
for s = 0 to selem-1
rval = V[tt];
if memop == MemOp_LOAD then
Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[tt] = rval;
else // memop == MemOp_STORE
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
offs = offs + ebytes;
tt = (tt + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
SP[] = address + offs;
else
X[n] = address + offs;
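With rpt = 1 and selem = 2, the loop above sends even-numbered memory elements to the first register and odd-numbered elements to the second. For byte elements this de-interleaving can be sketched with Python slicing. The function name is hypothetical and only the 8B and 16B arrangements are covered.

```python
def ld2_bytes(memory: bytes, address: int, q: int):
    """Return (Vt, Vt2) contents as bytes objects, lane 0 first."""
    nbytes = 16 if q else 8                       # bytes per register
    block = memory[address:address + 2 * nbytes]
    return block[0::2], block[1::2]               # lane e of Vt = block[2e], of Vt2 = block[2e+1]
```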
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load single 2-element structure to one lane of two registers. This instruction loads a 2-element structure from memory
and writes the result to the corresponding elements of the two SIMD&FP registers without affecting the other bits of
the registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm x x 0 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
case scale of
when 3
// load and replicate
if L == '0' || S == '1' then UNDEFINED;
scale = UInt(size);
replicate = TRUE;
when 0
index = UInt(Q:S:size); // B[0-15]
when 1
if size<0> == '1' then UNDEFINED;
index = UInt(Q:S:size<1>); // H[0-7]
when 2
if size<1> == '1' then UNDEFINED;
if size<0> == '0' then
index = UInt(Q:S); // S[0-3]
else
if S == '1' then UNDEFINED;
index = UInt(Q); // D[0-1]
scale = 3;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
// load and replicate to all elements
for s = 0 to selem-1
element = Mem[address+offs, ebytes, AccType_VEC];
// replicate to fill 128- or 64-bit register
V[t] = Replicate(element, datasize DIV esize);
offs = offs + ebytes;
t = (t + 1) MOD 32;
else
// load/store one element per register
for s = 0 to selem-1
rval = V[t];
if memop == MemOp_LOAD then
// insert into one lane of 128-bit register
Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[t] = rval;
else // memop == MemOp_STORE
// extract from one lane of 128-bit register
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
offs = offs + ebytes;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
SP[] = address + offs;
else
X[n] = address + offs;
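For the two-register, single-lane case, the non-replicating branch above writes two consecutive memory elements into the same lane of two consecutive registers. A Python sketch, assuming registers are modelled as lists of lane values (lane 0 first), little-endian memory, and a hypothetical function name:

```python
def ld2_lane(memory: bytes, address: int, lanes_t, lanes_t2, index: int, esize: int):
    """Return updated copies of the two lane lists after a single-lane load."""
    ebytes = esize // 8
    updated = [list(lanes_t), list(lanes_t2)]
    for s in range(2):                            # selem == 2
        start = address + s * ebytes
        updated[s][index] = int.from_bytes(memory[start:start + ebytes], "little")
    return updated[0], updated[1]
```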
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load single 2-element structure and Replicate to all lanes of two registers. This instruction loads a 2-element structure
from memory and replicates the structure to all the lanes of the two SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm 1 1 0 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in "size:Q":
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the post-index immediate offset, encoded in "size":
size <imm>
00 #2
01 #4
10 #8
11 #16
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case scale of
when 3
// load and replicate
if L == '0' || S == '1' then UNDEFINED;
scale = UInt(size);
replicate = TRUE;
when 0
index = UInt(Q:S:size); // B[0-15]
when 1
if size<0> == '1' then UNDEFINED;
index = UInt(Q:S:size<1>); // H[0-7]
when 2
if size<1> == '1' then UNDEFINED;
if size<0> == '0' then
index = UInt(Q:S); // S[0-3]
else
if S == '1' then UNDEFINED;
index = UInt(Q); // D[0-1]
scale = 3;
if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];
offs = Zeros();
if replicate then
// load and replicate to all elements
for s = 0 to selem-1
element = Mem[address+offs, ebytes, AccType_VEC];
// replicate to fill 128- or 64-bit register
V[t] = Replicate(element, datasize DIV esize);
offs = offs + ebytes;
t = (t + 1) MOD 32;
else
// load/store one element per register
for s = 0 to selem-1
rval = V[t];
if memop == MemOp_LOAD then
// insert into one lane of 128-bit register
Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
V[t] = rval;
else // memop == MemOp_STORE
// extract from one lane of 128-bit register
Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
offs = offs + ebytes;
t = (t + 1) MOD 32;
if wback then
if m != 31 then
offs = X[m];
if n == 31 then
SP[] = address + offs;
else
X[n] = address + offs;
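For two registers, the replicate branch above reads two consecutive elements and fills one register with each. The sketch below models registers as lists of lane values and also returns the byte count 2 * ebytes, which matches the <imm> table (#2, #4, #8, #16). The function name and the little-endian memory model are assumptions.

```python
def ld2r(memory: bytes, address: int, esize: int, datasize: int):
    """Return (Vt lanes, Vt2 lanes, bytes transferred)."""
    ebytes = esize // 8
    lanes = datasize // esize
    first = int.from_bytes(memory[address:address + ebytes], "little")
    second = int.from_bytes(memory[address + ebytes:address + 2 * ebytes], "little")
    return [first] * lanes, [second] * lanes, 2 * ebytes
```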
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load multiple 3-element structures to three registers. This instruction loads multiple 3-element structures from
memory and writes the result to the three SIMD&FP registers, with de-interleaving.
The following figure shows an example of the de-interleaving operation of an LD3.16 (multiple 3-element structures)
instruction:
[Figure: LD3 de-interleaving. Memory holds A, a packed array of 3-element structures in which each element is a
16-bit halfword, in the order A[0].x, A[0].y, A[0].z, A[1].x, A[1].y, A[1].z, A[2].x, A[2].y, A[2].z, A[3].x,
A[3].y, A[3].z. After the load, the registers hold:]
D0 X3 X2 X1 X0
D1 Y3 Y2 Y1 Y0
D2 Z3 Z2 Z1 Z0
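The de-interleaving in this example can be reproduced with a short Python sketch. It assumes twelve little-endian 16-bit halfwords in memory, laid out as x, y, z for A[0] to A[3], and uses a hypothetical function name. Each returned tuple lists lane 0 first.

```python
import struct


def ld3_4h(memory: bytes, address: int = 0):
    """Return the (D0, D1, D2) lane tuples for four packed {x, y, z} structures."""
    halfwords = struct.unpack_from("<12H", memory, address)
    xs = halfwords[0::3]   # X0..X3 go to the first register
    ys = halfwords[1::3]   # Y0..Y3 go to the second register
    zs = halfwords[2::3]   # Z0..Z3 go to the third register
    return xs, ys, zs
```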
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 0 1 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in "size:Q":
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the post-index immediate offset, encoded in "Q":
Q <imm>
0 #24
1 #48
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
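The nested loops above are what produce the de-interleaving. As a rough illustration, the Python sketch below follows the same loop order for the load case with rpt = 1 and selem = 3. It is a simplification under stated assumptions: memory is modelled as a flat list of already-sized elements rather than bytes, register wraparound (MOD 32) is ignored, and the name ld3_multiple is hypothetical.

```python
def ld3_multiple(memory, elements):
    """De-interleave 3-element structures, following the pseudocode loop order.

    memory   : flat list of element values (one entry per element, not per byte)
    elements : number of lanes per register (for example 8 for the 8B arrangement)
    Returns the three register contents as lists of lanes, plus the number of
    elements consumed.
    """
    rpt, selem = 1, 3                         # opcode '0100': LD/ST3 (3 registers)
    regs = [[None] * elements for _ in range(selem)]
    offs = 0
    for r in range(rpt):
        for e in range(elements):
            tt = r                            # first register for this repeat
            for s in range(selem):
                regs[tt][e] = memory[offs]    # lane e of register tt
                offs += 1
                tt += 1
    return regs, offs


if __name__ == "__main__":
    # Memory holds x0, y0, z0, x1, y1, z1, ...
    regs, used = ld3_multiple(list(range(12)), 4)
    for i, reg in enumerate(regs):
        print(f"register {i}: {reg}")
```

Consecutive memory elements go to consecutive registers, and the lane index advances only after one whole structure has been consumed.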
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load single 3-element structure to one lane of three registers. This instruction loads a 3-element structure from
memory and writes the result to the corresponding elements of the three SIMD&FP registers without affecting the
other bits of the registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm x x 1 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
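The lane index selection in this decode can be sketched as a small Python function. This is an illustrative reading of the case statement only. The names decode_lane_index and Undefined are hypothetical, and the initial value of scale is taken as an input because its derivation from the opcode field is not shown in this excerpt.

```python
class Undefined(Exception):
    """Stands in for the pseudocode's UNDEFINED outcome (hypothetical name)."""


def decode_lane_index(scale, q, s, size):
    """Return (final_scale, index) for the single-lane forms.

    scale is the initial element-size selector (0 = byte, 1 = halfword,
    2 = word or doubleword). A 64-bit element is reported as scale 3.
    """
    if scale == 3:
        raise ValueError("scale 3 selects load-and-replicate, which has no lane index")
    if scale == 0:
        index = (q << 3) | (s << 2) | size          # Q:S:size, B[0-15]
    elif scale == 1:
        if size & 1:
            raise Undefined("size<0> must be 0")
        index = (q << 2) | (s << 1) | (size >> 1)   # Q:S:size<1>, H[0-7]
    else:
        if size & 0b10:
            raise Undefined("size<1> must be 0")
        if size & 1 == 0:
            index = (q << 1) | s                    # Q:S, S[0-3]
        else:
            if s == 1:
                raise Undefined("S must be 0 for 64-bit elements")
            index = q                               # Q, D[0-1]
            scale = 3
    return scale, index


if __name__ == "__main__":
    print(decode_lane_index(0, 1, 1, 3))   # highest byte lane
    print(decode_lane_index(2, 1, 0, 1))   # doubleword lane 1
```

The wider the element, the fewer bits are needed for the index, so the spare low-order bits (size<0>, then S) become fixed-value checks.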
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
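The single-lane load path above replaces one lane in each register and leaves every other bit unchanged. A minimal Python sketch of that behaviour follows. It assumes little-endian memory, models each 128-bit register as a Python integer, and uses hypothetical helper names (insert_lane, load_single_lane).

```python
def insert_lane(reg, index, esize, value):
    """Return reg with the esize-bit lane at position index replaced by value."""
    shift = index * esize
    mask = ((1 << esize) - 1) << shift
    return (reg & ~mask) | ((value << shift) & mask)


def load_single_lane(regs, t, selem, index, esize, memory, address):
    """Load one element into lane index of selem consecutive registers.

    regs is a list of 32 integers (128-bit register values) and is updated
    in place. memory is a bytes-like object, assumed little-endian.
    Returns the number of bytes consumed.
    """
    ebytes = esize // 8
    offs = 0
    for _ in range(selem):
        chunk = memory[address + offs:address + offs + ebytes]
        regs[t] = insert_lane(regs[t], index, esize, int.from_bytes(chunk, "little"))
        offs += ebytes
        t = (t + 1) % 32          # register numbers wrap around modulo 32
    return offs


if __name__ == "__main__":
    regs = [(1 << 128) - 1] * 32
    used = load_single_lane(regs, 30, 3, 1, 16, bytes([1, 2, 3, 4, 5, 6]), 0)
    print(used, hex(regs[30]), hex(regs[31]), hex(regs[0]))
```

Starting at register 30 with three registers shows the modulo-32 wraparound: the third element lands in register 0.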
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load single 3-element structure and Replicate to all lanes of three registers. This instruction loads a 3-element
structure from memory and replicates the structure to all the lanes of the three SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm 1 1 1 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
size <imm>
00 #3
01 #6
10 #12
11 #24
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
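The <imm> values in the table above equal the structure size in bytes: the number of elements in the structure multiplied by the element size selected by "size". The short Python sketch below reproduces the table on that reading; the function name replicate_post_index_imm is hypothetical.

```python
def replicate_post_index_imm(selem, size):
    """Post-index immediate for a load-and-replicate form.

    selem : number of elements in the structure (3 for this instruction)
    size  : the 2-bit "size" field, selecting 1, 2, 4 or 8 byte elements
    """
    if not 0 <= size <= 3:
        raise ValueError("size is a 2-bit field")
    ebytes = 1 << size
    return selem * ebytes


if __name__ == "__main__":
    for size in range(4):
        print(f"size={size:02b} imm=#{replicate_post_index_imm(3, size)}")
```

With this immediate, the base register advances by exactly one structure, so that it then addresses the next structure in memory.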
Shared Decode
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
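For the replicate path, each element of the structure is copied into every lane of its own register. The Python sketch below illustrates this under a few assumptions: memory is little-endian, registers are modelled as integers, register numbering wraparound is left out, and the names replicate and load_and_replicate are hypothetical stand-ins for the pseudocode's Replicate() and the surrounding loop.

```python
def replicate(element, esize, datasize):
    """Fill a datasize-bit value with copies of an esize-bit element."""
    result = 0
    for lane in range(datasize // esize):
        result |= element << (lane * esize)
    return result


def load_and_replicate(memory, address, selem, esize, datasize):
    """Return selem register values, each holding one replicated element."""
    ebytes = esize // 8
    regs = []
    offs = 0
    for _ in range(selem):
        chunk = memory[address + offs:address + offs + ebytes]
        regs.append(replicate(int.from_bytes(chunk, "little"), esize, datasize))
        offs += ebytes
    return regs


if __name__ == "__main__":
    for value in load_and_replicate(bytes([0x11, 0x22, 0x33]), 0, 3, 8, 64):
        print(f"{value:016x}")
```

Unlike the single-lane form, the previous register contents play no part here: every lane is overwritten.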
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load multiple 4-element structures to four registers. This instruction loads multiple 4-element structures from
memory and writes the result to the four SIMD&FP registers, with de-interleaving.
For an example of de-interleaving, see LD3 (multiple structures).
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 0 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
Q <imm>
0 #32
1 #64
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
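Both tables above follow from two quantities: the register width chosen by Q and the element width chosen by "size". The Python sketch below reproduces them. It assumes that Q selects a 64-bit (Q = 0) or 128-bit (Q = 1) register, which is consistent with the arrangement names but is not stated in this excerpt, and the function names are hypothetical.

```python
def arrangement(size, q, allow_1d=False):
    """Return the <T> arrangement specifier for a size:Q pair.

    allow_1d=False models instructions where size:Q == '110' is RESERVED.
    """
    datasize = 128 if q else 64      # assumption: Q selects the register width
    esize = 8 << size                # 8, 16, 32 or 64 bits per element
    elements = datasize // esize
    if elements == 1 and not allow_1d:
        raise ValueError("size:Q == '110' is RESERVED for this instruction")
    return f"{elements}{'BHSD'[size]}"


def multi_post_index_imm(registers, q):
    """Post-index immediate: total bytes transferred by all registers."""
    return registers * (16 if q else 8)


if __name__ == "__main__":
    for size in range(4):
        for q in range(2):
            try:
                print(f"{size:02b} {q} {arrangement(size, q)}")
            except ValueError:
                print(f"{size:02b} {q} RESERVED")
    print(multi_post_index_imm(4, 0), multi_post_index_imm(4, 1))
```

The immediate depends only on Q because the total transfer size is the register count times the register width, whatever the element size.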
Shared Decode
case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load single 4-element structure to one lane of four registers. This instruction loads a 4-element structure from
memory and writes the result to the corresponding elements of the four SIMD&FP registers without affecting the
other bits of the registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm x x 1 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load single 4-element structure and Replicate to all lanes of four registers. This instruction loads a 4-element
structure from memory and replicates the structure to all the lanes of the four SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 1 1 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm 1 1 1 0 size Rn Rt
L R opcode S
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
size <imm>
00 #4
01 #8
10 #16
11 #32
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Pair of SIMD&FP registers, with Non-temporal hint. This instruction loads a pair of SIMD&FP registers from
memory, issuing a hint to the memory system that the access is non-temporal. The address that is used for the load is
calculated from a base register value and an optional immediate offset.
For information about non-temporal pair instructions, see Load/Store SIMD and Floating-point Non-temporal pair.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 0 1 imm7 Rt2 Rn Rt
L
// Empty.
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDNP (SIMD&FP).
Assembler Symbols
<Dt1> Is the 64-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt2> Is the 64-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Qt1> Is the 128-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt2> Is the 128-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<St1> Is the 32-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<St2> Is the 32-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the optional signed immediate byte offset, a multiple of 4 in the range -256 to
252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit variant: is the optional signed immediate byte offset, a multiple of 8 in the range -512 to
504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
For the 128-bit variant: is the optional signed immediate byte offset, a multiple of 16 in the range -1024
to 1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = n != 31;
if t == t2 then
    Constraint c = ConstrainUnpredictable(Unpredictable_LDPOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // result is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
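The relationship between opc, imm7 and the byte offset ranges listed under <imm> can be checked with a short Python sketch of the decode arithmetic. The helper names sign_extend and decode_pair are hypothetical, and raising ValueError stands in for UNDEFINED.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_pair(opc, imm7):
    """Return (datasize in bits, signed byte offset) for a pair load."""
    if opc == 0b11:
        raise ValueError("opc == '11' is UNDEFINED")
    scale = 2 + opc                      # 2, 3 or 4
    datasize = 8 << scale                # 32, 64 or 128 bits
    offset = sign_extend(imm7, 7) << scale
    return datasize, offset


if __name__ == "__main__":
    for opc in range(3):
        low = decode_pair(opc, 0x40)     # most negative imm7
        high = decode_pair(opc, 0x3F)    # most positive imm7
        print(f"opc={opc:02b}: {low[0]}-bit, offset {low[1]} to {high[1]}")
```

Scaling the 7-bit signed field by the access size is what yields the three ranges, each a multiple of 4, 8 or 16 bytes.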
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load Pair of SIMD&FP registers. This instruction loads a pair of SIMD&FP registers from memory. The address that is
used for the load is calculated from a base register value and an optional immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 1 1 imm7 Rt2 Rn Rt
L
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 1 1 imm7 Rt2 Rn Rt
L
Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 0 1 imm7 Rt2 Rn Rt
L
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDP (SIMD&FP).
Assembler Symbols
<Dt1> Is the 64-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt2> Is the 64-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Qt1> Is the 128-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt2> Is the 128-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<St1> Is the 32-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<St2> Is the 32-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
8 in the range -512 to 504, encoded in the "imm7" field as <imm>/8.
For the 64-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 8 in the
range -512 to 504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
For the 128-bit post-index and 128-bit pre-index variant: is the signed immediate byte offset, a multiple
of 16 in the range -1024 to 1008, encoded in the "imm7" field as <imm>/16.
For the 128-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 16 in the
range -1024 to 1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;
if t == t2 then
    Constraint c = ConstrainUnpredictable(Unpredictable_LDPOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // result is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
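The wback and postindex flags in the pseudocode above determine both the address used for the access and the value written back to the base register. The Python sketch below models only that address arithmetic. The mapping of the three encoding classes to flag values (Post-index: wback and postindex both TRUE; Pre-index: wback TRUE, postindex FALSE; Signed offset: both FALSE) is an assumption here, because the per-class decode lines are not part of this excerpt. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def ldp_addresses(base, offset, wback, postindex):
    """Return (access_address, new_base) following the pseudocode's ordering."""
    address = base
    if not postindex:
        address = (address + offset) & MASK64   # offset applied before the access
    access = address
    if wback:
        if postindex:
            address = (address + offset) & MASK64   # offset applied after the access
        new_base = address
    else:
        new_base = base                          # base register left unchanged
    return access, new_base


if __name__ == "__main__":
    modes = {
        "Post-index": (True, True),
        "Pre-index": (True, False),
        "Signed offset": (False, False),
    }
    for name, (wback, postindex) in modes.items():
        access, new_base = ldp_addresses(0x1000, 16, wback, postindex)
        print(f"{name:14s} access={access:#x} base after={new_base:#x}")
```

Post-index is the only mode in which the access uses the unmodified base value.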
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load SIMD&FP Register (immediate offset). This instruction loads an element from memory, and writes the result as a
scalar to the SIMD&FP register. The address that is used for the load is calculated from a base register value, a signed
immediate offset, and an optional offset that is a multiple of the element size.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 0 1 Rn Rt
opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 1 1 Rn Rt
opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 1 x 1 imm12 Rn Rt
opc
Assembler Symbols
<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> For the 8-bit variant: is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to
0 and encoded in the "imm12" field.
For the 16-bit variant: is the optional positive immediate byte offset, a multiple of 2 in the range 0 to
8190, defaulting to 0 and encoded in the "imm12" field as <pimm>/2.
For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
16380, defaulting to 0 and encoded in the "imm12" field as <pimm>/4.
For the 64-bit variant: is the optional positive immediate byte offset, a multiple of 8 in the range 0 to
32760, defaulting to 0 and encoded in the "imm12" field as <pimm>/8.
For the 128-bit variant: is the optional positive immediate byte offset, a multiple of 16 in the range 0 to
65520, defaulting to 0 and encoded in the "imm12" field as <pimm>/16.
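The <pimm> ranges above all come from a 12-bit unsigned field scaled by the access size. A minimal Python sketch of the encoding step follows; the function name encode_pimm is hypothetical, and ValueError stands in for an assembler diagnostic.

```python
def encode_pimm(pimm, access_bits):
    """Return the imm12 field for an unsigned-offset access of access_bits bits."""
    scale = {8: 0, 16: 1, 32: 2, 64: 3, 128: 4}[access_bits]
    if pimm % (1 << scale) != 0:
        raise ValueError(f"offset must be a multiple of {1 << scale}")
    imm12 = pimm >> scale
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("offset out of range for imm12")
    return imm12


if __name__ == "__main__":
    for bits in (8, 16, 32, 64, 128):
        top = 0xFFF * (bits // 8)     # largest encodable byte offset
        print(f"{bits:3d}-bit: 0 to {top}, imm12 = {encode_pimm(top, bits):#x}")
```

The upper bound for each variant is 4095 multiplied by the access size in bytes, which gives 4095, 8190, 16380, 32760 and 65520.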
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH && (wback || n != 31);
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;
    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load SIMD&FP Register (PC-relative literal). This instruction loads a SIMD&FP register from memory. The address
that is used for the load is calculated from the PC value and an immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 0 1 1 1 0 0 imm19 Rt
integer t = UInt(Rt);
integer size;
bits(64) offset;
case opc of
    when '00'
        size = 4;
    when '01'
        size = 8;
    when '10'
        size = 16;
    when '11'
        UNDEFINED;
Assembler Symbols
<Dt> Is the 64-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.
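The <label> description implies that the load address is the address of the instruction plus a signed, word-scaled 19-bit offset. The Python sketch below shows that arithmetic in both directions. The function names are hypothetical, and the sketch assumes a 64-bit address space with wraparound.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def literal_address(pc, imm19):
    """Address loaded from, given the instruction address and the imm19 field."""
    offset = sign_extend(imm19 << 2, 21)       # imm19 times 4, sign-extended
    return (pc + offset) & MASK64


def encode_label(pc, target):
    """Return the imm19 field for a label at `target`, checking the +/-1MB range."""
    offset = target - pc
    if offset % 4 != 0:
        raise ValueError("label must be word-aligned relative to the instruction")
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("label out of the +/-1MB range")
    return (offset >> 2) & 0x7FFFF


if __name__ == "__main__":
    imm19 = encode_label(0x1000, 0xFF8)
    print(hex(imm19), hex(literal_address(0x1000, imm19)))
```

Because the field is scaled by 4, the reachable range is asymmetric by one word: from -1048576 to +1048572 bytes.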
Operation
if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);
CheckFPAdvSIMDEnabled64();
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load SIMD&FP Register (register offset). This instruction loads a SIMD&FP register from memory. The address that is
used for the load is calculated from a base register value and an offset register value. The offset can be optionally
shifted and extended.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31:30  29:24        23:22      21  20:16  15:13   12  11:10  9:5  4:0
size   1 1 1 1 0 0  x 1 (opc)  1   Rm     option  S   1 0    Rn   Rt
Assembler Symbols
<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> For the 8-bit variant: is the index extend specifier, encoded in “option”:
option <extend>
010 UXTW
110 SXTW
111 SXTX
For the 128-bit, 16-bit, 32-bit and 64-bit variants: is the index extend/shift specifier, defaulting to LSL,
which must be omitted for the LSL option when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 8-bit variant: is the index shift amount, which must be #0. It is encoded in "S" as 0 if omitted,
or as 1 if present.
For the 16-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #1
For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #2
For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #3
For the 128-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #4
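The <extend> and <amount> tables above combine into a single address calculation. The following is a minimal sketch, assuming that the index register is extended according to "option" and then shifted left by log2 of the access size when "S" is 1. The function names are hypothetical, and treating every other "option" value as an error is an assumption of this sketch.

```python
MASK64 = (1 << 64) - 1


def extend_reg(value: int, option: int, shift: int) -> int:
    """Extend the index register per `option`, then shift left (hypothetical helper)."""
    if option == 0b010:                 # UXTW: zero-extend the low 32 bits
        ext = value & 0xFFFFFFFF
    elif option == 0b110:               # SXTW: sign-extend the low 32 bits
        low = value & 0xFFFFFFFF
        ext = low - (1 << 32) if low & (1 << 31) else low
    elif option in (0b011, 0b111):      # LSL, SXTX: use all 64 bits
        ext = value & MASK64
    else:
        raise ValueError("option value not listed in the <extend> table")
    return (ext << shift) & MASK64


def effective_address(base: int, index: int, option: int, s_bit: int,
                      access_bytes: int) -> int:
    """Base plus extended, optionally scaled, index, wrapped to 64 bits."""
    shift = (access_bytes.bit_length() - 1) if s_bit else 0
    return (base + extend_reg(index, option, shift)) & MASK64
```

With a 128-bit access and S set to 1, an index of 3 under LSL contributes 3 << 4 = 48 bytes to the address.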
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;
    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Load SIMD&FP Register (unscaled offset). This instruction loads a SIMD&FP register from memory. The address that
is used for the load is calculated from a base register value and an optional immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31:30  29:24        23:22      21  20:12  11:10  9:5  4:0
size   1 1 1 1 0 0  x 1 (opc)  0   imm9   0 0    Rn   Rt
Assembler Symbols
<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
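As an illustration of the <simm> encoding, the sketch below converts between a signed byte offset and the 9-bit two's complement "imm9" field. The helper names are hypothetical.

```python
def encode_imm9(simm: int) -> int:
    """Encode a signed byte offset into the 9-bit imm9 field (hypothetical helper)."""
    if not -256 <= simm <= 255:
        raise ValueError("offset must be in the range -256 to 255")
    return simm & 0x1FF


def decode_imm9(imm9: int) -> int:
    """Sign-extend the 9-bit imm9 field to a Python int."""
    imm9 &= 0x1FF
    return imm9 - 0x200 if imm9 & 0x100 else imm9
```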
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;
    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Multiply-Add to accumulator (vector, by element). This instruction multiplies the vector elements in the first source
SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the results with
the vector elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29:24        23:22  21  20  19:16  15  14      13:12  11  10  9:5  4:0
0   Q   1 0 1 1 1 1  size   L   M   Rm     0   0 (o2)  0 0    H   0   Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
<Ts> Is an element size specifier, encoded in “size”:
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
<index> Is the element index, encoded in “size:L:H:M”:
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
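The <Vm>, <Ts> and <index> tables above can be read together as one decode step. The sketch below is an illustrative model of those tables only; the function name decode_by_element is hypothetical, and the fields are passed as plain integers.

```python
def decode_by_element(size: int, L: int, M: int, H: int, Rm: int):
    """Return (element size letter, Vm register number, element index)."""
    if size == 0b01:
        # H elements: index is H:L:M and the register is 0:Rm (V0-V15 only).
        return "H", Rm & 0xF, (H << 2) | (L << 1) | M
    if size == 0b10:
        # S elements: index is H:L and the register is M:Rm (V0-V31).
        return "S", (M << 4) | (Rm & 0xF), (H << 1) | L
    raise ValueError("size values 00 and 11 are RESERVED")
```

For instance, size '10' with M = 1 and Rm = 0011 selects V19, which is why the V0-V15 restriction applies only when the element size is H.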
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) product;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two
source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29     28:24      23:22  21  20:16  15:10        9:5  4:0
0   Q   0 (U)  0 1 1 1 0  size   1   Rm     1 0 0 1 0 1  Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) product;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    product = (UInt(element1)*UInt(element2))<esize-1:0>;
    if sub_op then
        Elem[result, e, esize] = Elem[operand3, e, esize] - product;
    else
        Elem[result, e, esize] = Elem[operand3, e, esize] + product;
V[d] = result;
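The per-element loop above can be modelled on plain integers. The following is a sketch only, assuming each register is passed as a Python int with element 0 in the least significant bits. The name mla_vector is hypothetical.

```python
def mla_vector(acc: int, op1: int, op2: int, esize: int, datasize: int,
               sub_op: bool = False) -> int:
    """Lane-wise multiply-accumulate (or multiply-subtract when sub_op is True)."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        shift = e * esize
        a = (op1 >> shift) & mask
        b = (op2 >> shift) & mask
        c = (acc >> shift) & mask
        product = (a * b) & mask            # keep only the low esize bits
        lane = (c - product if sub_op else c + product) & mask
        result |= lane << shift
    return result
```

Both the product and the accumulation wrap modulo 2^esize, so with 8-bit elements 0x10 * 0x10 contributes 0 to its lane.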
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Multiply-Subtract from accumulator (vector, by element). This instruction multiplies the vector elements in the first
source SIMD&FP register by the specified value in the second source SIMD&FP register, and subtracts the results
from the vector elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer
values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29:24        23:22  21  20  19:16  15  14      13:12  11  10  9:5  4:0
0   Q   1 0 1 1 1 1  size   L   M   Rm     0   1 (o2)  0 0    H   0   Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
<index> Is the element index, encoded in “size:L:H:M”:
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) product;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the
two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29     28:24      23:22  21  20:16  15:10        9:5  4:0
0   Q   1 (U)  0 1 1 1 0  size   1   Rm     1 0 0 1 0 1  Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) product;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    product = (UInt(element1)*UInt(element2))<esize-1:0>;
    if sub_op then
        Elem[result, e, esize] = Elem[operand3, e, esize] - product;
    else
        Elem[result, e, esize] = Elem[operand3, e, esize] + product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move vector element to another vector element. This instruction copies the vector element of the source SIMD&FP
register to the specified vector element of the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
• The encodings in this description are named to match the encodings of INS (element).
• The description of INS (element) gives the operational pseudocode for this instruction.
31:21                  20:16  15  14:11  10  9:5  4:0
0 1 1 0 1 1 1 0 0 0 0  imm5   0   imm4   1   Rn   Rd
MOV <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]
is equivalent to
INS <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ts> Is an element size specifier, encoded in “imm5”:
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index1> Is the destination element index, encoded in “imm5”:
imm5 <index1>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<index2> Is the source element index, encoded in “imm5:imm4”:
imm5 <index2>
x0000 RESERVED
xxxx1 imm4<3:0>
xxx10 imm4<3:1>
xx100 imm4<3:2>
x1000 imm4<3>
Unspecified bits in "imm4" are ignored but should be set to zero by an assembler.
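The three tables above all key off the position of the lowest set bit in "imm5". A minimal sketch of that decode follows. The function name decode_ins_element is hypothetical, and the fields are passed as plain integers.

```python
def decode_ins_element(imm5: int, imm4: int):
    """Return (element size letter, destination index, source index)."""
    imm5 &= 0x1F
    imm4 &= 0xF
    if imm5 & 0b01111 == 0:
        raise ValueError("imm5 == x0000 is RESERVED")
    # Position of the lowest set bit selects B, H, S or D.
    size = (imm5 & -imm5).bit_length() - 1
    index1 = imm5 >> (size + 1)     # imm5<4:size+1>
    index2 = imm4 >> size           # imm4<3:size>; lower bits are ignored
    return "BHSD"[size], index1, index2
```

For example, imm5 = 10100 selects S elements with destination index 2, and only imm4<3:2> contributes to the source index.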
Operation
The description of INS (element) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move general-purpose register to a vector element. This instruction copies the contents of the source general-purpose
register to the specified vector element in the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
• The encodings in this description are named to match the encodings of INS (general).
• The description of INS (general) gives the operational pseudocode for this instruction.
31:21                  20:16  15:10        9:5  4:0
0 1 0 0 1 1 1 0 0 0 0  imm5   0 0 0 1 1 1  Rn   Rd
MOV <Vd>.<Ts>[<index>], <R><n>
is equivalent to
INS <Vd>.<Ts>[<index>], <R><n>
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ts> Is an element size specifier, encoded in “imm5”:
imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index> Is the element index, encoded in “imm5”:
imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
<R> Is the width specifier for the general-purpose source register, encoded in “imm5”:
imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X
<n> Is the number [0-30] of the general-purpose source register or ZR (31), encoded in the "Rn" field.
Operation
The description of INS (general) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move vector element to scalar. This instruction duplicates the specified vector element in the SIMD&FP source
register into a scalar, and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
• The encodings in this description are named to match the encodings of DUP (element).
• The description of DUP (element) gives the operational pseudocode for this instruction.
31:21                  20:16  15:10        9:5  4:0
0 1 0 1 1 1 1 0 0 0 0  imm5   0 0 0 0 0 1  Rn   Rd
MOV <V><d>, <Vn>.<T>[<index>]
is equivalent to
DUP <V><d>, <Vn>.<T>[<index>]
Assembler Symbols
<V> Is the destination width specifier, encoded in “imm5”:
imm5 <V>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<T> Is the element width specifier, encoded in “imm5”:
imm5 <T>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
<index> Is the element index, encoded in “imm5”:
imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>
Operation
The description of DUP (element) gives the operational pseudocode for this instruction.
Operational information
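If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.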
Move vector element to general-purpose register. This instruction reads the unsigned integer from the source
SIMD&FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination general-
purpose register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
• The encodings in this description are named to match the encodings of UMOV.
• The description of UMOV gives the operational pseudocode for this instruction.
31  30  29:21              20:16             15:10        9:5  4:0
0   Q   0 0 1 1 1 0 0 0 0  x x x 0 0 (imm5)  0 0 1 1 1 1  Rn   Rd
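For the 32-bit variant:
MOV <Wd>, <Vn>.S[<index>]
is equivalent to
UMOV <Wd>, <Vn>.S[<index>]
For the 64-bit variant:
MOV <Xd>, <Vn>.D[<index>]
is equivalent to
UMOV <Xd>, <Vn>.D[<index>]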
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<index> For the 32-bit variant: is the element index encoded in "imm5<4:3>".
For the 64-bit variant: is the element index encoded in "imm5<4>".
Operation
The description of UMOV gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move vector. This instruction copies the vector in the source SIMD&FP register into the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
• The encodings in this description are named to match the encodings of ORR (vector, register).
• The description of ORR (vector, register) gives the operational pseudocode for this instruction.
31  30  29:24        23:22       21  20:16  15:10        9:5  4:0
0   Q   0 0 1 1 1 0  1 0 (size)  1   Rm     0 0 0 1 1 1  Rn   Rd
MOV <Vd>.<T>, <Vn>.<T>
is equivalent to
ORR <Vd>.<T>, <Vn>.<T>, <Vn>.<T>
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “Q”:
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Operation
The description of ORR (vector, register) gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move Immediate (vector). This instruction places an immediate constant into every vector element of the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29  28:19                18  17  16  15:12  11  10  9  8  7  6  5  4:0
0   Q   op  0 1 1 1 1 0 0 0 0 0  a   b   c   cmode  0   1   d  e  f  g  h  Rd
integer rd = UInt(Rd);
ImmediateOp operation;
case cmode:op of
    when '0xx00' operation = ImmediateOp_MOVI;
    when '0xx01' operation = ImmediateOp_MVNI;
    when '0xx10' operation = ImmediateOp_ORR;
    when '0xx11' operation = ImmediateOp_BIC;
    when '10x00' operation = ImmediateOp_MOVI;
    when '10x01' operation = ImmediateOp_MVNI;
    when '10x10' operation = ImmediateOp_ORR;
    when '10x11' operation = ImmediateOp_BIC;
    when '110x0' operation = ImmediateOp_MOVI;
    when '110x1' operation = ImmediateOp_MVNI;
    when '1110x' operation = ImmediateOp_MOVI;
    when '11110' operation = ImmediateOp_MOVI;
    when '11111'
        // FMOV Dn,#imm is in main FP instruction set
        if Q == '0' then UNDEFINED;
        operation = ImmediateOp_MOVI;
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<imm> Is a 64-bit immediate 'aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh',
encoded in "a:b:c:d:e:f:g:h".
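<T> For the 8-bit variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 8B
1 16B
For the 16-bit variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 4H
1 8H
For the 32-bit variant: is an arrangement specifier, encoded in “Q”:
Q <T>
0 2S
1 4S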
<amount> For the 16-bit shifted immediate variant: is the shift amount encoded in “cmode<1>”:
cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.
For the 32-bit shifted immediate variant: is the shift amount encoded in “cmode<2:1>”:
cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.
For the 32-bit shifting ones variant: is the shift amount encoded in “cmode<0>”:
cmode<0> <amount>
0 8
1 16
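The <imm> and <amount> descriptions above imply how the 8-bit field a:b:c:d:e:f:g:h is expanded into a 64-bit constant. The expansion pseudocode is not reproduced in this description, so the following is only a sketch inferred from these tables and the cmode:op decode. The names replicate and expand_imm are hypothetical, and the floating-point forms are deliberately not modelled.

```python
def replicate(value: int, width: int, count: int) -> int:
    """Concatenate `count` copies of the low `width` bits of `value`."""
    out = 0
    for i in range(count):
        out |= (value & ((1 << width) - 1)) << (i * width)
    return out


def expand_imm(op: int, cmode: int, imm8: int) -> int:
    """Return the 64-bit constant for the integer immediate forms (sketch)."""
    top, low = cmode >> 1, cmode & 1
    if top <= 0b011:                  # 32-bit shifted immediate: LSL #0, #8, #16, #24
        return replicate(imm8 << (8 * top), 32, 2)
    if top in (0b100, 0b101):         # 16-bit shifted immediate: LSL #0, #8
        return replicate(imm8 << (8 * (top & 1)), 16, 4)
    if top == 0b110:                  # 32-bit shifting ones: amount 8 or 16
        amount = 16 if low else 8
        return replicate((imm8 << amount) | ((1 << amount) - 1), 32, 2)
    if low == 0 and op == 0:          # 8-bit: imm8 copied into every byte
        return replicate(imm8, 8, 8)
    if low == 0 and op == 1:          # 64-bit: each bit of imm8 fills one byte
        out = 0
        for bit in range(8):
            if (imm8 >> bit) & 1:
                out |= 0xFF << (8 * bit)
        return out
    raise NotImplementedError("floating-point immediate forms are not modelled")
```

For example, the 64-bit variant with a = 1 and h = 1 and all other bits 0 yields 0xFF000000000000FF, matching the 'aaaaaaaa...hhhhhhhh' pattern above.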
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;
case operation of
    when ImmediateOp_MOVI
        result = imm;
    when ImmediateOp_MVNI
        result = NOT(imm);
    when ImmediateOp_ORR
        operand = V[rd];
        result = operand OR imm;
    when ImmediateOp_BIC
        operand = V[rd];
        result = operand AND NOT(imm);
V[rd] = result;
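The four immediate operations above can be modelled directly on integers. This is an illustrative sketch; the function name apply_immediate_op and the use of strings for the operation are assumptions, not part of the pseudocode.

```python
def apply_immediate_op(operation: str, imm: int, vd: int, datasize: int = 64) -> int:
    """Return the new destination value for MOVI, MVNI, ORR or BIC."""
    mask = (1 << datasize) - 1
    if operation == "MOVI":
        result = imm
    elif operation == "MVNI":
        result = ~imm
    elif operation == "ORR":
        result = vd | imm
    elif operation == "BIC":
        result = vd & ~imm
    else:
        raise ValueError("unknown operation")
    return result & mask        # truncate to the register width
```

Only ORR and BIC read the previous contents of the destination register; MOVI and MVNI overwrite it.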
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Multiply (vector, by element). This instruction multiplies the vector elements in the first source SIMD&FP register by
the specified value in the second source SIMD&FP register, places the results in a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29:24        23:22  21  20  19:16  15:12    11  10  9:5  4:0
0   Q   0 0 1 1 1 1  size   L   M   Rm     1 0 0 0  H   0   Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
<Ts> Is an element size specifier, encoded in “size”:
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP
registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29     28:24      23:22  21  20:16  15:10        9:5  4:0
0   Q   0 (U)  0 1 1 1 0  size   1   Rm     1 0 0 1 1 1  Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if U == '1' && size != '00' then UNDEFINED;
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) product;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if poly then
        product = PolynomialMult(element1, element2)<esize-1:0>;
    else
        product = (UInt(element1)*UInt(element2))<esize-1:0>;
    Elem[result, e, esize] = product;
V[d] = result;
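The two product paths in the loop above differ only in how partial products are combined. The sketch below assumes that PolynomialMult is a carry-less multiplication over GF(2), in which partial products are combined with exclusive OR instead of addition; its definition is not given in this description. The names polynomial_mult and mul_lane are hypothetical.

```python
def polynomial_mult(op1: int, op2: int, esize: int) -> int:
    """Carry-less multiply: XOR together shifted copies of op2 (assumed behaviour)."""
    result = 0
    for i in range(esize):
        if (op1 >> i) & 1:
            result ^= op2 << i
    return result


def mul_lane(a: int, b: int, esize: int, poly: bool = False) -> int:
    """Low esize bits of the integer or polynomial product of one element pair."""
    mask = (1 << esize) - 1
    product = polynomial_mult(a, b, esize) if poly else a * b
    return product & mask
```

With 8-bit elements, 3 times 3 gives 9 as an integer product but 5 as a polynomial product, because (x + 1)(x + 1) = x^2 + 1 over GF(2).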
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the
inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
• The encodings in this description are named to match the encodings of NOT.
• The description of NOT gives the operational pseudocode for this instruction.
31  30  29:10                                    9:5  4:0
0   Q   1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0  Rn   Rd
MVN <Vd>.<T>, <Vn>.<T>
is equivalent to
NOT <Vd>.<T>, <Vn>.<T>
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “Q”:
Q <T>
0 8B
1 16B
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
The description of NOT gives the operational pseudocode for this instruction.
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Move inverted Immediate (vector). This instruction places the inverse of an immediate constant into every vector
element of the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 0 0 0 a b c cmode 0 1 d e f g h Rd
op
integer rd = UInt(Rd);
ImmediateOp operation;
case cmode:op of
when '0xx01' operation = ImmediateOp_MVNI;
when '0xx11' operation = ImmediateOp_BIC;
when '10x01' operation = ImmediateOp_MVNI;
when '10x11' operation = ImmediateOp_BIC;
when '110x1' operation = ImmediateOp_MVNI;
when '1110x' operation = ImmediateOp_MOVI;
when '11111'
// FMOV Dn,#imm is in main FP instruction set
if Q == '0' then UNDEFINED;
operation = ImmediateOp_MOVI;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
Q <T>
0 2S
1 4S
<amount> For the 16-bit shifted immediate variant: is the shift amount encoded in “cmode<1>”:
cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.
For the 32-bit shifted immediate variant: is the shift amount encoded in “cmode<2:1>”:
cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.
For the 32-bit shifting ones variant: is the shift amount encoded in “cmode<0>”:
cmode<0> <amount>
0 8
1 16
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;
case operation of
when ImmediateOp_MOVI
result = imm;
when ImmediateOp_MVNI
result = NOT(imm);
when ImmediateOp_ORR
operand = V[rd];
result = operand OR imm;
when ImmediateOp_BIC
operand = V[rd];
result = operand AND NOT(imm);
V[rd] = result;
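The case statement above can be sketched in Python as follows. This is an illustrative model only: the function name and the string tags for the operations are assumptions made for this example, a Python integer stands in for the datasize-bit register, and imm is assumed to be already expanded and replicated.

```python
def immediate_op(operation, vd, imm, datasize=64):
    """Hypothetical model of the ImmediateOp case statement.

    vd and imm are non-negative integers holding datasize-bit patterns.
    """
    mask = (1 << datasize) - 1
    if operation == "MOVI":
        result = imm
    elif operation == "MVNI":
        result = ~imm
    elif operation == "ORR":
        result = vd | imm
    elif operation == "BIC":
        result = vd & ~imm
    else:
        raise ValueError("unknown operation: " + operation)
    # Python integers are unbounded, so truncate to the register width.
    return result & mask
```

For MVNI only the second branch is taken; the other branches appear because the pseudocode is shared with MOVI, ORR, and BIC.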
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value,
puts the result into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
for e = 0 to elements-1
element = SInt(Elem[operand, e, esize]);
if neg then
element = -element;
else
element = Abs(element);
Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
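A minimal Python sketch of the negate path of the loop above, assuming the vector is modelled as a list of unsigned esize-bit patterns. The helper names sint and neg_vector are hypothetical and chosen for this example only.

```python
def sint(bits, n):
    """Interpret an n-bit pattern as a two's complement signed integer."""
    return bits - (1 << n) if bits & (1 << (n - 1)) else bits


def neg_vector(elements, esize):
    """Negate every element and keep the low esize bits of each result."""
    mask = (1 << esize) - 1
    return [(-sint(e, esize)) & mask for e in elements]
```

Note that the most negative value maps to itself (0x80 stays 0x80 for 8-bit elements), because the truncation to esize bits discards the overflow.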
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the
inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MVN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize = 8;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 8;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = NOT(element);
V[d] = result;
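As an illustration, the per-element loop above can be modelled in Python with the register held as a bytes object, one byte per 8-bit element. The function name not_vector is an assumption made for this sketch.

```python
def not_vector(operand: bytes) -> bytes:
    """Invert every bit of every 8-bit element."""
    return bytes(b ^ 0xFF for b in operand)
```

Because the operation is purely bitwise, the element size does not affect the result, which is consistent with only the 8B and 16B arrangements being listed.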
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP
registers, and writes the result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 Rm 0 0 0 1 1 1 Rn Rd
size
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
operand2 = NOT(operand2);
result = operand1 OR operand2;
V[d] = result;
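A short Python sketch of the OR NOT operation described above, with integers standing in for the datasize-bit registers. The function name orn is hypothetical.

```python
def orn(operand1, operand2, datasize=64):
    """Bitwise OR of operand1 with the inverse of operand2."""
    mask = (1 << datasize) - 1
    # Mask after inverting, because ~x is negative for Python integers.
    return (operand1 | ~operand2) & mask
```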
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise inclusive OR (vector, immediate). This instruction reads each vector element from the destination SIMD&FP
register, performs a bitwise OR between each element and an immediate constant, places the result into a vector, and
writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 0 0 0 a b c x x x 1 0 1 d e f g h Rd
op cmode
integer rd = UInt(Rd);
ImmediateOp operation;
case cmode:op of
when '0xx00' operation = ImmediateOp_MOVI;
when '0xx10' operation = ImmediateOp_ORR;
when '10x00' operation = ImmediateOp_MOVI;
when '10x10' operation = ImmediateOp_ORR;
when '110x0' operation = ImmediateOp_MOVI;
when '1110x' operation = ImmediateOp_MOVI;
when '11110' operation = ImmediateOp_MOVI;
imm64 = AdvSIMDExpandImm(op, cmode, a:b:c:d:e:f:g:h);
imm = Replicate(imm64, datasize DIV 64);
Assembler Symbols
<Vd> Is the name of the SIMD&FP register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
Q <T>
0 2S
1 4S
<amount> For the 16-bit variant: is the shift amount encoded in “cmode<1>”:
cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.
For the 32-bit variant: is the shift amount encoded in “cmode<2:1>”:
cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.
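The two shift-amount tables above reduce to a multiply by 8 of the relevant cmode bits. The Python sketch below illustrates this. The function name is hypothetical, and the mapping of cmode patterns to variants (0xx1 for 32-bit, 10x1 for 16-bit) is inferred from the ImmediateOp_ORR rows of the decode case statement rather than stated explicitly in the tables.

```python
def orr_imm_shift_amount(cmode):
    """Return the LSL amount for a 4-bit cmode value that selects ORR."""
    if cmode & 0b1001 == 0b0001:      # cmode == 0xx1: 32-bit variant
        return 8 * ((cmode >> 1) & 0b11)
    if cmode & 0b1101 == 0b1001:      # cmode == 10x1: 16-bit variant
        return 8 * ((cmode >> 1) & 0b1)
    raise ValueError("cmode does not select ORR (vector, immediate)")
```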
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;
case operation of
when ImmediateOp_MOVI
result = imm;
when ImmediateOp_MVNI
result = NOT(imm);
when ImmediateOp_ORR
operand = V[rd];
result = operand OR imm;
when ImmediateOp_BIC
operand = V[rd];
result = operand AND NOT(imm);
V[rd] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP
registers, and writes the result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (vector).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
size
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Alias Conditions
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
result = operand1 OR operand2;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP
registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
For information about multiplying polynomials see Polynomial arithmetic over {0, 1}.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 1 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if U == '1' && size != '00' then UNDEFINED;
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) product;
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if poly then
product = PolynomialMult(element1, element2)<esize-1:0>;
else
product = (UInt(element1)*UInt(element2))<esize-1:0>;
Elem[result, e, esize] = product;
V[d] = result;
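For readers unfamiliar with polynomial arithmetic over {0, 1}, the following Python sketch models the polynomial branch of the loop above as a carry-less multiply in which partial products are combined with exclusive OR instead of addition. The names polynomial_mult and pmul are chosen for this example, and the vector is assumed to be a list of 8-bit values.

```python
def polynomial_mult(a, b, n=8):
    """Carry-less multiply of two n-bit values; the product has up to 2n-1 bits."""
    result = 0
    for i in range(n):
        if (a >> i) & 1:
            result ^= b << i   # XOR replaces the add of an ordinary multiply
    return result


def pmul(operand1, operand2):
    """Element-wise polynomial multiply, keeping the low 8 bits of each product."""
    return [polynomial_mult(x, y) & 0xFF for x, y in zip(operand1, operand2)]
```

For example, 3 times 3 gives 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 when coefficients are reduced modulo 2.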
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors
of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP
register. The destination vector elements are twice as long as the elements that are multiplied.
For information about multiplying polynomials see Polynomial arithmetic over {0, 1}.
The PMULL instruction extracts each source vector from the lower half of each source register. The PMULL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 1 0 0 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 RESERVED
10 RESERVED
11 1Q
The '1Q' arrangement is only allocated in an implementation that includes the Cryptographic Extension,
and is otherwise RESERVED.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 x RESERVED
10 x RESERVED
11 0 1D
11 1 2D
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, 2*esize] = PolynomialMult(element1, element2);
V[d] = result;
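The widening behaviour can be sketched in Python as shown below. This is a simplified model under stated assumptions: a 128-bit source register is represented as a list of sixteen 8-bit values with element 0 first, part is 0 for PMULL and 1 for PMULL2, and the names clmul and pmull_8b are hypothetical.

```python
def clmul(a, b, n):
    """Carry-less multiply of two n-bit values, returning the full product."""
    result = 0
    for i in range(n):
        if (a >> i) & 1:
            result ^= b << i
    return result


def pmull_8b(vn, vm, part=0):
    """Multiply the lower (part=0) or upper (part=1) eight byte elements.

    Returns eight 16-bit products, one per source element pair.
    """
    lo, hi = 8 * part, 8 * part + 8
    return [clmul(x, y, 8) for x, y in zip(vn[lo:hi], vm[lo:hi])]
```

The same clmul helper with n=64 models the 1Q arrangement, where a single pair of 64-bit elements yields one 128-bit product.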
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register
to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the
result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
The results are rounded. For truncated results, see ADDHN.
The RADDHN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the RADDHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(2*datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
bits(2*esize) element1;
bits(2*esize) element2;
bits(2*esize) sum;
for e = 0 to elements-1
element1 = Elem[operand1, e, 2*esize];
element2 = Elem[operand2, e, 2*esize];
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
sum = sum + round_const;
Elem[result, e, esize] = sum<2*esize-1:esize>;
Vpart[d, part] = result;
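The effect of the rounding constant is easiest to see with concrete numbers. The Python sketch below models the add path of the loop above, assuming each source vector is a list of unsigned 2*esize-bit values. The function name raddhn is hypothetical, and the write to the lower or upper half of the destination is not modelled.

```python
def raddhn(vn, vm, esize):
    """Add element pairs, round, and return the high esize bits of each sum."""
    wide_mask = (1 << (2 * esize)) - 1
    round_const = 1 << (esize - 1)
    result = []
    for element1, element2 in zip(vn, vm):
        # The sum wraps at 2*esize bits, as it would in a bits(2*esize) value.
        total = (element1 + element2 + round_const) & wide_mask
        result.append(total >> esize)
    return result
```

With esize = 8, adding 0x0100 and 0x0080 gives 0x0180. Truncation would return 0x01, whereas rounding adds 0x80 first and returns 0x02.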
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1,
performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register,
and writes the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.
Advanced SIMD
(FEAT_SHA3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 1 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
V[d] = Vn EOR (ROL(Vm<127:64>, 1):ROL(Vm<63:0>, 1));
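As an illustration of the expression above, the following Python sketch treats each 128-bit register as a non-negative integer below 2**128. The names rol64 and rax1 are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1


def rol64(x, n):
    """Rotate a 64-bit value left by n bits (0 < n < 64)."""
    return ((x << n) | (x >> (64 - n))) & MASK64


def rax1(vn, vm):
    """XOR vn with vm after rotating each 64-bit half of vm left by 1."""
    hi = rol64(vm >> 64, 1)
    lo = rol64(vm & MASK64, 1)
    return vn ^ ((hi << 64) | lo)
```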
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the
bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize = 8;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 8;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 8B
1 16B
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
bits(esize) rev;
for e = 0 to elements-1
element = Elem[operand, e, esize];
for i = 0 to esize-1
rev<(esize-1)-i> = element<i>;
Elem[result, e, esize] = rev;
V[d] = result;
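The nested loop above can be sketched in Python as follows, with the register modelled as a bytes object holding one 8-bit element per byte. The names rbit_byte and rbit are hypothetical.

```python
def rbit_byte(element):
    """Reverse the bit order of one 8-bit element."""
    rev = 0
    for i in range(8):
        if (element >> i) & 1:
            rev |= 1 << (7 - i)   # bit i moves to bit (esize-1)-i
    return rev


def rbit(operand: bytes) -> bytes:
    """Apply the bit reversal to every element of the vector."""
    return bytes(rbit_byte(b) for b in operand)
```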
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of
the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 1 1 0 Rn Rd
U o0
integer d = UInt(Rd);
integer n = UInt(Rn);
// => op+size:
// 64+B = 0, 64+H = 1, 64+S = 2, 64+D = X
// 32+B = 1, 32+H = 2, 32+S = X, 32+D = X
// 16+B = 2, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// => 3-(op+size) (index bits in group)
// 64+B = 3, 64+H = 2, 64+S = 1, 64+D = X
// 32+B = 2, 32+H = 1, 32+S = X, 32+D = X
// 16+B = 1, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
integer container_size;
case op of
when '10' container_size = 16;
when '01' container_size = 32;
when '00' container_size = 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element = 0;
integer rev_element;
for c = 0 to containers-1
rev_element = element + elements_per_container - 1;
for e = 0 to elements_per_container-1
Elem[result, rev_element, esize] = Elem[operand, element, esize];
element = element + 1;
rev_element = rev_element - 1;
V[d] = result;
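A direct Python transcription of the loop above may help in following the two running indices. It is a sketch only: the vector is assumed to be a list of elements with element 0 first, and the function name rev_elements is hypothetical. For REV16 with 8-bit elements, elements_per_container is 2.

```python
def rev_elements(operand, elements_per_container):
    """Reverse the order of the elements inside each container."""
    containers = len(operand) // elements_per_container
    result = [0] * len(operand)
    element = 0
    for _ in range(containers):
        rev_element = element + elements_per_container - 1
        for _ in range(elements_per_container):
            result[rev_element] = operand[element]
            element += 1
            rev_element -= 1
    return result
```

With two bytes per container this is a byte swap of every halfword, the usual way to convert 16-bit data between endiannesses.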
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word
of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
U o0
integer d = UInt(Rd);
integer n = UInt(Rn);
// => op+size:
// 64+B = 0, 64+H = 1, 64+S = 2, 64+D = X
// 32+B = 1, 32+H = 2, 32+S = X, 32+D = X
// 16+B = 2, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// => 3-(op+size) (index bits in group)
// 64+B = 3, 64+H = 2, 64+S = 1, 64+D = X
// 32+B = 2, 32+H = 1, 32+S = X, 32+D = X
// 16+B = 1, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
integer container_size;
case op of
when '10' container_size = 16;
when '01' container_size = 32;
when '00' container_size = 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
1x x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element = 0;
integer rev_element;
for c = 0 to containers-1
rev_element = element + elements_per_container - 1;
for e = 0 to elements_per_container-1
Elem[result, rev_element, esize] = Elem[operand, element, esize];
element = element + 1;
rev_element = rev_element - 1;
V[d] = result;
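An equivalent way to view the operation is as a reversal of fixed-size slices. The Python sketch below uses that formulation instead of the two running indices in the pseudocode. The function name rev32 is an assumption, and the vector is modelled as a list of esize-bit elements with element 0 first.

```python
def rev32(elements, esize):
    """Reverse the elements inside each 32-bit word (esize is 8 or 16)."""
    per_word = 32 // esize
    result = []
    for start in range(0, len(elements), per_word):
        result.extend(reversed(elements[start:start + per_word]))
    return result
```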
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements
in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the
vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
U o0
integer d = UInt(Rd);
integer n = UInt(Rn);
// => op+size:
// 64+B = 0, 64+H = 1, 64+S = 2, 64+D = X
// 32+B = 1, 32+H = 2, 32+S = X, 32+D = X
// 16+B = 2, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// => 3-(op+size) (index bits in group)
// 64+B = 3, 64+H = 2, 64+S = 1, 64+D = X
// 32+B = 2, 32+H = 1, 32+S = X, 32+D = X
// 16+B = 1, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
integer container_size;
case op of
when '10' container_size = 16;
when '01' container_size = 32;
when '00' container_size = 64;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element = 0;
integer rev_element;
for c = 0 to containers-1
rev_element = element + elements_per_container - 1;
for e = 0 to elements_per_container-1
Elem[result, rev_element, esize] = Elem[operand, element, esize];
element = element + 1;
rev_element = rev_element - 1;
V[d] = result;
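The following Python sketch applies the loop above to a register held as a single integer, which makes the effect of the element size visible on one input value. It assumes that containers is datasize DIV container_size and that elements_per_container is container_size DIV esize. Those definitions are not shown in the text above, and the function name rev64 is hypothetical.

```python
def rev64(value, esize, datasize=128):
    """Reverse esize-bit elements inside each 64-bit doubleword of value."""
    container_size = 64
    containers = datasize // container_size            # assumed definition
    elements_per_container = container_size // esize   # assumed definition
    mask = (1 << esize) - 1
    result = 0
    element = 0
    for _ in range(containers):
        rev_element = element + elements_per_container - 1
        for _ in range(elements_per_container):
            field = (value >> (element * esize)) & mask
            result |= field << (rev_element * esize)
            element += 1
            rev_element -= 1
    return result
```

With 8-bit elements the result is a full byte reversal of each doubleword, while with 32-bit elements only the two words are exchanged.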
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Rounding Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the vector in the
source SIMD&FP register, right shifts each value by an immediate value, writes the final result to a vector, and writes
the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as
long as the source vector elements. The results are rounded. For truncated results, see SHRN.
The RSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
RSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 0 1 1 Rn Rd
immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> Is the right shift amount, in the range 1 to the destination element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
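The shift table above can be sketched as a small Python decoder. The function name is hypothetical, and immh and immb are assumed to be passed as the integer values of the 4-bit and 3-bit fields.

```python
def rshrn_shift(immh, immb):
    """Decode the right shift amount from the immh:immb fields."""
    value = (immh << 3) | immb        # UInt(immh:immb)
    if immh == 0b0000:
        raise ValueError("see Advanced SIMD modified immediate")
    if immh & 0b1000:
        raise ValueError("RESERVED")
    if immh & 0b0100:                 # immh == 01xx
        return 64 - value
    if immh & 0b0010:                 # immh == 001x
        return 32 - value
    return 16 - value                 # immh == 0001
```

The position of the highest set bit of immh selects the element size, and the remaining bits encode the shift amount subtracted from twice the destination element width.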
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
for e = 0 to elements-1
    element = (UInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    Elem[result, e, esize] = element<esize-1:0>;
Vpart[d, part] = result;
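The rounding narrow shift can be illustrated with a minimal Python sketch. The function name `rshrn` and its list-of-lanes interface are illustrative assumptions and are not part of the architecture; the arithmetic follows the Operation pseudocode, with `rounding=False` giving the truncating SHRN behavior.

```python
def rshrn(src_lanes, esize, shift, rounding=True):
    """Model the per-lane arithmetic of RSHRN (or SHRN when rounding=False).

    src_lanes: unsigned source lanes, each 2*esize bits wide (hypothetical
    list representation of the source vector).
    Returns the narrowed lanes, each esize bits wide.
    """
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    round_const = (1 << (shift - 1)) if rounding else 0
    wide_mask = (1 << (2 * esize)) - 1
    narrow_mask = (1 << esize) - 1
    # Add the rounding constant, shift right, then keep the low esize bits.
    return [(((lane & wide_mask) + round_const) >> shift) & narrow_mask
            for lane in src_lanes]
```

In this sketch the result is not saturated: a lane whose rounded, shifted value needs more than `esize` bits keeps only its low `esize` bits, matching `element<esize-1:0>` in the pseudocode.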
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source
SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most
significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP
register.
The results are rounded. For truncated results, see SUBHN.
The RSUBHN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the RSUBHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(2*datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
bits(2*esize) element1;
bits(2*esize) element2;
bits(2*esize) sum;
for e = 0 to elements-1
    element1 = Elem[operand1, e, 2*esize];
    element2 = Elem[operand2, e, 2*esize];
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    sum = sum + round_const;
    Elem[result, e, esize] = sum<2*esize-1:esize>;
Vpart[d, part] = result;
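As a rough illustration of the loop above, the following Python sketch models the subtract, round, and take-high-half steps on lists of lanes. The name `rsubhn` and the list interface are assumptions made for this example only.

```python
def rsubhn(a_lanes, b_lanes, esize, rounding=True):
    """Model RSUBHN (or SUBHN when rounding=False) lane arithmetic.

    a_lanes, b_lanes: source lanes, each 2*esize bits wide.
    Returns the most significant esize bits of each (rounded) difference.
    """
    wide_mask = (1 << (2 * esize)) - 1
    round_const = (1 << (esize - 1)) if rounding else 0
    out = []
    for x, y in zip(a_lanes, b_lanes):
        # Modular subtraction in 2*esize bits, then add the rounding constant.
        total = ((x & wide_mask) - (y & wide_mask) + round_const) & wide_mask
        out.append(total >> esize)  # keep the high half
    return out
```

The difference wraps modulo 2^(2*esize) before the high half is taken, which mirrors the fixed-width `bits(2*esize)` arithmetic in the pseudocode.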
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source
SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the
absolute values of the results into the elements of the vector of the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 1 1 Rn Rd
U ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
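bits(esize) absdiff;
result = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1 - element2)<esize-1:0>;
    Elem[result, e, esize] = Elem[result, e, esize] + absdiff;
V[d] = result;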
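The accumulate step described for this instruction can be sketched in Python as follows. The names `to_signed` and `saba`, and the `accumulate` flag used to also cover the non-accumulating SABD form, are illustrative assumptions rather than architectural definitions.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saba(acc_lanes, a_lanes, b_lanes, esize, accumulate=True):
    """Model SABA (accumulate=True) or SABD (accumulate=False) per lane."""
    mask = (1 << esize) - 1
    out = []
    for d, x, y in zip(acc_lanes, a_lanes, b_lanes):
        # Absolute difference of the signed lane values, truncated to esize bits.
        absdiff = abs(to_signed(x, esize) - to_signed(y, esize)) & mask
        base = d if accumulate else 0
        out.append((base + absdiff) & mask)  # modular accumulate
    return out
```

Note that the accumulation wraps within the lane width in this model: with 8-bit lanes, adding an absolute difference of 255 to an accumulator of 10 yields 9.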
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper
half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP
register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP
register. The destination vector elements are twice as long as the source vector elements.
The SABAL instruction extracts each source vector from the lower half of each source register. The SABAL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 0 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
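bits(2*esize) absdiff;
result = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1 - element2)<2*esize-1:0>;
    Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + absdiff;
V[d] = result;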
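A minimal Python sketch of the widening form is given below, assuming each source register is passed as a full list of lanes and `part` selects the lower (0, SABAL) or upper (1, SABAL2) half. The names `to_signed` and `sabal` and the `accumulate` flag (False models the non-accumulating SABDL form) are assumptions for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sabal(acc_lanes, a_lanes, b_lanes, esize, part=0, accumulate=True):
    """Model SABAL/SABAL2 (or SABDL/SABDL2 when accumulate=False).

    a_lanes, b_lanes: all esize-bit lanes of the two source registers.
    acc_lanes: the 2*esize-bit lanes of the destination before execution.
    """
    half = len(a_lanes) // 2
    lo = part * half  # first lane of the selected half
    wide_mask = (1 << (2 * esize)) - 1
    out = []
    for e in range(half):
        diff = abs(to_signed(a_lanes[lo + e], esize)
                   - to_signed(b_lanes[lo + e], esize))
        base = acc_lanes[e] if accumulate else 0
        out.append((base + diff) & wide_mask)
    return out
```

Because the destination lanes are twice as wide as the source lanes, a single absolute difference always fits in the wider lane, although repeated accumulation can still wrap.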
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP
register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the
results into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 0 1 Rn Rd
U ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
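bits(esize) absdiff;
result = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1 - element2)<esize-1:0>;
    Elem[result, e, esize] = Elem[result, e, esize] + absdiff;
V[d] = result;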
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP
register from the corresponding vector elements of the first source SIMD&FP register, places the absolute values of the
results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are
twice as long as the source vector elements.
The SABDL instruction extracts each source vector from the lower half of each source register, while the SABDL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 0 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
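bits(2*esize) absdiff;
result = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1 - element2)<2*esize-1:0>;
    Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + absdiff;
V[d] = result;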
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the
vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) sum;
integer op1;
integer op2;
V[d] = result;
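The pairwise add-and-accumulate behavior described for this instruction might be modelled as in the Python sketch below. The names `to_signed` and `sadalp` are hypothetical, and the `accumulate` flag is an assumption that lets the same function also illustrate the non-accumulating SADDLP form.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sadalp(acc_lanes, src_lanes, esize, accumulate=True):
    """Model SADALP (accumulate=True) or SADDLP (accumulate=False).

    src_lanes: esize-bit source lanes; adjacent pairs are summed.
    acc_lanes: 2*esize-bit destination lanes before execution.
    """
    wide_mask = (1 << (2 * esize)) - 1
    out = []
    for e in range(len(src_lanes) // 2):
        pair_sum = (to_signed(src_lanes[2 * e], esize)
                    + to_signed(src_lanes[2 * e + 1], esize))
        base = acc_lanes[e] if accumulate else 0
        out.append((base + pair_sum) & wide_mask)
    return out
```

Each destination lane receives the sum of one adjacent pair, so a source vector with N lanes produces N/2 destination lanes of twice the width.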
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source
SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results
into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as
long as the source vector elements. All the values in this instruction are signed integer values.
The SADDL instruction extracts each source vector from the lower half of each source register. The SADDL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
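The loop above can be sketched in Python for the signed, additive case. The names `to_signed` and `saddl` are illustrative assumptions, and `part` is assumed to select the lower (0, SADDL) or upper (1, SADDL2) half of each source register.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saddl(a_lanes, b_lanes, esize, part=0):
    """Model SADDL/SADDL2: signed widening add of the selected register halves."""
    half = len(a_lanes) // 2
    lo = part * half
    wide_mask = (1 << (2 * esize)) - 1
    # Sign-extend each source lane, add, and store in a 2*esize-bit lane.
    return [(to_signed(a_lanes[lo + e], esize)
             + to_signed(b_lanes[lo + e], esize)) & wide_mask
            for e in range(half)]
```

Since each sum of two `esize`-bit signed values fits in `2*esize` bits, no overflow occurs in this model.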
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source
SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) sum;
integer op1;
integer op2;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together,
and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source
vector elements. All the values in this instruction are signed integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 H
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
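integer sum;
sum = Int(Elem[operand, 0, esize], unsigned);
for e = 1 to elements-1
    sum = sum + Int(Elem[operand, e, esize], unsigned);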
V[d] = sum<2*esize-1:0>;
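The across-vector reduction can be illustrated with a short Python sketch. The names `to_signed` and `saddlv` are assumptions for this example; the result is returned as the raw `2*esize`-bit pattern written to the destination scalar.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saddlv(src_lanes, esize):
    """Model SADDLV: sum all signed lanes into one 2*esize-bit scalar."""
    total = sum(to_signed(lane, esize) for lane in src_lanes)
    return total & ((1 << (2 * esize)) - 1)
```

For example, sixteen 8-bit lanes each holding 0x7F sum to 2032, which fits comfortably in the 16-bit destination.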
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding
vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and
writes the vector to the SIMD&FP destination register.
The SADDW instruction extracts the second source vector from the lower half of the second source register. The SADDW2
instruction extracts the second source vector from the upper half of the second source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 1 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, 2*esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
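A minimal Python sketch of the wide add follows, for the signed, additive case. The names `to_signed` and `saddw` are hypothetical; the first operand is assumed to hold `2*esize`-bit lanes and the second operand `esize`-bit lanes, with `part` selecting its lower (0, SADDW) or upper (1, SADDW2) half.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saddw(wide_lanes, narrow_lanes, esize, part=0):
    """Model SADDW/SADDW2: add sign-extended narrow lanes to wide lanes."""
    half = len(narrow_lanes) // 2
    lo = part * half
    wide_bits = 2 * esize
    wide_mask = (1 << wide_bits) - 1
    return [(to_signed(wide_lanes[e], wide_bits)
             + to_signed(narrow_lanes[lo + e], esize)) & wide_mask
            for e in range(half)]
```

Unlike the Long form, the first operand is already wide, so the sum can wrap modulo 2^(2*esize) in this model.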
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed fixed-point Convert to Floating-point (scalar). This instruction converts the signed value in the 32-bit or 64-bit
general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and
writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 0 0 0 0 1 0 scale Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
    when '00' fltsize = 32;
    when '01' fltsize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
rounding = FPRoundingMode(FPCR[]);
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<fbits> For the 32-bit to double-precision, 32-bit to half-precision and 32-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 32, encoded as 64
minus "scale".
For the 64-bit to double-precision, 64-bit to half-precision and 64-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 64, encoded as 64
minus "scale".
Operation
CheckFPAdvSIMDEnabled64();
intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, fracbits, FALSE, fpcr, rounding);
V[d] = fltval;
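The fixed-point to floating-point conversion can be approximated in Python for one specific case. The sketch below assumes a double-precision destination and round-to-nearest-even, which is only one of the rounding modes the FPCR can select; the names `to_signed` and `scvtf_fixed` are illustrative and do not correspond to the `FixedToFP` pseudocode function itself.

```python
import math


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def scvtf_fixed(intval, fbits, intsize=32):
    """Convert a signed fixed-point register value to a Python float (double).

    intval: raw contents of the 32-bit or 64-bit general-purpose register.
    fbits:  number of bits after the binary point, in the range 1 to intsize.
    Assumes round-to-nearest-even; other FPCR rounding modes are not modelled.
    """
    if intsize not in (32, 64):
        raise ValueError("intsize must be 32 or 64")
    if not 1 <= fbits <= intsize:
        raise ValueError("fbits must be in the range 1 to intsize")
    signed = to_signed(intval, intsize)
    # float() rounds the integer to nearest-even; ldexp then scales by an
    # exact power of two, so no second rounding occurs for these magnitudes.
    return math.ldexp(float(signed), -fbits)
```

For instance, the 32-bit value 0x00018000 with 16 fractional bits represents 1.5, and 0xFFFF8000 represents -0.5.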
Signed integer Convert to Floating-point (scalar). This instruction converts the signed integer value in the general-
purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and writes the
result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 1 0 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;
rounding = FPRoundingMode(FPCR[]);
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, 0, FALSE, fpcr, rounding);
V[d] = fltval;
Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-
point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:
immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:
immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
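The `<fbits>` tables above reduce to one formula once the element size is known: the position of the highest set bit of immh selects the element size, and the fractional bit count is twice that size minus UInt(immh:immb). The Python sketch below illustrates this; the function name `decode_fbits` is an assumption, and the additional vector-variant restriction that immh = 1xxx requires Q = 1 is not modelled.

```python
def decode_fbits(immh, immb):
    """Return (esize, fbits) for a 4-bit immh and 3-bit immb field pair."""
    if not (0 <= immh <= 0xF and 0 <= immb <= 0x7):
        raise ValueError("immh is a 4-bit field and immb is a 3-bit field")
    if immh == 0b0000:
        raise ValueError("immh == 0000 belongs to a different encoding group")
    if immh == 0b0001:
        raise ValueError("immh == 0001 is RESERVED")
    # Highest set bit of immh: bit 1 -> 16-bit, bit 2 -> 32-bit, bit 3 -> 64-bit.
    esize = 8 << (immh.bit_length() - 1)
    fbits = 2 * esize - ((immh << 3) | immb)
    return esize, fbits
```

With this mapping, `fbits` always falls in the range 1 to `esize`, which agrees with the range stated for the symbol.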
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FixedToFP(element, fracbits, unsigned, fpcr, rounding);
V[d] = result;
Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from signed
integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision
Scalar half precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Scalar single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector half precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector single-precision and double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = result;
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
Dot Product signed arithmetic (vector, by element). This instruction performs the dot product of the four 8-bit
elements in each 32-bit element of the first source register with the four 8-bit elements of an indexed 32-bit element in
the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Note
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
Vector
(FEAT_DotProd)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 1 0 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(M:Rm);
integer index = UInt(H:L);
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 8B
1 16B
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the element index, encoded in the "H:L" fields.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) result = V[d];
for e = 0 to elements-1
    integer res = 0;
    integer element1, element2;
    for i = 0 to 3
        if signed then
            element1 = SInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = SInt(Elem[operand2, 4*index+i, esize DIV 4]);
        else
            element1 = UInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = UInt(Elem[operand2, 4*index+i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = Elem[result, e, esize] + res;
V[d] = result;
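As an informal illustration of the loop above, the following Python sketch models the by-element dot product on plain
lists of bytes rather than on register bit vectors. The function name dot_by_element, the helper _to_signed8, and the
list-based register representation are assumptions made for this example and are not part of the architecture
pseudocode.

```python
def _to_signed8(b):
    """Interpret an integer in 0..255 as a signed 8-bit value."""
    return b - 256 if b & 0x80 else b


def dot_by_element(acc, op1, op2, index, signed=True):
    """Model the by-element dot product on byte lists (illustrative only).

    acc   : 32-bit accumulator lanes (2 or 4 entries), as unsigned integers
    op1   : bytes (0..255) of the first source, four per accumulator lane
    op2   : 16 bytes (0..255) of the 128-bit second source
    index : which group of four bytes of op2 is used for every lane (0..3)
    """
    conv = _to_signed8 if signed else (lambda b: b)
    out = []
    for e, lane in enumerate(acc):
        res = 0
        for i in range(4):
            # The same indexed group of op2 is reused for every lane e.
            res += conv(op1[4 * e + i]) * conv(op2[4 * index + i])
        out.append((lane + res) & 0xFFFFFFFF)  # wrap to 32 bits
    return out
```

Note that the second operand is always read from the group selected by index, independent of the lane number e. This is
the only difference from the non-indexed vector form.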
Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in
each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element
in the second source register, accumulating the result into the corresponding 32-bit element of the destination
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Note
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
Vector
(FEAT_DotProd)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 1 0 0 1 0 1 Rn Rd
U
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 8B
1 16B
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
result = V[d];
for e = 0 to elements-1
    integer res = 0;
    integer element1, element2;
    for i = 0 to 3
        if signed then
            element1 = SInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = SInt(Elem[operand2, 4*e+i, esize DIV 4]);
        else
            element1 = UInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = UInt(Elem[operand2, 4*e+i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = Elem[result, e, esize] + res;
V[d] = result;
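The signed case of the operation above can be sketched in Python on registers held as packed integers. The names elem,
sint, and sdot_vector are hypothetical helpers chosen for this example. They loosely mirror Elem[] and SInt() from the
pseudocode but are not definitions from this document.

```python
def elem(value, e, size):
    """Return element e of the given bit width from a packed integer."""
    return (value >> (e * size)) & ((1 << size) - 1)


def sint(value, size):
    """Interpret a size-bit field as a two's-complement signed integer."""
    return value - (1 << size) if value >> (size - 1) else value


def sdot_vector(vd, vn, vm, datasize=128):
    """Signed 8-bit dot product accumulated into 32-bit lanes (sketch)."""
    esize = 32
    result = 0
    for e in range(datasize // esize):
        res = 0
        for i in range(4):
            # Both sources use the same byte position 4*e+i.
            res += sint(elem(vn, 4 * e + i, 8), 8) * sint(elem(vm, 4 * e + i, 8), 8)
        lane = (elem(vd, e, esize) + res) & 0xFFFFFFFF  # wrap to 32 bits
        result |= lane << (e * esize)
    return result
```

For the unsigned form, the same structure would apply with the raw byte values used in place of sint().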
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) X = V[d];
bits(32) Y = V[n]; // Note: 32 not 128 bits wide
bits(128) W = V[m];
bits(32) t;
for e = 0 to 3
    t = SHAchoose(X<63:32>, X<95:64>, X<127:96>);
    Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
    X<63:32> = ROL(X<63:32>, 30);
    <Y, X> = ROL(Y:X, 32);
V[d] = X;
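The four-round loop above can be modelled in Python as follows. This is a sketch under stated assumptions: X and W are
passed as lists of four 32-bit words with index 0 holding bits <31:0>, and sha_choose is assumed to be the standard
SHA-1 choose function, because the definition of SHAchoose does not appear in this section. The names rol32, sha_choose,
and sha1c_rounds are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha_choose(x, y, z):
    """Assumed SHA-1 choose: take the bit of y where x is 1, else the bit of z."""
    return ((y ^ z) & x) ^ z


def sha1c_rounds(x_words, y, w_words):
    """Apply four rounds to X (four words), Y (one word), and W (four words)."""
    x = list(x_words)
    for e in range(4):
        t = sha_choose(x[1], x[2], x[3])
        y = (y + rol32(x[0], 5) + t + w_words[e]) & MASK32
        x[1] = rol32(x[1], 30)
        # <Y, X> = ROL(Y:X, 32): the top word of X becomes the new Y and
        # the old Y enters X at bits <31:0>.
        y, x = x[3], [y, x[0], x[1], x[2]]
    return x
```

The list returned corresponds to the value written back to the destination register. The final value of Y is discarded,
as in the pseudocode.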
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveSHA1Ext() then UNDEFINED;
Assembler Symbols
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) X = V[d];
bits(32) Y = V[n]; // Note: 32 not 128 bits wide
bits(128) W = V[m];
bits(32) t;
for e = 0 to 3
    t = SHAmajority(X<63:32>, X<95:64>, X<127:96>);
    Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
    X<63:32> = ROL(X<63:32>, 30);
    <Y, X> = ROL(Y:X, 32);
V[d] = X;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) X = V[d];
bits(32) Y = V[n]; // Note: 32 not 128 bits wide
bits(128) W = V[m];
bits(32) t;
for e = 0 to 3
    t = SHAparity(X<63:32>, X<95:64>, X<127:96>);
    Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
    X<63:32> = ROL(X<63:32>, 30);
    <Y, X> = ROL(Y:X, 32);
V[d] = X;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
result = operand2<63:0>:operand1<127:64>;
result = result EOR operand1 EOR operand3;
V[d] = result;
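The two assignments above amount to a 64-bit cross-register extract followed by a three-way exclusive OR. A minimal
Python sketch follows. It takes the three operands as 128-bit integers passed in as parameters, because the lines that
bind operand1, operand2, and operand3 to registers are not shown in this section. The function name sha1su0_combine is
an assumption of this example.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def sha1su0_combine(operand1, operand2, operand3):
    """Combine three 128-bit operands as in the two assignments above."""
    # operand2<63:0> : operand1<127:64>, with the left part in the upper half.
    result = ((operand2 & MASK64) << 64) | (operand1 >> 64)
    return (result ^ operand1 ^ operand3) & MASK128
```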
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveSHA1Ext() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA256Ext() then UNDEFINED;
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) result;
result = SHA256hash(V[d], V[n], V[m], TRUE);
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA256Ext() then UNDEFINED;
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) result;
result = SHA256hash(V[n], V[d], V[m], FALSE);
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveSHA256Ext() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
for e = 0 to 3
    elt = Elem[T, e, 32];
    elt = ROR(elt, 7) EOR ROR(elt, 18) EOR LSR(elt, 3);
    Elem[result, e, 32] = elt + Elem[operand1, e, 32];
V[d] = result;
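The per-element transformation in the loop above can be sketched in Python as shown below. The sketch takes T and
operand1 as lists of four 32-bit words supplied by the caller, because the lines that derive them from the registers do
not appear in this section. The names ror32 and sha256su0_lanes are illustrative.

```python
MASK32 = 0xFFFFFFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256su0_lanes(t, operand1):
    """Apply ROR 7, ROR 18, LSR 3 to each word of t, then add operand1."""
    result = []
    for e in range(4):
        elt = t[e]
        elt = ror32(elt, 7) ^ ror32(elt, 18) ^ (elt >> 3)
        result.append((elt + operand1[e]) & MASK32)  # 32-bit wrapping add
    return result
```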
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA256Ext() then UNDEFINED;
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
T1 = operand3<127:64>;
for e = 0 to 1
    elt = Elem[T1, e, 32];
    elt = ROR(elt, 17) EOR ROR(elt, 19) EOR LSR(elt, 10);
    elt = elt + Elem[operand1, e, 32] + Elem[T0, e, 32];
    Elem[result, e, 32] = elt;
T1 = result<63:0>;
for e = 2 to 3
    elt = Elem[T1, e-2, 32];
    elt = ROR(elt, 17) EOR ROR(elt, 19) EOR LSR(elt, 10);
    elt = elt + Elem[operand1, e, 32] + Elem[T0, e, 32];
    Elem[result, e, 32] = elt;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SHA512 Hash update part 1 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the sigma1 and chi functions of two iterations of the SHA512 computation. It returns this
value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.
Advanced SIMD
(FEAT_SHA512)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 0 0 Rn Rd
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vtmp;
bits(64) MSigma1;
bits(64) tmp;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SHA512 Hash update part 2 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the sigma0 and majority functions of two iterations of the SHA512 computation. It returns
this value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.
Advanced SIMD
(FEAT_SHA512)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 0 1 Rn Rd
Assembler Symbols
<Qd> Is the 128-bit name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vtmp;
bits(64) NSigma0;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];
V[d] = Vtmp;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed
after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.
Advanced SIMD
(FEAT_SHA512)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(64) sig0;
bits(128) Vtmp;
bits(128) X = V[n];
bits(128) W = V[d];
sig0 = ROR(W<127:64>, 1) EOR ROR(W<127:64>, 8) EOR ('0000000':W<127:71>);
Vtmp<63:0> = W<63:0> + sig0;
sig0 = ROR(X<63:0>, 1) EOR ROR(X<63:0>, 8) EOR ('0000000':X<63:7>);
Vtmp<127:64> = W<127:64> + sig0;
V[d] = Vtmp;
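A Python sketch of the schedule update above, with W and X held as 128-bit integers, may help when checking the bit
slices. The helper names ror64, sigma0, and sha512su0 are assumptions of this example. The expression
'0000000':W<127:71> is modelled as a logical right shift by 7 of the 64-bit half.

```python
MASK64 = (1 << 64) - 1


def ror64(x, n):
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def sigma0(x):
    """ROR 1 EOR ROR 8 EOR (logical shift right by 7) on a 64-bit value."""
    return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7)


def sha512su0(w, x):
    """w models V[d] and x models V[n]; both are 128-bit integers."""
    w_lo, w_hi = w & MASK64, w >> 64
    x_lo = x & MASK64
    lo = (w_lo + sigma0(w_hi)) & MASK64  # Vtmp<63:0>
    hi = (w_hi + sigma0(x_lo)) & MASK64  # Vtmp<127:64>
    return (hi << 64) | lo
```

The lower result half depends only on W, while the upper half also uses the low 64 bits of X.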
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SHA512 Schedule Update 1 takes the values from the three source SIMD&FP registers and produces a 128-bit output
value that combines the gamma1 functions of two iterations of the SHA512 schedule update that are performed after
the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.
Advanced SIMD
(FEAT_SHA512)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 1 0 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(64) sig1;
bits(128) Vtmp;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP
registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see SRHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    sum = element1 + element2;
    Elem[result, e, esize] = sum<esize:1>;
V[d] = result;
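The halving add can be sketched in Python on lists of raw element values. The function name shadd_elements and the
list representation are assumptions of this example. The bit slice sum<esize:1> is modelled as an arithmetic right
shift by one followed by truncation to esize bits, so the intermediate sum never overflows.

```python
def shadd_elements(a, b, esize=8, unsigned=False):
    """Element-wise (a + b) >> 1 on raw element values in 0 .. 2**esize - 1."""
    mask = (1 << esize) - 1

    def to_int(v):
        # Reinterpret the raw bits as signed unless the unsigned form is modelled.
        if unsigned or v < (1 << (esize - 1)):
            return v
        return v - (1 << esize)

    return [((to_int(x) + to_int(y)) >> 1) & mask for x, y in zip(a, b)]
```

Because Python's right shift on negative integers rounds toward minus infinity, an odd negative sum such as -3 yields
-2 here, which corresponds to the truncating behaviour described for this instruction.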
Operational information
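If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.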
Shift Left (immediate). This instruction reads each value from a vector, left shifts each result by an immediate value,
writes the final result to a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the left shift amount, in the range 0 to 63, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (UInt(immh:immb)-64)
For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
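The vector-variant table above can be read as follows: the position of the highest set bit of immh selects the element
size, and the shift is immh:immb minus that element size. The Python sketch below illustrates this reading. The
function name decode_shl_vector is hypothetical, and the sketch does not model the Q-dependent restrictions on
arrangement.

```python
def decode_shl_vector(immh, immb):
    """Return (esize, shift) from a 4-bit immh and a 3-bit immb."""
    if immh == 0:
        # immh == 0000 selects a different instruction class.
        raise ValueError("immh == 0000 is not a shift encoding")
    value = (immh << 3) | immb               # UInt(immh:immb)
    esize = 8 << (immh.bit_length() - 1)     # 0001->8, 001x->16, 01xx->32, 1xxx->64
    return esize, value - esize
```

For every valid encoding the resulting shift lies in the range 0 to esize-1, which matches the range stated for the
vector variant.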
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
for e = 0 to elements-1
    Elem[result, e, esize] = LSL(Elem[operand, e, esize], shift);
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Shift Left Long (by element size). This instruction reads each vector element in the lower or upper half of the source
SIMD&FP register, left shifts each result by the element size, writes the final result to a vector, and writes the vector
to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
The SHLL instruction extracts vector elements from the lower half of the source register. The SHLL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 0 1 1 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<shift> Is the left shift amount, which must be equal to the source element width in bits, encoded in “size”:
size <shift>
00 8
01 16
10 32
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(2*datasize) result;
integer element;
for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], unsigned) << shift;
    Elem[result, e, 2*esize] = element<2*esize-1:0>;
V[d] = result;
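A Python sketch of the widening shift on element lists is given below. The function name shll_elements, the upper flag
(standing in for the "2" specifier), and the list representation are assumptions of this example. Because the shift
amount equals the source element width and the result is truncated to twice that width, the signedness of the source
interpretation does not affect the result bits, so the sketch works on raw values.

```python
def shll_elements(source, esize, upper=False):
    """Widen and shift one half of a 128-bit source register.

    source : all elements of the source register, each esize bits wide
    upper  : False selects the lower half, True selects the upper half
    Returns a list of elements that are 2*esize bits wide.
    """
    half = len(source) // 2
    part = source[half:] if upper else source[:half]
    mask = (1 << (2 * esize)) - 1
    # Each source element ends up in the upper half of its wider lane.
    return [(v << esize) & mask for v in part]
```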
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the source SIMD&FP
register, right shifts each result by an immediate value, puts the final result into a vector, and writes the vector to the
lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the
source vector elements. The results are truncated. For rounded results, see RSHRN.
The SHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 0 0 1 Rn Rd
immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> Is the right shift amount, in the range 1 to the destination element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
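The shift table above can be read as twice the destination element size minus immh:immb. The following Python sketch
illustrates that reading. The function name decode_shrn is hypothetical, and the sketch models only the immh and immb
fields.

```python
def decode_shrn(immh, immb):
    """Return (esize, shift) for the narrowing right shift; esize is the
    destination element width in bits."""
    if immh == 0:
        raise ValueError("immh == 0000 is not a shift encoding")
    if immh & 0b1000:
        raise ValueError("immh == 1xxx is reserved")
    esize = 8 << (immh.bit_length() - 1)     # 0001->8, 001x->16, 01xx->32
    shift = 2 * esize - ((immh << 3) | immb)
    return esize, shift
```

For every valid encoding the shift lies in the range 1 to esize, in line with the range stated for <shift>.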
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
for e = 0 to elements-1
    element = (UInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    Elem[result, e, esize] = element<esize-1:0>;
Vpart[d, part] = result;
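The narrowing loop above can be sketched in Python on lists of source elements. The function name shrn_elements and
the rounding keyword are assumptions of this example. The rounding option only mirrors the round_const term in the
pseudocode and is off for the truncating form described here.

```python
def shrn_elements(source, esize, shift, rounding=False):
    """Shift each 2*esize-bit source element right and keep the low esize bits."""
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    round_const = (1 << (shift - 1)) if rounding else 0
    mask = (1 << esize) - 1
    return [((v + round_const) >> shift) & mask for v in source]
```

The sketch returns only the narrowed elements. Placing them in the lower or upper half of the destination register is
left to the caller.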
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register
from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit,
places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    diff = element1 - element2;
    Elem[result, e, esize] = diff<esize:1>;
V[d] = result;
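As a rough illustration of the halving subtract above, the following Python sketch models one vector as a list of element values. The names `to_signed` and `halving_sub` are hypothetical helpers introduced for this example. The key point is that the subtraction is carried out at full integer precision and bits `<esize:1>` of the difference are kept, so the intermediate result cannot overflow.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def halving_sub(a, b, esize, unsigned=False):
    """Element-wise (a - b) >> 1, keeping bits <esize:1> of the difference."""
    mask = (1 << esize) - 1
    result = []
    for x, y in zip(a, b):
        if unsigned:
            x, y = x & mask, y & mask
        else:
            x, y = to_signed(x, esize), to_signed(y, esize)
        diff = x - y                       # unbounded integer, as in the pseudocode
        result.append((diff >> 1) & mask)  # arithmetic shift, then truncate
    return result


if __name__ == "__main__":
    # -128 - 127 = -255; halved (rounding toward minus infinity) gives -128 = 0x80.
    print([hex(v) for v in halving_sub([0x80, 5, 2], [0x7F, 2, 5], 8)])
```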
Operational information
If PSTATE.DIT is 1:
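• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.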
Shift Left and Insert (immediate). This instruction reads each vector element in the source SIMD&FP register, left
shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the
destination SIMD&FP register such that the new zero bits created by the shift are not inserted but retain their existing
value. Bits shifted out of the left of each vector element in the source register are lost.
The following figure shows an example of the operation of shift left by 3 for an 8-bit vector element.
[Figure: bits 63:56 of Vn (Vn.B[7]), of Vd after the operation (Vd.B[7]), and of Vd before the operation (Vd.B[7]).]
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the left shift amount, in the range 0 to 63, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (UInt(immh:immb)-64)
For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2 = V[d];
bits(datasize) result;
bits(esize) mask = LSL(Ones(esize), shift);
bits(esize) shifted;
for e = 0 to elements-1
    shifted = LSL(Elem[operand, e, esize], shift);
    Elem[result, e, esize] = (Elem[operand2, e, esize] AND NOT(mask)) OR shifted;
V[d] = result;
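The insert behaviour of the pseudocode above can be sketched in Python as follows. This is a minimal model under stated assumptions: `shift_left_insert` is a hypothetical name, and vectors are represented as lists of unsigned element values. The low `shift` bits of each destination element are preserved, and the remaining bits are taken from the shifted source element.

```python
def shift_left_insert(src, dst, esize, shift):
    """Model Shift Left and Insert on lists of unsigned esize-bit elements."""
    if not 0 <= shift < esize:
        raise ValueError("shift must be in the range 0 to esize-1")
    full = (1 << esize) - 1
    mask = (full << shift) & full          # bits that receive shifted source data
    result = []
    for s, d in zip(src, dst):
        shifted = (s << shift) & full      # bits shifted out of the left are lost
        result.append((d & ~mask & full) | shifted)
    return result


if __name__ == "__main__":
    # Shift left by 3 on an 8-bit element: the low 3 bits of the destination survive.
    print([hex(v) for v in shift_left_insert([0xB5], [0xFF], 8, 3)])
```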
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SM3PARTW1 takes three 128-bit vectors from the three source SIMD&FP registers and returns a 128-bit result in the
destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input
vectors with some fixed rotations; see the Operation pseudocode for more information.
This instruction is implemented only when FEAT_SM3 is implemented.
Advanced SIMD
(FEAT_SM3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 0 0 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(128) result;
for i = 0 to 3
    if i == 3 then
        result<127:96> = (Vd EOR Vn)<127:96> EOR (ROL(result<31:0>, 15));
    result<(32*i)+31:(32*i)> = result<(32*i)+31:(32*i)> EOR ROL(result<(32*i)+31:(32*i)>, 15) EOR ROL(res
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SM3PARTW2 takes three 128-bit vectors from three source SIMD&FP registers and returns a 128-bit result in the
destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input
vectors with some fixed rotations; see the Operation pseudocode for more information.
This instruction is implemented only when FEAT_SM3 is implemented.
Advanced SIMD
(FEAT_SM3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 0 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(128) result;
bits(128) tmp;
bits(32) tmp2;
tmp<127:0> = Vn EOR (ROL(Vm<127:96>, 7):ROL(Vm<95:64>, 7):ROL(Vm<63:32>, 7):ROL(Vm<31:0>, 7));
result<127:0> = Vd<127:0> EOR tmp<127:0>;
tmp2 = ROL(tmp<31:0>, 15);
tmp2 = tmp2 EOR ROL(tmp2, 15) EOR ROL(tmp2, 23);
result<127:96> = result<127:96> EOR tmp2;
V[d] = result;
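The SM3PARTW2 pseudocode above can be transcribed into Python as a sketch. The representation is an assumption of this example: each 128-bit vector is a list of four 32-bit lanes, with index 0 holding bits<31:0> and index 3 holding bits<127:96>. The names `rol32` and `sm3partw2` are hypothetical helpers and are not part of any Arm-provided library.

```python
M32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M32


def sm3partw2(vd, vn, vm):
    """Model SM3PARTW2 on lists of four 32-bit lanes (lane 0 = bits<31:0>)."""
    # tmp = Vn EOR (each 32-bit lane of Vm rotated left by 7)
    tmp = [n ^ rol32(m, 7) for n, m in zip(vn, vm)]
    result = [d ^ t for d, t in zip(vd, tmp)]
    # The top lane additionally mixes in a function of the bottom lane of tmp.
    tmp2 = rol32(tmp[0], 15)
    tmp2 = tmp2 ^ rol32(tmp2, 15) ^ rol32(tmp2, 23)
    result[3] ^= tmp2
    return result


if __name__ == "__main__":
    print([hex(v) for v in sm3partw2([0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0])])
```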
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SM3SS1 rotates the top 32 bits of the 128-bit vector in the first source SIMD&FP register left by 12, and adds that
32-bit value to the two other 32-bit values held in the top 32 bits of each of the 128-bit vectors in the second and third
source SIMD&FP registers. It rotates this sum left by 7 and writes the final result into the top 32 bits of the vector in
the destination SIMD&FP register, with the bottom 96 bits of the vector written to 0.
This instruction is implemented only when FEAT_SM3 is implemented.
Advanced SIMD
(FEAT_SM3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 0 Ra Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<Va> Is the name of the third SIMD&FP source register, encoded in the "Ra" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Va = V[a];
bits(128) result;
result<127:96> = ROL((ROL(Vn<127:96>, 12) + Vm<127:96> + Va<127:96>), 7);
result<95:0> = Zeros();
V[d] = result;
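A small Python sketch of the SM3SS1 computation above follows. It assumes that each 128-bit vector is modelled as a list of four 32-bit lanes, with index 3 holding bits<127:96>. The helper names `rol32` and `sm3ss1` are introduced here for illustration only. The addition wraps modulo 2^32 because the pseudocode adds 32-bit bitvectors.

```python
M32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M32


def sm3ss1(vn, vm, va):
    """Model SM3SS1 on lists of four 32-bit lanes (lane 3 = bits<127:96>)."""
    total = (rol32(vn[3], 12) + vm[3] + va[3]) & M32  # 32-bit wrapping add
    return [0, 0, 0, rol32(total, 7)]                 # bottom 96 bits are zero


if __name__ == "__main__":
    print([hex(v) for v in sm3ss1([0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0])])
```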
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SM3TT1A takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a three-way exclusive OR of the three 32-bit
fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the following
three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the three-way exclusive OR.
• The result of the exclusive OR of the top 32-bit element of the second source vector, Vn, with a rotation left by
12 of the top 32-bit element of the first source vector.
• A 32-bit element indexed out of the third source vector, Vm.
The result of this addition is returned as the top element of the result. The other elements of the result are taken from
elements of the first source vector, with the element returned in bits<63:32> being rotated left by 9.
This instruction is implemented only when FEAT_SM3 is implemented.
Advanced SIMD
(FEAT_SM3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 0 0 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
<imm2> Is a 32-bit element indexed out of <Vm>, encoded in "imm2".
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) WjPrime;
bits(128) result;
bits(32) TT1;
bits(32) SS2;
Operational information
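If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.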
SM3TT1B takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a 32-bit majority function between the three
32-bit fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the
following three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the 32-bit majority function.
• The result of the exclusive OR of the top 32-bit element of the second source vector, Vn, with a rotation left by
12 of the top 32-bit element of the first source vector.
• A 32-bit element indexed out of the third source vector, Vm.
The result of this addition is returned as the top element of the result. The other elements of the result are taken from
elements of the first source vector, with the element returned in bits<63:32> being rotated left by 9.
This instruction is implemented only when FEAT_SM3 is implemented.
Advanced SIMD
(FEAT_SM3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 0 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
<imm2> Is a 32-bit element indexed out of <Vm>, encoded in "imm2".
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) WjPrime;
bits(128) result;
bits(32) TT1;
bits(32) SS2;
Operational information
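If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.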
SM3TT2A takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a three-way exclusive OR of the three 32-bit
fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the following
three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the three-way exclusive OR.
• The 32-bit element held in the top 32 bits of the second source vector, Vn.
• A 32-bit element indexed out of the third source vector, Vm.
A three-way exclusive OR is performed of the result of this addition, the result of the addition rotated left by 9, and the
result of the addition rotated left by 17. The result of this exclusive OR is returned as the top element of the returned
result. The other elements of this result are taken from elements of the first source vector, with the element returned
in bits<63:32> being rotated left by 19.
This instruction is implemented only when FEAT_SM3 is implemented.
Advanced SIMD
(FEAT_SM3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 1 0 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
<imm2> Is a 32-bit element indexed out of <Vm>, encoded in "imm2".
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) Wj;
bits(128) result;
bits(32) TT2;
Wj = Elem[Vm, i, 32];
TT2 = Vd<63:32> EOR (Vd<127:96> EOR Vd<95:64>);
TT2 = (TT2+Vd<31:0>+Vn<127:96>+Wj)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 19);
result<95:64> = Vd<127:96>;
result<127:96> = TT2 EOR ROL(TT2, 9) EOR ROL(TT2, 17);
V[d] = result;
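The SM3TT2A operation above can be sketched in Python. Two assumptions are made for this example. First, each 128-bit vector is a list of four 32-bit lanes, with index 0 holding bits<31:0>. Second, the index `i` used in the pseudocode is taken to be the unsigned value of the `imm2` field, because the decode step is not shown here. The names `rol32` and `sm3tt2a` are hypothetical.

```python
M32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M32


def sm3tt2a(vd, vn, vm, imm2):
    """Model SM3TT2A on lists of four 32-bit lanes (lane 0 = bits<31:0>)."""
    wj = vm[imm2]                          # assumed: i = UInt(imm2)
    tt2 = vd[1] ^ vd[3] ^ vd[2]            # three-way exclusive OR
    tt2 = (tt2 + vd[0] + vn[3] + wj) & M32
    return [
        vd[1],
        rol32(vd[2], 19),
        vd[3],
        tt2 ^ rol32(tt2, 9) ^ rol32(tt2, 17),
    ]


if __name__ == "__main__":
    print([hex(v) for v in sm3tt2a([1, 2, 4, 8], [0] * 4, [0] * 4, 0)])
```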
Operational information
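If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.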
SM3TT2B takes three 128-bit vectors from three source SIMD&FP registers, and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a 32-bit majority function between the three
32-bit fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the
following three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the 32-bit majority function.
• The 32-bit element held in the top 32 bits of the second source vector, Vn.
• A 32-bit element indexed out of the third source vector, Vm.
A three-way exclusive OR is performed of the result of this addition, the result of the addition rotated left by 9, and the
result of the addition rotated left by 17. The result of this exclusive OR is returned as the top element of the returned
result. The other elements of this result are taken from elements of the first source vector, with the element returned
in bits<63:32> being rotated left by 19.
This instruction is implemented only when FEAT_SM3 is implemented.
Advanced SIMD
(FEAT_SM3)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 1 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.
<imm2> Is a 32-bit element indexed out of <Vm>, encoded in "imm2".
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) Wj;
bits(128) result;
bits(32) TT2;
Wj = Elem[Vm, i, 32];
TT2 = (Vd<127:96> AND Vd<95:64>) OR (NOT(Vd<127:96>) AND Vd<63:32>);
TT2 = (TT2+Vd<31:0>+Vn<127:96>+Wj)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 19);
result<95:64> = Vd<127:96>;
result<127:96> = TT2 EOR ROL(TT2, 9) EOR ROL(TT2, 17);
V[d] = result;
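The following Python sketch mirrors the SM3TT2B pseudocode above. Note that the expression in the pseudocode, (x AND y) OR (NOT(x) AND z), selects bits from y or z according to x, and the sketch follows the pseudocode rather than the prose summary. As assumptions of this example, vectors are lists of four 32-bit lanes with index 0 holding bits<31:0>, the index `i` is taken to be the unsigned value of `imm2`, and the names `rol32` and `sm3tt2b` are hypothetical.

```python
M32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M32


def sm3tt2b(vd, vn, vm, imm2):
    """Model SM3TT2B on lists of four 32-bit lanes (lane 0 = bits<31:0>)."""
    wj = vm[imm2]                                    # assumed: i = UInt(imm2)
    # Where a bit of vd[3] is 1 take the bit from vd[2], otherwise from vd[1].
    tt2 = (vd[3] & vd[2]) | (~vd[3] & vd[1] & M32)
    tt2 = (tt2 + vd[0] + vn[3] + wj) & M32
    return [
        vd[1],
        rol32(vd[2], 19),
        vd[3],
        tt2 ^ rol32(tt2, 9) ^ rol32(tt2, 17),
    ]


if __name__ == "__main__":
    print([hex(v) for v in sm3tt2b([0, 1, 2, 0xFFFFFFFF], [0] * 4, [0] * 4, 0)])
```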
Operational information
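If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.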
SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the
round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by
four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SM4 is implemented.
Advanced SIMD
(FEAT_SM4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vn = V[n];
bits(32) intval;
bits(128) roundresult;
bits(32) roundkey;
roundresult = V[d];
for index = 0 to 3
    roundkey = Elem[Vn, index, 32];
    for i = 0 to 3
        Elem[intval, i, 8] = Sbox(Elem[intval, i, 8]);
    intval = intval EOR ROL(intval, 2) EOR ROL(intval, 10) EOR ROL(intval, 18) EOR ROL(intval, 24);
    intval = intval EOR roundresult<31:0>;
    roundresult<31:0> = roundresult<63:32>;
    roundresult<63:32> = roundresult<95:64>;
    roundresult<95:64> = roundresult<127:96>;
    roundresult<127:96> = intval;
V[d] = roundresult;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the
second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning
the 128-bit result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SM4 is implemented.
Advanced SIMD
(FEAT_SM4)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 1 0 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(32) intval;
bits(128) result;
bits(32) const;
bits(128) roundresult;
roundresult = V[n];
for index = 0 to 3
    const = Elem[Vm, index, 32];
    for i = 0 to 3
        Elem[intval, i, 8] = Sbox(Elem[intval, i, 8]);
    roundresult<31:0> = roundresult<63:32>;
    roundresult<63:32> = roundresult<95:64>;
    roundresult<95:64> = roundresult<127:96>;
    roundresult<127:96> = intval;
V[d] = roundresult;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
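◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.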
Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the
destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
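The element-wise comparison above can be modelled in Python as a sketch. The `unsigned` and `minimum` flags come from the decode stage in the pseudocode and are exposed here as keyword arguments. The function names `to_signed` and `max_min_elements`, and the list-of-integers vector representation, are assumptions made for this example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def max_min_elements(a, b, esize, minimum=False, unsigned=False):
    """Element-wise maximum or minimum of two vectors of esize-bit elements."""
    mask = (1 << esize) - 1
    pick = min if minimum else max
    result = []
    for x, y in zip(a, b):
        if unsigned:
            x, y = x & mask, y & mask
        else:
            x, y = to_signed(x, esize), to_signed(y, esize)
        result.append(pick(x, y) & mask)   # re-encode as an esize-bit pattern
    return result


if __name__ == "__main__":
    # Signed maximum: 0x80 is -128 and 0xFF is -1 when esize is 8.
    print([hex(v) for v in max_min_elements([0x7F, 0x80, 0x01], [0x80, 0xFF, 0xFF], 8)])
```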
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements in the two source SIMD&FP registers, writes the larger of each pair of signed integer values into a
vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[concat, 2*e, esize], unsigned);
    element2 = Int(Elem[concat, (2*e)+1, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
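The pairwise form differs from the element-wise form only in how operands are paired. The Python sketch below illustrates this under the assumption that vectors are lists of element values with index 0 as the lowest element, so that `operand2:operand1` corresponds to the first source list followed by the second. The names `to_signed` and `pairwise_max_min` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pairwise_max_min(op1, op2, esize, minimum=False, unsigned=False):
    """Maximum or minimum of adjacent element pairs of the concatenation."""
    mask = (1 << esize) - 1
    pick = min if minimum else max
    concat = list(op1) + list(op2)          # operand1 occupies the low half
    result = []
    for e in range(len(op1)):
        x, y = concat[2 * e], concat[2 * e + 1]
        if unsigned:
            x, y = x & mask, y & mask
        else:
            x, y = to_signed(x, esize), to_signed(y, esize)
        result.append(pick(x, y) & mask)
    return result


if __name__ == "__main__":
    print([hex(v) for v in pairwise_max_min([0x01, 0xFF, 0x03, 0x04],
                                            [0x80, 0x7F, 0x00, 0xFE], 8)])
```

The low half of the result is produced from pairs in the first source and the high half from pairs in the second source.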
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are signed integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 0 1 0 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;
V[d] = maxmin<esize-1:0>;
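The across-vector reduction described for this instruction can be sketched in Python. This example is written from the prose description (compare all elements, write the largest as a scalar) rather than from the full pseudocode. The names `to_signed` and `max_across`, and the list-based vector representation, are assumptions of this sketch.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def max_across(elements, esize):
    """Signed maximum over all elements, returned as an esize-bit pattern."""
    if not elements:
        raise ValueError("vector must contain at least one element")
    best = to_signed(elements[0], esize)
    for raw in elements[1:]:
        best = max(best, to_signed(raw, esize))
    return best & ((1 << esize) - 1)


if __name__ == "__main__":
    print(hex(max_across([0x80, 0x7F, 0xFF, 0x01], 8)))
```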
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the smaller of each pair of signed integer values into a vector, and writes the vector to
the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 1 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements in the two source SIMD&FP registers, writes the smaller of each pair of signed integer values into a
vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 1 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[concat, 2*e, esize], unsigned);
    element2 = Int(Elem[concat, (2*e)+1, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
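The pairwise operation above can be sketched in Python as follows. This is only an illustrative model, not Arm reference code: the helper names to_signed and sminp are hypothetical, and each register is assumed to be a list of raw element values with element 0 first.

```python
def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's-complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def sminp(vn, vm, esize):
    """Pairwise signed minimum; vn and vm list raw elements, lowest element first."""
    # concat = operand2:operand1, so the Vn elements occupy the low half.
    concat = [to_signed(x, esize) for x in vn + vm]
    mask = (1 << esize) - 1
    return [min(concat[2 * e], concat[2 * e + 1]) & mask for e in range(len(vn))]


# Four 8-bit elements per source, as a reduced example.
print([hex(x) for x in sminp([1, 0xFF, 3, 4], [0x80, 0x7F, 5, 0xFE], 8)])
# prints ['0xff', '0x3', '0x80', '0xfe']
```

The low half of the result comes from adjacent pairs in the first source and the high half from adjacent pairs in the second source.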
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMINV
Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are signed integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;
V[d] = maxmin<esize-1:0>;
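As a rough illustration of the across-vector reduction described for this instruction, the following Python sketch returns the smallest signed element as a raw esize-bit value. The names to_signed and sminv are hypothetical, and the source register is assumed to be a list of raw element values.

```python
def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's-complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def sminv(vn, esize):
    """Signed minimum across all elements, returned as a raw esize-bit value."""
    smallest = min(to_signed(x, esize) for x in vn)
    return smallest & ((1 << esize) - 1)


print(hex(sminv([5, 0xFB, 0x7F, 0], 8)))  # prints 0xfb (that is, -5)
```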
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMLAL, SMLAL2 (by element)
Signed Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or upper
half of the first source SIMD&FP register by the specified vector element in the second source SIMD&FP register, and
accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector
elements are twice as long as the elements that are multiplied. All the values in this instruction are signed integer
values.
The SMLAL instruction extracts vector elements from the lower half of the first source register. The SMLAL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 0 1 0 H 0 Rn Rd
U o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
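The <Vm>, <Ts> and <index> tables above can be read together as one decode step. The following Python sketch shows one way to express that; the function name decode_by_element is hypothetical, and Rm is assumed to be passed as the 4-bit field value with L, M and H as single bits.

```python
def decode_by_element(size, L, M, Rm, H):
    """Return (register number m, element index, element size in bits)."""
    if size == 0b01:
        # H elements: the register number is 0:Rm, so only V0-V15 are reachable.
        return Rm, (H << 2) | (L << 1) | M, 16
    if size == 0b10:
        # S elements: M extends Rm to five bits and the index is H:L.
        return (M << 4) | Rm, (H << 1) | L, 32
    raise ValueError("size '00' and '11' are RESERVED")


print(decode_by_element(0b01, 1, 1, 0b1010, 1))  # prints (10, 7, 16)
print(decode_by_element(0b10, 0, 1, 0b0011, 1))  # prints (19, 2, 32)
```

For halfword elements the M bit is needed for the index, which is why it cannot also extend the register number.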
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMLAL, SMLAL2 (vector)
Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or
upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements
of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are
multiplied.
The SMLAL instruction extracts each source vector from the lower half of each source register. The SMLAL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
bits(2*esize) accum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    product = (element1*element2)<2*esize-1:0>;
    if sub_op then
        accum = Elem[operand3, e, 2*esize] - product;
    else
        accum = Elem[operand3, e, 2*esize] + product;
    Elem[result, e, 2*esize] = accum;
V[d] = result;
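The widening multiply-accumulate loop above can be modelled in Python roughly as follows. This is a sketch under stated assumptions: the names to_signed and smlal are hypothetical, the inputs are the already selected 64-bit halves as lists of raw elements, and the accumulator is a list of raw 2*esize-bit values. The sub_op flag mirrors the pseudocode and selects the subtracting form.

```python
def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's-complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def smlal(vd, vn_half, vm_half, esize, sub_op=False):
    """Signed widening multiply-accumulate with wraparound at 2*esize bits."""
    wide_mask = (1 << (2 * esize)) - 1
    result = []
    for acc, a, b in zip(vd, vn_half, vm_half):
        product = to_signed(a, esize) * to_signed(b, esize)
        total = acc - product if sub_op else acc + product
        result.append(total & wide_mask)  # no saturation: the sum simply wraps
    return result


print([hex(x) for x in smlal([10, 0xFFFF], [0xFE, 3], [4, 0x80], 8)])
# prints ['0x2', '0xfe7f']
```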
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMLSL, SMLSL2 (by element)
Signed Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register and subtracts the results from the vector elements of the destination SIMD&FP register. The destination
vector elements are twice as long as the elements that are multiplied.
The SMLSL instruction extracts vector elements from the lower half of the first source register. The SMLSL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 1 1 0 H 0 Rn Rd
U o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMLSL, SMLSL2 (vector)
Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or
upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of
the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are
multiplied.
The SMLSL instruction extracts each source vector from the lower half of each source register. The SMLSL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
bits(2*esize) accum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    product = (element1*element2)<2*esize-1:0>;
    if sub_op then
        accum = Elem[operand3, e, 2*esize] - product;
    else
        accum = Elem[operand3, e, 2*esize] + product;
    Elem[result, e, 2*esize] = accum;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMMLA (vector)
Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer
values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The
resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the
destination vector. This is equivalent to performing an 8-way dot product per destination element.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.
Vector
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 1 0 0 Rm 1 0 1 0 0 1 Rn Rd
U B
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) addend = V[d];
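The matrix multiply-accumulate described for this instruction can be sketched in Python as below. The sketch rests on an assumption that the text above does not spell out: each source register is taken to store its matrix as two groups of eight consecutive bytes, and destination element 2*i + j accumulates the dot product of group i of the first source with group j of the second. The names to_signed and smmla are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smmla(acc, a, b):
    """acc: four raw 32-bit values; a, b: sixteen raw 8-bit values each."""
    result = []
    for i in range(2):
        for j in range(2):
            total = to_signed(acc[2 * i + j], 32)
            for k in range(8):  # 8-way dot product per destination element
                total += to_signed(a[8 * i + k], 8) * to_signed(b[8 * j + k], 8)
            result.append(total & 0xFFFFFFFF)
    return result


a = [1] * 8 + [0xFF] * 8   # group 0 is all +1, group 1 is all -1
b = [2] * 8 + [3] * 8
print([hex(x) for x in smmla([0, 10, 0, 0xFFFFFFFF], a, b)])
# prints ['0x10', '0x22', '0xfffffff0', '0xffffffe7']
```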
SMOV
Signed Move vector element to general-purpose register. This instruction reads the signed integer from the source
SIMD&FP register, sign-extends it to form a 32-bit or 64-bit value, and writes the result to the destination general-purpose
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 1 0 1 1 Rn Rd
32-bit (Q == 0)
64-bit (Q == 1)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer size;
case Q:imm5 of
    when 'xxxxx1' size = 0; // SMOV [WX]d, Vn.B
    when 'xxxx10' size = 1; // SMOV [WX]d, Vn.H
    when '1xx100' size = 2; // SMOV Xd, Vn.S
    otherwise UNDEFINED;
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<Ts> For the 32-bit variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
xxx00 RESERVED
xxxx1 B
xxx10 H
For the 64-bit variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
xx000 RESERVED
xxxx1 B
xxx10 H
xx100 S
<index> For the 32-bit variant: is the element index encoded in “imm5”:
imm5 <index>
xxx00 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
For the 64-bit variant: is the element index encoded in “imm5”:
imm5 <index>
xx000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
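The Q:imm5 case table and the <Ts> and <index> tables above describe a single decode: the lowest set bit of imm5 selects the element size and the bits above it give the index. A Python sketch of that reading follows. The names smov_decode and smov are hypothetical, and the 128-bit source register is assumed to be a plain Python integer.

```python
def smov_decode(q, imm5):
    """Return (esize, index) following the Q:imm5 case table."""
    if imm5 & 0b00001:
        size = 0                      # B
    elif imm5 & 0b00010:
        size = 1                      # H
    elif q == 1 and imm5 & 0b00100:
        size = 2                      # S, 64-bit destination only
    else:
        raise ValueError("UNDEFINED")
    return 8 << size, imm5 >> (size + 1)


def smov(q, imm5, vn_bits):
    """Extract one element from vn_bits and sign-extend it to 32 or 64 bits."""
    esize, index = smov_decode(q, imm5)
    element = (vn_bits >> (index * esize)) & ((1 << esize) - 1)
    if element >> (esize - 1):
        element -= 1 << esize         # negative value: apply the sign
    width = 64 if q else 32
    return element & ((1 << width) - 1)


print(smov_decode(0, 0b00111))           # prints (8, 3)
print(hex(smov(0, 0b00011, 0x80FE00)))   # prints 0xfffffffe
```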
Operation
CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMULL, SMULL2 (by element)
Signed Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of
the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, places the
result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are
twice as long as the elements that are multiplied.
The SMULL instruction extracts vector elements from the lower half of the first source register. The SMULL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 1 0 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SMULL, SMULL2 (vector)
Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper
half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the
destination SIMD&FP register.
The destination vector elements are twice as long as the elements that are multiplied.
The SMULL instruction extracts each source vector from the lower half of each source register. The SMULL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 0 0 0 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    Elem[result, e, 2*esize] = (element1*element2)<2*esize-1:0>;
V[d] = result;
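To illustrate the lower-half and upper-half selection together with the widening multiply, here is a small Python sketch. It is an informal model only: the names to_signed and smull are hypothetical, each source is assumed to be a full 128-bit register given as a list of raw elements, and the upper flag stands in for the SMULL2 form.

```python
def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's-complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def smull(vn, vm, esize, upper=False):
    """Signed widening multiply of the lower (or upper) 64-bit halves."""
    half = 64 // esize                 # number of elements in one 64-bit half
    start = half if upper else 0
    wide_mask = (1 << (2 * esize)) - 1
    return [
        (to_signed(vn[e], esize) * to_signed(vm[e], esize)) & wide_mask
        for e in range(start, start + half)
    ]


vn = [1, 2, 0xFFFF, 0x8000, 5, 6, 7, 0x8000]
vm = [3, 0xFFFF, 2, 0x8000, 1, 1, 1, 0x8000]
print([hex(x) for x in smull(vn, vm, 16)])
# prints ['0x3', '0xfffffffe', '0xfffffffe', '0x40000000']
```

Because each product is held in twice the source width, a signed product of two esize-bit values always fits and no saturation is needed.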
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
SQABS
Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts
the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values
in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;
for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    if neg then
        element = -element;
    else
        element = Abs(element);
    (Elem[result, e, esize], sat) = SignedSatQ(element, esize);
    if sat then FPSR.QC = '1';
V[d] = result;
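The only input that saturates here is the most negative value, whose absolute value does not fit in esize bits. The Python sketch below models the absolute-value path of the loop above. The names to_signed, signed_sat_q and sqabs are hypothetical, and the returned boolean stands in for the FPSR.QC update.

```python
def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's-complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def signed_sat_q(value, esize):
    """Clamp value to the signed esize-bit range; return (raw bits, saturated)."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped & ((1 << esize) - 1), clamped != value


def sqabs(vn, esize):
    """Saturating absolute value; returns (result elements, QC flag)."""
    result, qc = [], False
    for raw in vn:
        bits, sat = signed_sat_q(abs(to_signed(raw, esize)), esize)
        result.append(bits)
        qc = qc or sat
    return result, qc


print(sqabs([0x80, 0xFF, 0x7F, 0], 8))  # prints ([127, 1, 127, 0], True)
```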
SQADD
Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP
registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;
boolean sat;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    sum = element1 + element2;
    (Elem[result, e, esize], sat) = SatQ(sum, esize, unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
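The saturating add loop above can be approximated in Python as follows. This is a sketch, not the reference behaviour: the names to_int, sat_q and qadd are hypothetical, sat_q is a simplified stand-in for the pseudocode's SatQ, and the returned boolean stands in for the FPSR.QC update. The unsigned parameter mirrors the decode variable of the same name and is False for this instruction.

```python
def to_int(raw, esize, unsigned):
    """Read a raw esize-bit element as an unsigned or two's-complement integer."""
    raw &= (1 << esize) - 1
    if unsigned or not raw >> (esize - 1):
        return raw
    return raw - (1 << esize)


def sat_q(value, esize, unsigned):
    """Clamp value to the esize-bit range; return (raw bits, saturated)."""
    if unsigned:
        lo, hi = 0, (1 << esize) - 1
    else:
        lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped & ((1 << esize) - 1), clamped != value


def qadd(vn, vm, esize, unsigned=False):
    """Element-wise saturating add; returns (result elements, QC flag)."""
    result, qc = [], False
    for a, b in zip(vn, vm):
        total = to_int(a, esize, unsigned) + to_int(b, esize, unsigned)
        bits, sat = sat_q(total, esize, unsigned)
        result.append(bits)
        qc = qc or sat
    return result, qc


print(qadd([0x7F, 0x80, 1], [1, 0xFF, 2], 8))  # prints ([127, 128, 3], True)
```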
SQDMLAL, SQDMLAL2 (by element)
Signed saturating Doubling Multiply-Add Long (by element). This instruction multiplies each vector element in the
lower or upper half of the first source SIMD&FP register by the specified vector element of the second source
SIMD&FP register, doubles the results, and accumulates the final results with the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLAL instruction extracts vector elements from the lower half of the first source register. The SQDMLAL2
instruction extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 0 0 1 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 0 1 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
integer accum;
boolean sat1;
boolean sat2;
V[d] = result;
SQDMLAL, SQDMLAL2 (vector)
Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the
lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final
results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as
long as the elements that are multiplied.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLAL instruction extracts each source vector from the lower half of each source register. The SQDMLAL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 1 0 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 1 0 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
integer accum;
boolean sat1;
boolean sat2;
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    (product, sat1) = SignedSatQ(2 * element1 * element2, 2 * esize);
    if sub_op then
        accum = SInt(Elem[operand3, e, 2*esize]) - SInt(product);
    else
        accum = SInt(Elem[operand3, e, 2*esize]) + SInt(product);
    (Elem[result, e, 2*esize], sat2) = SignedSatQ(accum, 2 * esize);
    if sat1 || sat2 then FPSR.QC = '1';
V[d] = result;
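As an informal illustration, not part of the architecture specification, the per-element behavior of the pseudocode above can be modelled in Python. The names signed_sat_q and sqdml_long are hypothetical helpers introduced for this sketch. Elements are passed as plain signed integers, and the returned flag only indicates whether the operation would set FPSR.QC (the real bit is cumulative and is never cleared by the instruction).

```python
def signed_sat_q(value, nbits):
    """Clamp value to the signed nbits range; also report whether it was clamped."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def sqdml_long(acc, a, b, esize, sub_op=False):
    """acc holds 2*esize-bit signed elements; a and b hold esize-bit signed elements."""
    result, qc = [], False
    for acc_e, a_e, b_e in zip(acc, a, b):
        # First saturation point: the doubled product.
        product, sat1 = signed_sat_q(2 * a_e * b_e, 2 * esize)
        accum = acc_e - product if sub_op else acc_e + product
        # Second saturation point: the accumulated value.
        value, sat2 = signed_sat_q(accum, 2 * esize)
        result.append(value)
        qc = qc or sat1 or sat2
    return result, qc
```

With 16-bit inputs, sqdml_long([0], [-32768], [-32768], 16) returns ([2147483647], True): the doubled product is 2^31, which does not fit in 32 signed bits, so it is saturated before the accumulate step.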
SQDMLSL, SQDMLSL2 (by element)
Signed saturating Doubling Multiply-Subtract Long (by element). This instruction multiplies each vector element in
the lower or upper half of the first source SIMD&FP register by the specified vector element of the second source
SIMD&FP register, doubles the results, and subtracts the final results from the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the
values in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLSL instruction extracts vector elements from the lower half of the first source register. The SQDMLSL2
instruction extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 0 1 1 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 1 1 1 H 0 Rn Rd
o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
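The <Vm> and <index> tables above can be read as a small decode function. The following Python sketch is illustrative only: decode_element_operand is a hypothetical name, and the fields are assumed to be passed as already-extracted integers (size as a 2-bit value, H, L and M as single bits, Rm as the 4-bit field). It also shows why the register is restricted to V0-V15 for H elements: M is consumed by the index, leaving only the four Rm bits for the register number.

```python
def decode_element_operand(size, H, L, M, Rm):
    """Return (register number, element index) for the second source operand."""
    if size == 0b01:
        # H elements: register is 0:Rm, index is H:L:M.
        return Rm, (H << 2) | (L << 1) | M
    if size == 0b10:
        # S elements: register is M:Rm, index is H:L.
        return (M << 4) | Rm, (H << 1) | L
    # size 00 and 11 are RESERVED in the tables.
    raise ValueError("size is RESERVED")
```

For example, with H=1, L=0, M=1 and Rm=0b1111, an H-element encoding selects V15 element 5, while an S-element encoding selects V31 element 2.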
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
integer accum;
boolean sat1;
boolean sat2;
V[d] = result;
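The Operation listing above does not show the per-element loop. The following Python sketch is an assumption about its shape, modelled on the description at the start of this section: each element of the selected half of the first source is multiplied by one indexed element of the second source, doubled, saturated, and subtracted from the destination with a second saturation. The names signed_sat_q and sqdmlsl_by_element are hypothetical, and registers are represented as lists of signed integers with element 0 first.

```python
def signed_sat_q(value, nbits):
    """Clamp value to the signed nbits range; also report whether it was clamped."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def sqdmlsl_by_element(acc, vn, vm, index, esize, upper=False):
    """acc: 2*esize-bit elements; vn, vm: full registers of esize-bit elements."""
    half = len(vn) // 2
    # SQDMLSL reads the lower half of vn, SQDMLSL2 the upper half.
    part = vn[half:] if upper else vn[:half]
    scalar = vm[index]  # the single element selected by <index>
    result, qc = [], False
    for acc_e, a_e in zip(acc, part):
        product, sat1 = signed_sat_q(2 * a_e * scalar, 2 * esize)
        value, sat2 = signed_sat_q(acc_e - product, 2 * esize)
        result.append(value)
        qc = qc or sat1 or sat2
    return result, qc
```

Passing upper=True corresponds to the "2" specifier (Q = 1); the scalar encoding, which has a single element, is not modelled here.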
SQDMLSL, SQDMLSL2 (vector)
Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in
the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the
final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice
as long as the elements that are multiplied.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLSL instruction extracts each source vector from the lower half of each source register. The SQDMLSL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 1 1 0 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 0 0 Rn Rd
o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
integer accum;
boolean sat1;
boolean sat2;
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    (product, sat1) = SignedSatQ(2 * element1 * element2, 2 * esize);
    if sub_op then
        accum = SInt(Elem[operand3, e, 2*esize]) - SInt(product);
    else
        accum = SInt(Elem[operand3, e, 2*esize]) + SInt(product);
    (Elem[result, e, 2*esize], sat2) = SignedSatQ(accum, 2 * esize);
    if sat1 || sat2 then FPSR.QC = '1';
V[d] = result;
SQDMULH (by element)
Signed saturating Doubling Multiply returning High half (by element). This instruction multiplies each vector element
in the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, doubles
the results, places the most significant half of the final results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see SQRDMULH.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 1 0 0 H 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 0 0 H 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;
V[d] = result;
SQDMULH (vector)
Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding
elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results
into a vector, and writes the vector to the destination SIMD&FP register.
The results are truncated. For rounded results, see SQRDMULH.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = (U == '1');
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    product = (2 * element1 * element2) + round_const;
    (Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
    if sat then FPSR.QC = '1';
V[d] = result;
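The difference between the truncating and rounding forms is confined to round_const. As a non-normative sketch, the loop above can be modelled in Python as follows; signed_sat_q and sqdmulh are hypothetical names, and the rounding parameter stands in for the decoded U bit. Python's >> on integers floors toward minus infinity, which is assumed here to match the pseudocode shift on unbounded integers.

```python
def signed_sat_q(value, nbits):
    """Clamp value to the signed nbits range; also report whether it was clamped."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def sqdmulh(a, b, esize, rounding=False):
    """Doubling multiply returning the high half of each esize-bit element pair."""
    round_const = (1 << (esize - 1)) if rounding else 0
    result, qc = [], False
    for a_e, b_e in zip(a, b):
        product = 2 * a_e * b_e + round_const
        # Keep the most significant half, then saturate to esize bits.
        value, sat = signed_sat_q(product >> esize, esize)
        result.append(value)
        qc = qc or sat
    return result, qc
```

For 16-bit elements, sqdmulh([1], [16384], 16) gives ([0], False), while the same call with rounding=True gives ([1], False), because the doubled product 32768 is exactly half of 2^16.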
SQDMULL, SQDMULL2 (by element)
Signed saturating Doubling Multiply Long (by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP
register. All the values in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMULL instruction extracts the first source vector from the lower half of the first source register. The SQDMULL2
instruction extracts the first source vector from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 0 1 1 H 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 1 1 H 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
V[d] = result;
SQDMULL, SQDMULL2 (vector)
Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or
upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes
the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMULL instruction extracts each source vector from the lower half of each source register. The SQDMULL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 1 0 1 0 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 0 1 0 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
size <Va>
00 RESERVED
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
boolean sat;
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    (product, sat) = SignedSatQ(2 * element1 * element2, 2 * esize);
    Elem[result, e, 2*esize] = product;
    if sat then FPSR.QC = '1';
V[d] = result;
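Because the result is twice as wide as the inputs, the doubled product can overflow in only one case: both inputs equal to the most negative value. The following Python sketch, which is illustrative and not part of the specification, checks this exhaustively for small element sizes. The names signed_sat_q, sqdmull_element and saturating_pairs are hypothetical, and the 4-bit and 8-bit sizes are used only to keep the search small; they are not valid sizes for this instruction.

```python
def signed_sat_q(value, nbits):
    """Clamp value to the signed nbits range; also report whether it was clamped."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def sqdmull_element(a, b, esize):
    """Doubling multiply of two esize-bit signed values, saturated to 2*esize bits."""
    return signed_sat_q(2 * a * b, 2 * esize)

def saturating_pairs(esize):
    """List every (a, b) input pair whose doubled product saturates."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    return [(a, b)
            for a in range(lo, hi + 1)
            for b in range(lo, hi + 1)
            if sqdmull_element(a, b, esize)[1]]
```

saturating_pairs(4) returns [(-8, -8)]. The same argument applies at the architected sizes: for 16-bit elements only (-32768, -32768) saturates.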
SQNEG
Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates
each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in
this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;
for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    if neg then
        element = -element;
    else
        element = Abs(element);
    (Elem[result, e, esize], sat) = SignedSatQ(element, esize);
    if sat then FPSR.QC = '1';
V[d] = result;
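A brief Python sketch of the negate path of the pseudocode above, offered as an informal model only. The names signed_sat_q and sqneg are hypothetical, and elements are given as signed integers. The only input that saturates is the most negative value, whose negation is one larger than the largest representable value.

```python
def signed_sat_q(value, nbits):
    """Clamp value to the signed nbits range; also report whether it was clamped."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def sqneg(values, esize):
    """Negate each esize-bit signed element with saturation."""
    result, qc = [], False
    for v in values:
        value, sat = signed_sat_q(-v, esize)
        result.append(value)
        qc = qc or sat
    return result, qc
```

For 8-bit elements, sqneg([-128, 5, 0, 127], 8) returns ([127, -5, 0, -127], True).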
SQRDMLAH (by element)
Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element). This instruction
multiplies the vector elements of the first source SIMD&FP register with the value of a vector element of the second
source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most
significant half of the final results with the vector elements of the destination SIMD&FP register. The results are
rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Vector
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;
V[d] = result;
SQRDMLAH (vector)
Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies
the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant
half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 0 Rm 1 0 0 0 0 1 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = TRUE;
boolean sub_op = (S == '1');
Vector
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 0 0 1 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = TRUE;
boolean sub_op = (S == '1');
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    element3 = SInt(Elem[operand3, e, esize]);
    if sub_op then
        accum = ((element3 << esize) - 2 * (element1 * element2) + rounding_const);
    else
        accum = ((element3 << esize) + 2 * (element1 * element2) + rounding_const);
    (Elem[result, e, esize], sat) = SignedSatQ(accum >> esize, esize);
    if sat then FPSR.QC = '1';
V[d] = result;
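The phrase "without saturating the multiply results" means that only the final value is saturated. The following Python sketch models the loop above under that reading; it is illustrative, and the names signed_sat_q and sqrdml_high are hypothetical. The sub_op parameter stands in for the decoded S bit.

```python
def signed_sat_q(value, nbits):
    """Clamp value to the signed nbits range; also report whether it was clamped."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def sqrdml_high(acc, a, b, esize, sub_op=False):
    """Rounding doubling multiply, accumulate or subtract, returning the high half."""
    rounding_const = 1 << (esize - 1)
    result, qc = [], False
    for acc_e, a_e, b_e in zip(acc, a, b):
        doubled = 2 * a_e * b_e  # deliberately not saturated
        accum = (acc_e << esize) + (-doubled if sub_op else doubled) + rounding_const
        # Saturation happens once, on the final high half.
        value, sat = signed_sat_q(accum >> esize, esize)
        result.append(value)
        qc = qc or sat
    return result, qc
```

A case that shows the effect: sqrdml_high([-1], [-32768], [-32768], 16) returns ([32767], False). The doubled product 2^31 would not fit in 32 signed bits on its own, but after the accumulator is added the high half is in range, so no saturation is reported.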
SQRDMLSH (by element)
Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element). This instruction multiplies
the vector elements of the first source SIMD&FP register with the value of a vector element of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half
of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 size L M Rm 1 1 1 1 H 0 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Vector
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 1 1 H 0 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;
V[d] = result;
SQRDMLSH (vector)
Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the
vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half
of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 0 Rm 1 0 0 0 1 1 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = TRUE;
boolean sub_op = (S == '1');
Vector
(FEAT_RDM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 0 1 1 Rn Rd
S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = TRUE;
boolean sub_op = (S == '1');
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    element3 = SInt(Elem[operand3, e, esize]);
    if sub_op then
        accum = ((element3 << esize) - 2 * (element1 * element2) + rounding_const);
    else
        accum = ((element3 << esize) + 2 * (element1 * element2) + rounding_const);
    (Elem[result, e, esize], sat) = SignedSatQ(accum >> esize, esize);
    if sat then FPSR.QC = '1';
V[d] = result;
SQRDMULH (by element)
Signed saturating Rounding Doubling Multiply returning High half (by element). This instruction multiplies each
vector element in the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register, doubles the results, places the most significant half of the final results into a vector, and writes the vector to
the destination SIMD&FP register.
The results are rounded. For truncated results, see SQDMULH.
If any of the results overflow, they are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
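The three tables above (<Vm>, <Ts> and <index>) can be read together as a single decode step. The following Python sketch illustrates that reading. The function name decode_by_element and its argument names are hypothetical, and the element widths of 16 bits for H and 32 bits for S are the usual meanings of those specifiers rather than something stated in this excerpt.

```python
def decode_by_element(size, h, l, m_bit, rm):
    """Return (esize, second source register number, element index).

    size is the 2-bit size field; h, l and m_bit are the single-bit
    H, L and M fields; rm is the 4-bit Rm field.
    """
    if size == 0b01:
        # H elements: register is 0:Rm (V0-V15 only), index is H:L:M.
        return 16, rm, (h << 2) | (l << 1) | m_bit
    if size == 0b10:
        # S elements: register is M:Rm, index is H:L.
        return 32, (m_bit << 4) | rm, (h << 1) | l
    raise ValueError("size '00' and '11' are RESERVED")


print(decode_by_element(0b01, 1, 0, 1, 0b1111))  # (16, 15, 5)
print(decode_by_element(0b10, 1, 1, 1, 0b0011))  # (32, 19, 3)
```

For H elements the M bit is consumed by the index, which is why only V0-V15 can be named as the second source register.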
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;
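element2 = SInt(Elem[operand2, index, esize]);
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    product = (2 * element1 * element2) + round_const;
    // The following only saturates if element1 and element2 equal -(2^(esize-1))
    (Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
    if sat then FPSR.QC = '1';
V[d] = result;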
Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of
corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of
the final results into a vector, and writes the vector to the destination SIMD&FP register.
The results are rounded. For truncated results, see SQDMULH.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = (U == '1');
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;
for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    product = (2 * element1 * element2) + round_const;
    (Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
    if sat then FPSR.QC = '1';
V[d] = result;
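As an illustration of the loop above, the following Python sketch applies the same arithmetic to lists of lanes. It is a model under stated assumptions, not Arm reference code: the function name sqrdmulh is hypothetical, lanes are assumed to be signed Python integers that fit in esize bits, and the returned boolean stands in for the update of FPSR.QC. Passing rounding=False gives the truncated behaviour that the text attributes to SQDMULH.

```python
def sqrdmulh(a_lanes, b_lanes, esize=16, rounding=True):
    """Return (result lanes, qc) for the doubling multiply high half."""
    lo = -(1 << (esize - 1))
    hi = (1 << (esize - 1)) - 1
    round_const = (1 << (esize - 1)) if rounding else 0
    out, qc = [], False
    for a, b in zip(a_lanes, b_lanes):
        high = (2 * a * b + round_const) >> esize
        clamped = max(lo, min(hi, high))
        qc = qc or clamped != high  # sticky, like FPSR.QC
        out.append(clamped)
    return out, qc


a = [16384, -32768, 3]
b = [16384, -32768, 16384]
print(sqrdmulh(a, b))                  # ([8192, 32767, 2], True)
print(sqrdmulh(a, b, rounding=False))  # ([8192, 32767, 1], True)
```

The middle lane multiplies the most negative value by itself, so the doubled product does not fit and the result saturates. The last lane differs between the two calls only because of the rounding constant.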
Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source
SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the
second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP
register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For
truncated results, see SQSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
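The pseudocode above relies on two conventions that are easy to miss: a shift by a negative amount is a right shift, and 1 << (-shift - 1) evaluates to 0 when the shift is a left shift. The Python sketch below makes both explicit for a single lane. It is an illustrative model only. The helper names sint and shl_register_element are hypothetical, and lanes are passed in and returned as raw unsigned bit patterns.

```python
def sint(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def shl_register_element(src, shift_reg, esize, unsigned=False,
                         rounding=True, saturating=True):
    """Return (raw result lane, saturated flag) for one element."""
    mask = (1 << esize) - 1
    shift = sint(shift_reg, 8)  # only the least significant byte is used
    value = (src & mask) if unsigned else sint(src, esize)
    if shift >= 0:
        element = value << shift  # rounding constant is 0 for left shifts
    else:
        round_const = (1 << (-shift - 1)) if rounding else 0
        element = (value + round_const) >> -shift
    sat = False
    if saturating:
        lo = 0 if unsigned else -(1 << (esize - 1))
        hi = mask if unsigned else (1 << (esize - 1)) - 1
        clamped = max(lo, min(hi, element))
        sat = clamped != element
        element = clamped
    return element & mask, sat


print(shl_register_element(0x4000, 0x02, 16))  # (32767, True)
print(shl_register_element(0x0005, 0xFF, 16))  # (3, False)
```

In the second call the shift byte 0xFF is read as -1, so the lane is shifted right by one bit with rounding: (5 + 1) >> 1 = 3.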
Signed saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each result by an immediate value, saturates each shifted result to a value that is half
the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination
SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are half
as long as the source vector elements. The results are rounded. For truncated results, see SQSHRN.
The SQRSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQRSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
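For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:
immh <shift>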
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
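The <Vb> and <shift> tables above can be combined into one decode rule: the position of the highest set bit of immh selects the destination element size, and the shift is twice that size minus the 7-bit value immh:immb. The Python sketch below shows this reading. The function name decode_narrow_right_shift is hypothetical, and the rule is inferred from the tables rather than quoted from the decode pseudocode. The sketch rejects immh == 0 in both variants, although the vector table redirects that encoding to a different instruction class.

```python
def decode_narrow_right_shift(immh, immb):
    """Return (destination esize, right shift amount).

    immh is the 4-bit field and immb the 3-bit field of the encoding.
    """
    if immh == 0 or immh & 0b1000:
        raise ValueError("immh encoding is not a narrowing shift")
    esize = 8 << (immh.bit_length() - 1)  # 0001 -> 8, 001x -> 16, 01xx -> 32
    shift = 2 * esize - ((immh << 3) | immb)
    return esize, shift


print(decode_narrow_right_shift(0b0001, 0b000))  # (8, 8)
print(decode_narrow_right_shift(0b0011, 0b111))  # (16, 1)
```

With this rule the shift always lies between 1 and the destination element width, which matches the range given in the <shift> description.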
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;
for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';
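The following Python sketch combines the loop above with the lower-half and upper-half write behaviour described for SQRSHRN and SQRSHRN2. It is a model under stated assumptions: the function name sqrshrn_write is hypothetical, the 128-bit destination register is represented as a Python integer, source lanes are signed integers of width 2*esize, and the number of source lanes is assumed to fill exactly 64 bits of narrowed results.

```python
def sqrshrn_write(dest128, src_lanes, shift, esize, upper=False, rounding=True):
    """Return (new 128-bit destination value, qc)."""
    lo = -(1 << (esize - 1))
    hi = (1 << (esize - 1)) - 1
    round_const = (1 << (shift - 1)) if rounding else 0
    mask = (1 << esize) - 1
    half, qc = 0, False
    for e, value in enumerate(src_lanes):
        element = (value + round_const) >> shift
        clamped = max(lo, min(hi, element))
        qc = qc or clamped != element
        half |= (clamped & mask) << (e * esize)  # lane 0 in the low bits
    if upper:
        # "2" variant: keep the low 64 bits, replace the high 64 bits.
        return (dest128 & ((1 << 64) - 1)) | (half << 64), qc
    # Base variant: write the low 64 bits and clear the high 64 bits.
    return half, qc


lanes = [100, -100, 1000000, -8]  # four 32-bit source elements
print(hex(sqrshrn_write(0, lanes, 4, 16)[0]))  # 0x7ffffffa0006
```

The third lane saturates to 0x7FFF, so the returned flag is True for this input.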
Signed saturating Rounded Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value
in the vector of the source SIMD&FP register, right shifts each value by an immediate value, saturates the result to an
unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. The results are rounded. For truncated results, see SQSHRUN.
The SQRSHRUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQRSHRUN2 instruction writes the vector to the upper half of the destination register without affecting the other
bits of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 0 1 1 Rn Rd
immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 0 1 1 Rn Rd
immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;
for e = 0 to elements-1
    element = (SInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    (Elem[result, e, esize], sat) = UnsignedSatQ(element, esize);
    if sat then FPSR.QC = '1';
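The signed-input, unsigned-output saturation in the loop above means that any negative shifted value clamps to zero. A minimal Python sketch of one lane, using the hypothetical name sqrshrun_element and assuming the input is a signed integer of width 2*esize:

```python
def sqrshrun_element(value, shift, esize, rounding=True):
    """Return (unsigned esize-bit result, saturated flag) for one lane."""
    round_const = (1 << (shift - 1)) if rounding else 0
    element = (value + round_const) >> shift
    clamped = max(0, min((1 << esize) - 1, element))
    return clamped, clamped != element


print(sqrshrun_element(-3, 1, 8))   # (0, True)
print(sqrshrun_element(509, 1, 8))  # (255, False)
print(sqrshrun_element(511, 1, 8))  # (255, True)
```

Note that rounding can push a value over the upper bound: 509 rounds to exactly 255, while 511 rounds to 256 and saturates.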
Signed saturating Shift Left (immediate). This instruction reads each vector element in the source SIMD&FP register,
left shifts each element by an immediate value, places the final result in a vector, and writes the vector to the destination
SIMD&FP register. The results are truncated. For rounded results, see SQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
boolean src_unsigned;
boolean dst_unsigned;
case op:U of
    when '00' UNDEFINED;
    when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
    when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
    when '11' src_unsigned = TRUE; dst_unsigned = TRUE;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
boolean src_unsigned;
boolean dst_unsigned;
case op:U of
    when '00' UNDEFINED;
    when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
    when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
    when '11' src_unsigned = TRUE; dst_unsigned = TRUE;
Assembler Symbols
immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the left shift amount, in the range 0 to the operand width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;
for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], src_unsigned) << shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, dst_unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
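The left-shift encoding and the src_unsigned and dst_unsigned flags from the decode can be exercised together with a short Python sketch. It is illustrative only. The names decode_left_shift and shl_imm_element are hypothetical, and deriving the element size from the highest set bit of immh is an inference from the <V> and <shift> tables above, not a quotation of the decode pseudocode.

```python
def decode_left_shift(immh, immb):
    """Return (esize, left shift amount) from the immh and immb fields."""
    if immh == 0:
        raise ValueError("immh == '0000' is not a shift encoding")
    esize = 8 << (immh.bit_length() - 1)  # 0001 -> 8 ... 1xxx -> 64
    return esize, ((immh << 3) | immb) - esize


def shl_imm_element(raw, esize, shift, src_unsigned, dst_unsigned):
    """Shift one raw lane pattern left and saturate it."""
    mask = (1 << esize) - 1
    raw &= mask
    signed = raw - (1 << esize) if raw >> (esize - 1) else raw
    element = (raw if src_unsigned else signed) << shift
    lo = 0 if dst_unsigned else -(1 << (esize - 1))
    hi = mask if dst_unsigned else (1 << (esize - 1)) - 1
    clamped = max(lo, min(hi, element))
    return clamped, clamped != element


print(decode_left_shift(0b0001, 0b011))           # (8, 3)
print(shl_imm_element(0x40, 8, 1, False, False))  # (127, True)
print(shl_imm_element(0xFF, 8, 1, False, True))   # (0, True)
```

The last call uses the signed-source, unsigned-destination combination (op:U == '01'): the lane 0xFF is read as -1, so the shifted value is negative and clamps to 0.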
Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP
register, shifts each element by a value from the least significant byte of the corresponding element of the second
source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For
rounded results, see SQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
Signed saturating Shift Left Unsigned (immediate). This instruction reads each signed integer value in the vector of
the source SIMD&FP register, shifts each value by an immediate value, saturates the shifted result to an unsigned
integer value, places the result in a vector, and writes the vector to the destination SIMD&FP register. The results are
truncated. For rounded results, see UQRSHL.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 1 0 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
boolean src_unsigned;
boolean dst_unsigned;
case op:U of
    when '00' UNDEFINED;
    when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
    when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
    when '11' src_unsigned = TRUE; dst_unsigned = TRUE;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 1 0 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
boolean src_unsigned;
boolean dst_unsigned;
case op:U of
    when '00' UNDEFINED;
    when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
    when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
    when '11' src_unsigned = TRUE; dst_unsigned = TRUE;
Assembler Symbols
immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the left shift amount, in the range 0 to the operand width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;
for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], src_unsigned) << shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, dst_unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
Signed saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts and truncates each result by an immediate value, saturates each shifted result to a value that is
half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the
destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector
elements are half as long as the source vector elements. For rounded results, see SQRSHRN.
The SQSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
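For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:
immh <shift>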
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;
for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';
Signed saturating Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value in the
vector of the source SIMD&FP register, right shifts each value by an immediate value, saturates the result to an
unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. The results are truncated. For rounded results, see SQRSHRUN.
The SQSHRUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQSHRUN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 0 0 1 Rn Rd
immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 0 0 1 Rn Rd
immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;
for e = 0 to elements-1
    element = (SInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    (Elem[result, e, esize], sat) = UnsignedSatQ(element, esize);
    if sat then FPSR.QC = '1';
Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register
from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and
writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;
boolean sat;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
diff = element1 - element2;
(Elem[result, e, esize], sat) = SatQ(diff, esize, unsigned);
if sat then FPSR.QC = '1';
V[d] = result;
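The saturating subtract above can be sketched in Python for the signed case. This is an illustrative model only: the names signed_sat and sqsub are hypothetical, lanes are plain Python integers rather than bit vectors, and the returned flag only reports whether any lane saturated (the real FPSR.QC bit is cumulative and stays set once set).

```python
def signed_sat(value, bits):
    """Clamp value to the signed range of `bits` bits; also report whether clamping occurred."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value


def sqsub(op1, op2, esize):
    """Lane-wise op1[e] - op2[e] with signed saturation; returns (lanes, qc)."""
    result, qc = [], False
    for x, y in zip(op1, op2):
        lane, sat = signed_sat(x - y, esize)
        result.append(lane)
        qc = qc or sat  # mirrors "if sat then FPSR.QC = '1'"
    return result, qc


lanes, qc = sqsub([-128, 100, 5], [1, -100, 3], esize=8)
print(lanes, qc)  # [-128, 127, 2] True
```

The first two lanes overflow the 8-bit signed range and are clamped to -128 and 127; the third lane is unaffected.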
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register,
saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or
upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector
elements. All the values in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQXTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SQXTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
Field positions: U is bit 29.
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
Field positions: U is bit 29.
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size <Vb>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
size <Va>
00 H
01 S
10 D
11 RESERVED
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;
boolean sat;
for e = 0 to elements-1
element = Elem[operand, e, 2*esize];
(Elem[result, e, esize], sat) = SatQ(Int(element, unsigned), esize, unsigned);
if sat then FPSR.QC = '1';
Vpart[d, part] = result;
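A minimal Python sketch of the narrowing step, assuming signed lanes held as Python integers. The function name sqxtn is hypothetical, and the sketch models only the per-lane saturation, not the choice of destination half.

```python
def sqxtn(source, esize):
    """Saturate each signed source lane (2*esize bits wide) to esize bits; returns (lanes, qc)."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    result = [max(lo, min(hi, x)) for x in source]
    qc = result != source  # True when at least one lane was clamped
    return result, qc


print(sqxtn([300, -300, 7, -1], esize=8))  # ([127, -128, 7, -1], True)
```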
Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the
source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the
result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The
destination vector elements are half as long as the source vector elements.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
The SQXTUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQXTUN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 1 0 0 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 0 1 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size <Vb>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
size <Va>
00 H
01 S
10 D
11 RESERVED
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;
boolean sat;
for e = 0 to elements-1
element = Elem[operand, e, 2*esize];
(Elem[result, e, esize], sat) = UnsignedSatQ(SInt(element), esize);
if sat then FPSR.QC = '1';
Vpart[d, part] = result;
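The signed-to-unsigned saturation can be sketched as follows. This is an illustrative model with a hypothetical function name, sqxtun, and plain Python integers as lanes: negative inputs clamp to 0 and large inputs clamp to the unsigned maximum.

```python
def sqxtun(source, esize):
    """Saturate each signed source lane to the unsigned range of esize bits; returns (lanes, qc)."""
    hi = (1 << esize) - 1
    result = [max(0, min(hi, x)) for x in source]
    qc = result != source  # True when at least one lane was clamped
    return result, qc


print(sqxtun([300, -5, 200, 0], esize=8))  # ([255, 0, 200, 0], True)
```

Note that 200 passes through unchanged: it does not fit a signed 8-bit lane, but it does fit an unsigned one.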
Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source
SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the
destination SIMD&FP register.
The results are rounded. For truncated results, see SHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 1 0 1 Rn Rd
Field positions: U is bit 29.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
Elem[result, e, esize] = (element1+element2+1)<esize:1>;
V[d] = result;
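The rounding behaviour comes from adding 1 before the shift. A small Python sketch, with hypothetical function names and lanes as plain signed integers, contrasts it with a truncating halving add (assumed here to be the same computation without the +1):

```python
def srhadd(op1, op2):
    """Rounding halving add: (a + b + 1) >> 1 per lane, computed without intermediate overflow."""
    return [(x + y + 1) >> 1 for x, y in zip(op1, op2)]


def halving_add_truncating(op1, op2):
    """Assumed truncating counterpart: (a + b) >> 1 per lane."""
    return [(x + y) >> 1 for x, y in zip(op1, op2)]


a, b = [127, -128, 1, -3], [127, -128, 2, 0]
print(srhadd(a, b))                  # [127, -128, 2, -1]
print(halving_add_truncating(a, b))  # [127, -128, 1, -2]
```

Because the sum is formed at full precision before the shift, the result always fits the original lane width.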
Shift Right and Insert (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the
destination SIMD&FP register such that the new zero bits created by the shift are not inserted but retain their existing
value. Bits shifted out of the right of each vector element of the source register are lost.
The following figure shows an example of the operation of shift right by 3 for an 8-bit vector element.
[Figure: Vn.B[7], Vd.B[7] before the operation, and Vd.B[7] after the operation, each occupying bits 63:56 of its register.]
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 0 0 0 1 Rn Rd
Field positions: immh is bits 22:19.
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 0 0 0 1 Rn Rd
Field positions: immh is bits 22:19.
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
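The vector shift table above follows one pattern: the position of the highest set bit of immh selects the element size, and the shift is twice the element size minus the 7-bit value immh:immb. A hedged Python sketch (decode_right_shift is a hypothetical helper, and deriving the element size from the highest set bit is an assumption read off the table):

```python
def decode_right_shift(immh, immb):
    """Return (esize, shift) for a vector right-shift-by-immediate encoding."""
    if immh == 0:
        raise ValueError("immh == 0000 selects Advanced SIMD modified immediate")
    esize = 8 << (immh.bit_length() - 1)      # 0001 -> 8, 001x -> 16, 01xx -> 32, 1xxx -> 64
    shift = 2 * esize - ((immh << 3) | immb)  # (2 * esize) - UInt(immh:immb)
    return esize, shift


print(decode_right_shift(0b0001, 0b101))  # (8, 3)
print(decode_right_shift(0b1000, 0b000))  # (64, 64)
```

The resulting shift always lies in the range 1 to the element width, matching the range stated for <shift>.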
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2 = V[d];
bits(datasize) result;
bits(esize) mask = LSR(Ones(esize), shift);
bits(esize) shifted;
for e = 0 to elements-1
shifted = LSR(Elem[operand, e, esize], shift);
Elem[result, e, esize] = (Elem[operand2, e, esize] AND NOT(mask)) OR shifted;
V[d] = result;
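The mask logic in the pseudocode can be traced with a short Python sketch. The function name sri is hypothetical and lanes are unsigned bit patterns held as Python integers.

```python
def sri(dest, source, shift, esize):
    """Per lane: keep the top `shift` bits of dest and insert source >> shift below them."""
    full = (1 << esize) - 1
    mask = full >> shift  # bit positions that receive the shifted source
    return [(d & ~mask & full) | ((s & full) >> shift) for d, s in zip(dest, source)]


# Shift right by 3 on 8-bit lanes: the top 3 destination bits survive.
print([hex(v) for v in sri([0xFF, 0x00], [0x00, 0xFF], shift=3, esize=8)])  # ['0xe0', '0x1f']
```

With a shift equal to the element width the mask is zero, so the destination lane is left unchanged.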
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source
SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second
source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift. For a
truncating shift, see SSHL.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
Field positions: U is bit 29, R is bit 12, S is bit 11.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
Field positions: U is bit 29, R is bit 12, S is bit 11.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
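<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.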
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
shift = SInt(Elem[operand2, e, esize]<7:0>);
if rounding then
round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
if saturating then
(Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
if sat then FPSR.QC = '1';
else
Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
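A Python sketch of the non-saturating, signed, rounding case described above. The function name srshl is hypothetical; lanes are signed Python integers, and only the least significant byte of each shift lane is used, interpreted as a signed value.

```python
def srshl(values, shifts, esize):
    """Per lane: left shift for a positive count, rounding right shift for a negative count."""
    result = []
    for x, s in zip(values, shifts):
        s = ((s & 0xFF) ^ 0x80) - 0x80       # signed value of the low byte of the shift lane
        if s >= 0:
            r = x << s
        else:
            r = (x + (1 << (-s - 1))) >> -s  # add half of the last discarded bit, then shift
        r &= (1 << esize) - 1                # keep the low esize bits, as in element<esize-1:0>
        result.append(r - (1 << esize) if r >> (esize - 1) else r)
    return result


print(srshl([5, 5, -5, 64], [1, -1, -1, 1], esize=8))  # [10, 3, -2, -128]
```

The last lane shows that a left shift simply wraps when the result does not fit, because this form does not saturate.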
Signed Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register,
right shifts each element by an immediate value, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are signed integer values. The results are rounded. For
truncated results, see SSHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
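1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.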
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
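integer element;
operand2 = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;
V[d] = result;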
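The rounding constant 1 << (shift - 1) is half of the value of the last bit shifted out, so adding it before the shift rounds to nearest with ties going toward positive infinity. A minimal Python sketch with a hypothetical function name and signed Python integers as lanes:

```python
def srshr(values, shift):
    """Rounding arithmetic right shift of each signed lane by `shift` (shift >= 1)."""
    round_const = 1 << (shift - 1)
    return [(x + round_const) >> shift for x in values]


print(srshr([7, -7, 8, -1], shift=2))  # [2, -2, 2, 0]
```

For example, 7/4 = 1.75 rounds to 2 and -1/4 = -0.25 rounds to 0.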
Signed Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each element by an immediate value, and accumulates the final results with the vector
elements of the destination SIMD&FP register. All the values in this instruction are signed integer values. The results
are rounded. For truncated results, see SSRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
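1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.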
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
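integer element;
operand2 = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;
V[d] = result;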
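A sketch of the accumulate form in Python. The name srsra is hypothetical, and the sketch assumes that the accumulation wraps modulo 2^esize instead of saturating, since the description mentions no saturation.

```python
def srsra(acc, source, shift, esize):
    """acc[e] + rounded (source[e] >> shift), wrapped to a signed esize-bit lane."""
    result = []
    for a, x in zip(acc, source):
        shifted = (x + (1 << (shift - 1))) >> shift  # rounding right shift
        r = (a + shifted) & ((1 << esize) - 1)       # assumed wrap-around accumulation
        result.append(r - (1 << esize) if r >> (esize - 1) else r)
    return result


print(srsra([10, 127, -128], [7, 4, -6], shift=2, esize=8))  # [12, -128, 127]
```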
Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP
register, shifts each value by a value from the least significant byte of the corresponding element of the second source
SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a
rounding shift, see SRSHL.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
Field positions: U is bit 29, R is bit 12, S is bit 11.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
Field positions: U is bit 29, R is bit 12, S is bit 11.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
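<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.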
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
shift = SInt(Elem[operand2, e, esize]<7:0>);
if rounding then
round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
if saturating then
(Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
if sat then FPSR.QC = '1';
else
Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
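For the signed, non-rounding, non-saturating case, the per-lane behaviour can be sketched in Python as below. The function name sshl is hypothetical and lanes are signed Python integers.

```python
def sshl(values, shifts, esize):
    """Per lane: left shift for a positive count, truncating right shift for a negative count."""
    result = []
    for x, s in zip(values, shifts):
        s = ((s & 0xFF) ^ 0x80) - 0x80  # only the low byte of the shift lane is used, as signed
        r = (x << s) if s >= 0 else (x >> -s)
        r &= (1 << esize) - 1           # keep the low esize bits
        result.append(r - (1 << esize) if r >> (esize - 1) else r)
    return result


print(sshl([5, 5, -5, 1], [1, -1, -1, 8], esize=8))  # [10, 2, -3, 0]
```

A left shift by the full element width clears the lane, and a truncating right shift of -5 by one gives -3 because the shift rounds toward negative infinity.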
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Shift Left Long (immediate). This instruction reads each vector element from the source SIMD&FP register,
left shifts each vector element by the specified shift amount, places the result into a vector, and writes the vector to
the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
All the values in this instruction are signed integer values.
The SSHLL instruction extracts vector elements from the lower half of the source register. The SSHLL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias SXTL, SXTL2.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 1 0 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19.
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<shift> Is the left shift amount, in the range 0 to the source element width in bits minus 1, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx RESERVED
Alias Conditions
Alias Is preferred when
SXTL, SXTL2 immb == '000' && BitCount(immh) == 1
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(datasize*2) result;
integer element;
for e = 0 to elements-1
element = Int(Elem[operand, e, esize], unsigned) << shift;
Elem[result, e, 2*esize] = element<2*esize-1:0>;
V[d] = result;
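A Python sketch of the widening shift, under the assumption that the source register is modelled as a list of signed lanes whose first half is the lower 64 bits. The name sshll and the upper flag (standing in for the "2" suffix) are hypothetical.

```python
def sshll(source, shift, upper=False):
    """Sign-extend the lower (or upper) half of the source lanes and shift each left."""
    half = len(source) // 2
    part = source[half:] if upper else source[:half]
    # With shift < source element width, x << shift always fits the doubled lane width.
    return [x << shift for x in part]


lanes = list(range(-8, 8))              # sixteen 8-bit lanes of a 128-bit register
print(sshll(lanes, 4))                  # [-128, -112, -96, -80, -64, -48, -32, -16]
print(sshll(lanes, 4, upper=True))      # [0, 16, 32, 48, 64, 80, 96, 112]
```

With a shift of 0 the sketch reduces to a plain sign extension of the selected half.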
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each element by an immediate value, places the final result into a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are signed integer values. The results are truncated. For rounded
results, see SRSHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
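1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.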
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
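integer element;
operand2 = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;
V[d] = result;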
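"Truncated" here means the shifted-out bits are simply discarded, which for negative values rounds toward negative infinity, not toward zero. A short Python sketch (hypothetical function name, signed Python integers as lanes) makes the difference from division visible:

```python
def sshr(values, shift):
    """Truncating arithmetic right shift of each signed lane."""
    return [x >> shift for x in values]


print(sshr([7, -7, -1, 64], shift=2))      # [1, -2, -1, 16]
print([int(x / 4) for x in [7, -7, -1]])   # [1, -1, 0]  (division rounds toward zero instead)
```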
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each element by an immediate value, and accumulates the final results with the vector elements of
the destination SIMD&FP register. All the values in this instruction are signed integer values. The results are
truncated. For rounded results, see SRSRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
Field positions: U is bit 29, immh is bits 22:19, o1 is bit 13, o0 is bit 12.
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
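1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.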
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
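integer element;
operand2 = if accumulate then V[d] else Zeros();
for e = 0 to elements-1
element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;
V[d] = result;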
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source
SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into
a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed
integer values. The destination vector elements are twice as long as the source vector elements.
The SSUBL instruction extracts each source vector from the lower half of each source register. The SSUBL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 0 0 Rn Rd
Field positions: U is bit 29, o1 is bit 13.
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
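A minimal Python sketch of the long subtract, assuming each source register is modelled as a list of signed lanes whose first half is the lower 64 bits. The name ssubl and the upper flag (standing in for the "2" suffix) are hypothetical.

```python
def ssubl(op1, op2, upper=False):
    """Subtract matching lanes from the lower (or upper) halves; results are double width."""
    half = len(op1) // 2
    sel = slice(half, None) if upper else slice(0, half)
    # The difference of two esize-bit signed values always fits in 2*esize bits.
    return [x - y for x, y in zip(op1[sel], op2[sel])]


a, b = [-128, 127, 0, 5], [127, -128, 1, 5]
print(ssubl(a, b))              # [-255, 255]
print(ssubl(a, b, upper=True))  # [-1, 0]
```

Because the destination lanes are twice as wide as the source lanes, the extreme 8-bit differences -255 and 255 are represented exactly.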
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source
SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a
vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer
values.
The SSUBW instruction extracts the second source vector from the lower half of the second source register. The SSUBW2
instruction extracts the second source vector from the upper half of the second source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ta> Is an arrangement specifier, encoded in “size”:
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<Tb> Is an arrangement specifier, encoded in “size:Q”:
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, 2*esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
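As a rough illustration of the signed wide subtract, the following Python sketch models each register as a 128-bit integer. The function names are hypothetical, and the sketch assumes that the narrow operand is always a 64-bit half of the second source register, as the description of the "2" specifier suggests.

```python
def elem(vector, e, esize):
    """Extract element e (esize bits wide) from an integer bit vector."""
    return (vector >> (e * esize)) & ((1 << esize) - 1)


def sign_extend(value, bits):
    """Interpret a raw bits-wide value as a signed integer."""
    return value - (1 << bits) if value >> (bits - 1) else value


def ssubw(vn, vm, esize, upper_half):
    """Subtract narrow elements of one half of vm from wide elements of vn.

    vn holds 2*esize-bit elements. Only the lower or upper 64 bits of vm
    are used, holding esize-bit elements (assumption: datasize is 64).
    """
    part = (vm >> 64) if upper_half else vm
    part &= (1 << 64) - 1
    wide = 2 * esize
    result = 0
    for e in range(64 // esize):
        a = sign_extend(elem(vn, e, wide), wide)
        b = sign_extend(elem(part, e, esize), esize)
        result |= ((a - b) & ((1 << wide) - 1)) << (e * wide)
    return result
```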
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Store multiple single-element structures from one, two, three, or four registers. This instruction stores elements to
memory from one, two, three, or four SIMD&FP registers, without interleaving. Every element of each register is
stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 x x 1 x size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm x x 1 x size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
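<imm> For the one register, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>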
0 #8
1 #16
For the two registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #16
1 #32
For the three registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #24
1 #48
For the four registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #32
1 #64
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
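The post-index immediate values in the tables above follow a simple rule: the immediate equals the total number of bytes transferred, that is, the number of registers multiplied by 8 bytes when Q is 0 or 16 bytes when Q is 1. A minimal Python sketch of that rule, using the hypothetical helper name `post_index_imm`:

```python
def post_index_imm(num_regs, q):
    """Return the post-index immediate for a multiple-structure store.

    num_regs is the number of SIMD&FP registers transferred (1 to 4) and
    q is the value of the Q bit (0 for 64-bit, 1 for 128-bit registers).
    """
    if num_regs not in (1, 2, 3, 4):
        raise ValueError("num_regs must be between 1 and 4")
    if q not in (0, 1):
        raise ValueError("q must be 0 or 1")
    reg_bytes = 16 if q else 8
    return num_regs * reg_bytes
```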
Shared Decode
case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
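The store path of the nested loop above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the architected behavior: registers are modelled as a list of 32 integers, memory is returned as a byte string, elements are written little-endian, and `elements` is taken to be `datasize DIV esize`. The function name `store_multiple` is hypothetical.

```python
def store_multiple(regs, t, rpt, selem, esize, datasize):
    """Return the bytes written by the store loop, in address order.

    regs is a list of 32 register values, t the first register number,
    and rpt/selem the values chosen by the opcode decode.
    """
    ebytes = esize // 8
    elements = datasize // esize  # assumption: not shown in the text above
    mask = (1 << esize) - 1
    out = bytearray()
    for r in range(rpt):
        for e in range(elements):
            tt = (t + r) % 32
            for _ in range(selem):
                value = (regs[tt] >> (e * esize)) & mask
                out += value.to_bytes(ebytes, "little")  # assumption: little-endian data
                tt = (tt + 1) % 32
    return bytes(out)
```

With `selem = 1` the registers are written one after another without interleaving, which matches the ST1 description. With `rpt = 1` and `selem > 1` the same loop interleaves elements from consecutive registers.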
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store a single-element structure from one lane of one register. This instruction stores the specified element of a
SIMD&FP register to memory.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 0 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 0 Rm x x 0 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
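The lane-index decode above can be sketched in Python as follows. Two points are assumptions rather than statements from the text above: that `scale` is initially taken from `opcode<2:1>` (consistent with the opcode values listed for the 16-bit, 32-bit, and 64-bit variants), and that the element size is `8 << scale`. The names `Undefined` and `decode_lane` are hypothetical.

```python
class Undefined(Exception):
    """Raised where the pseudocode reaches UNDEFINED."""


def decode_lane(opcode, q, s, size, load=False):
    """Return (esize, index, replicate) for a single-structure encoding."""
    scale = (opcode >> 1) & 0b11  # assumption: scale = UInt(opcode<2:1>)
    replicate = False
    index = None
    if scale == 3:
        # Load and replicate: only valid for loads with S == 0.
        if not load or s == 1:
            raise Undefined()
        scale = size
        replicate = True
    elif scale == 0:
        index = (q << 3) | (s << 2) | size          # B[0-15]
    elif scale == 1:
        if size & 1:
            raise Undefined()
        index = (q << 2) | (s << 1) | (size >> 1)   # H[0-7]
    else:
        if size & 0b10:
            raise Undefined()
        if (size & 1) == 0:
            index = (q << 1) | s                    # S[0-3]
        else:
            if s == 1:
                raise Undefined()
            index = q                               # D[0-1]
            scale = 3
    return 8 << scale, index, replicate
```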
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
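The write-back step at the end of the pseudocode selects between two offsets: when Rm is 31 (the immediate offset form), the base advances by the number of bytes transferred, otherwise it advances by the value of the post-index register. A small Python sketch of that selection, with the hypothetical name `post_index_base` and general-purpose registers X0 to X30 modelled as a list:

```python
def post_index_base(base, bytes_transferred, m, xregs):
    """Return the updated base address after a post-index access.

    bytes_transferred is the value of offs accumulated by the transfer loop,
    m is the Rm field, and xregs holds the values of X0..X30.
    """
    offs = bytes_transferred
    if m != 31:
        offs = xregs[m]  # register offset form replaces the accumulated offset
    return (base + offs) & ((1 << 64) - 1)  # addresses are 64-bit values
```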
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store multiple 2-element structures from two registers. This instruction stores multiple 2-element structures from two
SIMD&FP registers to memory, with interleaving. Every element of each register is stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm 1 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the post-index immediate offset, encoded in “Q”:
Q <imm>
0 #16
1 #32
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
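At the element level, the interleaving performed by this store amounts to taking element 0 of each register, then element 1 of each register, and so on. A minimal Python sketch, with registers modelled as equal-length lists of elements and the hypothetical name `interleave`:

```python
def interleave(*registers):
    """Return the elements of the registers in interleaved memory order."""
    if len({len(r) for r in registers}) > 1:
        raise ValueError("all registers must hold the same number of elements")
    # zip groups element e of every register; flattening gives memory order.
    return [x for group in zip(*registers) for x in group]
```

For example, `interleave([0, 2, 4, 6], [1, 3, 5, 7])` yields the elements 0 through 7 in ascending order, which is how two separate channel vectors become an array of 2-element structures.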
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store single 2-element structure from one lane of two registers. This instruction stores a 2-element structure to
memory from corresponding elements of two SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 1 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 1 Rm x x 0 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
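The store branch of the single-lane loop above takes the same lane from each of `selem` consecutive registers and writes the lanes to consecutive addresses. The following Python sketch models only that branch. The name `store_lane_structure` is hypothetical, registers are modelled as a list of 32 integers, and little-endian element layout is assumed.

```python
def store_lane_structure(regs, t, selem, index, esize):
    """Return the bytes written when storing one lane from selem registers."""
    ebytes = esize // 8
    mask = (1 << esize) - 1
    out = bytearray()
    for _ in range(selem):
        lane = (regs[t] >> (index * esize)) & mask   # Elem[rval, index, esize]
        out += lane.to_bytes(ebytes, "little")       # assumption: little-endian data
        t = (t + 1) % 32                             # register numbers wrap modulo 32
    return bytes(out)
```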
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store multiple 3-element structures from three registers. This instruction stores multiple 3-element structures to
memory from three SIMD&FP registers, with interleaving. Every element of each register is stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm 0 1 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
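<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the post-index immediate offset, encoded in “Q”:
Q <imm>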
0 #24
1 #48
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store single 3-element structure from one lane of three registers. This instruction stores a 3-element structure to
memory from corresponding elements of three SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 0 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 0 Rm x x 1 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store multiple 4-element structures from four registers. This instruction stores multiple 4-element structures to
memory from four SIMD&FP registers, with interleaving. Every element of each register is stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 0 0 Rm 0 0 0 0 size Rn Rt
L opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
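<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the post-index immediate offset, encoded in “Q”:
Q <imm>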
0 #32
1 #64
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
Shared Decode
case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store single 4-element structure from one lane of four registers. This instruction stores a 4-element structure to
memory from corresponding elements of four SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
No offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 0 1 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 0 1 Rm x x 1 S size Rn Rt
L R opcode
16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)
16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)
32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)
32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)
64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)
64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)
integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
Assembler Symbols
<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Pair of SIMD&FP registers, with Non-temporal hint. This instruction stores a pair of SIMD&FP registers to
memory, issuing a hint to the memory system that the access is non-temporal. The address used for the store is
calculated from a base register value and an immediate offset. For information about non-temporal
pair instructions, see Load/Store SIMD and Floating-point Non-temporal pair.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 0 0 imm7 Rt2 Rn Rt
L
// Empty.
Assembler Symbols
<Dt1> Is the 64-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt2> Is the 64-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Qt1> Is the 128-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt2> Is the 128-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<St1> Is the 32-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<St2> Is the 32-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the optional signed immediate byte offset, a multiple of 4 in the range -256 to
252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit variant: is the optional signed immediate byte offset, a multiple of 8 in the range -512 to
504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
For the 128-bit variant: is the optional signed immediate byte offset, a multiple of 16 in the range -1024
to 1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.
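As an illustration only, the scaling of <imm> into "imm7" can be sketched in Python. The helper names encode_imm7 and decode_imm7 are hypothetical and not part of the architecture. The sketch assumes opc is 0, 1, or 2 for the 32-bit, 64-bit, and 128-bit variants, following scale = 2 + UInt(opc) in the decode pseudocode.

```python
def encode_imm7(imm: int, opc: int) -> int:
    """Return the 7-bit imm7 field for a byte offset (hypothetical helper)."""
    if opc not in (0, 1, 2):
        raise ValueError("opc == 0b11 is UNDEFINED")
    scale = 2 + opc                      # 4-, 8-, or 16-byte registers
    if imm % (1 << scale) != 0:
        raise ValueError("offset must be a multiple of the register size")
    scaled = imm >> scale                # exact, because imm is a multiple
    if not -64 <= scaled <= 63:
        raise ValueError("offset out of range for imm7")
    return scaled & 0x7F                 # two's complement in 7 bits


def decode_imm7(imm7: int, opc: int) -> int:
    """Mirror LSL(SignExtend(imm7, 64), scale) as a Python integer."""
    scale = 2 + opc
    signed = imm7 - 128 if imm7 & 0x40 else imm7
    return signed << scale
```

Under this sketch, a 128-bit pair stored at byte offset -1024 would use imm7 = 0b1000000, and decoding that field with opc = 2 gives -1024 again.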
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = n != 31;
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
data1 = V[t];
data2 = V[t2];
Mem[address, dbytes, AccType_VECSTREAM] = data1;
Mem[address+dbytes, dbytes, AccType_VECSTREAM] = data2;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store Pair of SIMD&FP registers. This instruction stores a pair of SIMD&FP registers to memory. The address used for
the store is calculated from a base register value and an immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 1 0 imm7 Rt2 Rn Rt
L
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 1 0 imm7 Rt2 Rn Rt
L
Signed offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 0 0 imm7 Rt2 Rn Rt
L
Assembler Symbols
<Dt1> Is the 64-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt2> Is the 64-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Qt1> Is the 128-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt2> Is the 128-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<St1> Is the 32-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<St2> Is the 32-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
8 in the range -512 to 504, encoded in the "imm7" field as <imm>/8.
For the 64-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 8 in the
range -512 to 504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
For the 128-bit post-index and 128-bit pre-index variant: is the signed immediate byte offset, a multiple
of 16 in the range -1024 to 1008, encoded in the "imm7" field as <imm>/16.
For the 128-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 16 in the
range -1024 to 1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
data1 = V[t];
data2 = V[t2];
Mem[address, dbytes, AccType_VEC] = data1;
Mem[address+dbytes, dbytes, AccType_VEC] = data2;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
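The three encoding classes differ only in when the offset is applied and whether the base register is written back. The following Python sketch follows the address arithmetic of the pseudocode above. The helper name stp_addresses is hypothetical, registers are modelled as plain integers, and memory, tag checking, and alignment faults are not modelled.

```python
def stp_addresses(base: int, offset: int, dbytes: int,
                  wback: bool, postindex: bool):
    """Return (addr1, addr2, new_base) for one pair store (hypothetical helper)."""
    mask = (1 << 64) - 1                      # addresses wrap at 64 bits
    address = base
    if not postindex:
        address = (address + offset) & mask   # pre-index and signed offset
    addr1 = address                           # first register is stored here
    addr2 = (address + dbytes) & mask         # second register follows it
    new_base = base
    if wback:
        if postindex:
            address = (address + offset) & mask
        new_base = address                    # written back to Xn or SP
    return addr1, addr2, new_base
```

Under this sketch, a post-index store with base 0x1000, offset 32, and 16-byte registers writes to 0x1000 and 0x1010 and leaves 0x1020 in the base register.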
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store SIMD&FP register (immediate offset). This instruction stores a single SIMD&FP register to memory. The
address that is used for the store is calculated from a base register value and an immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
Post-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 0 imm9 0 1 Rn Rt
opc
Pre-index
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 0 imm9 1 1 Rn Rt
opc
Unsigned offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 1 x 0 imm12 Rn Rt
opc
Assembler Symbols
<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> For the 8-bit variant: is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to
0 and encoded in the "imm12" field.
For the 16-bit variant: is the optional positive immediate byte offset, a multiple of 2 in the range 0 to
8190, defaulting to 0 and encoded in the "imm12" field as <pimm>/2.
For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
16380, defaulting to 0 and encoded in the "imm12" field as <pimm>/4.
For the 64-bit variant: is the optional positive immediate byte offset, a multiple of 8 in the range 0 to
32760, defaulting to 0 and encoded in the "imm12" field as <pimm>/8.
For the 128-bit variant: is the optional positive immediate byte offset, a multiple of 16 in the range 0 to
65520, defaulting to 0 and encoded in the "imm12" field as <pimm>/16.
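The <pimm> rules above amount to dividing the byte offset by the access size and checking that the quotient fits in 12 bits. A minimal Python sketch, assuming a hypothetical helper encode_pimm that takes the register width in bits:

```python
def encode_pimm(pimm: int, size_bits: int) -> int:
    """Return the imm12 field for an unsigned byte offset (hypothetical helper)."""
    scales = {8: 0, 16: 1, 32: 2, 64: 3, 128: 4}   # log2 of the access size in bytes
    if size_bits not in scales:
        raise ValueError("unsupported register width")
    scale = scales[size_bits]
    if pimm < 0 or pimm % (1 << scale) != 0:
        raise ValueError("offset must be a non-negative multiple of the access size")
    imm12 = pimm >> scale
    if imm12 > 0xFFF:
        raise ValueError("offset out of range for imm12")
    return imm12
```

Each variant's largest permitted offset maps to the same field value, 4095, which is why the byte range grows with the register width.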
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH && (wback || n != 31);
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
if !postindex then
    address = address + offset;
case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;
    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;
if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store SIMD&FP register (register offset). This instruction stores a single SIMD&FP register to memory. The address
that is used for the store is calculated from a base register value and an offset register value. The offset can be
optionally shifted and extended.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 1 Rm option S 1 0 Rn Rt
opc
Assembler Symbols
<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.
<extend> For the 8-bit variant: is the index extend specifier, encoded in “option”:
option <extend>
010 UXTW
110 SXTW
111 SXTX
For the 128-bit, 16-bit, 32-bit and 64-bit variant: is the index extend/shift specifier, defaulting to LSL,
and which must be omitted for the LSL option when <amount> is omitted. It is encoded in “option”:
option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX
<amount> For the 8-bit variant: is the index shift amount, it must be #0, encoded in "S" as 0 if omitted, or as 1 if
present.
For the 16-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #1
For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #2
For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #3
For the 128-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:
S <amount>
0 #0
1 #4
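The combined effect of <extend> and <amount> on the index register can be sketched in Python. This is an illustration under stated assumptions: index_offset and OPTIONS are hypothetical names, the option values are those in the table above, and shift is the already-decoded <amount> (0 or the log2 of the access size).

```python
OPTIONS = {0b010: "UXTW", 0b011: "LSL", 0b110: "SXTW", 0b111: "SXTX"}


def index_offset(rm_value: int, option: int, shift: int) -> int:
    """Return the 64-bit offset for a register-offset access (hypothetical helper)."""
    mask64 = (1 << 64) - 1
    kind = OPTIONS[option]                 # KeyError for options not in the table
    if kind == "UXTW":                     # zero-extend the low 32 bits (Wm)
        value = rm_value & 0xFFFFFFFF
    elif kind == "SXTW":                   # sign-extend the low 32 bits (Wm)
        value = rm_value & 0xFFFFFFFF
        if value & 0x80000000:
            value -= 1 << 32
    else:                                  # LSL and SXTX use all 64 bits (Xm)
        value = rm_value & mask64
    return (value << shift) & mask64       # apply <amount>, wrap to 64 bits
```

The 32-bit forms (UXTW, SXTW) have option<0> equal to 0 and read <Wm>; the 64-bit forms (LSL, SXTX) have option<0> equal to 1 and read <Xm>.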
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH;
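Operation
bits(64) offset = ExtendReg(m, extend_type, shift);
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;
    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;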
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Store SIMD&FP register (unscaled offset). This instruction stores a single SIMD&FP register to memory. The address
that is used for the store is calculated from a base register value and an optional immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 0 0 imm9 0 0 Rn Rt
opc
Assembler Symbols
<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
Shared Decode
integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);
Operation
CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;
if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;
    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;
Operational information
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the
corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the
vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean sub_op = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (U == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if sub_op then
        Elem[result, e, esize] = element1 - element2;
    else
        Elem[result, e, esize] = element1 + element2;
V[d] = result;
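The lane-wise behaviour of the subtract case can be sketched in Python. The helper vector_sub is hypothetical; it models each register as an integer holding packed lanes and assumes each lane result wraps modulo 2^esize, as fixed-width bitvector subtraction does.

```python
def vector_sub(operand1: int, operand2: int, esize: int, datasize: int) -> int:
    """Subtract packed lanes modulo 2**esize (hypothetical helper)."""
    lane_mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element1 = (operand1 >> (e * esize)) & lane_mask
        element2 = (operand2 >> (e * esize)) & lane_mask
        # Each lane wraps independently; no borrow crosses a lane boundary.
        result |= ((element1 - element2) & lane_mask) << (e * esize)
    return result
```

With 16-bit lanes, subtracting 0x0002_0003 from 0x0001_0005 gives 0xFFFF in lane 1 and 0x0002 in lane 0, which shows that a borrow in one lane does not affect its neighbour.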
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP
register from the corresponding vector element in the first source SIMD&FP register, places the most significant half
of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the
values in this instruction are signed integer values.
The results are truncated. For rounded results, see RSUBHN.
The SUBHN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SUBHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(2*datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
bits(2*esize) element1;
bits(2*esize) element2;
bits(2*esize) sum;
for e = 0 to elements-1
    element1 = Elem[operand1, e, 2*esize];
    element2 = Elem[operand2, e, 2*esize];
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    sum = sum + round_const;
    Elem[result, e, esize] = sum<2*esize-1:esize>;
Vpart[d, part] = result;
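The narrowing step keeps only the most significant half of each wide difference. A minimal Python sketch of the truncating case (round_const equal to 0), assuming a hypothetical helper subhn that takes the two wide source vectors as packed integers and returns the narrowed vector before it is placed in the lower or upper half of the destination:

```python
def subhn(operand1: int, operand2: int, esize: int, elements: int) -> int:
    """Return the narrowed vector of truncated high halves (hypothetical helper)."""
    wide = 2 * esize                      # source elements are twice as wide
    wide_mask = (1 << wide) - 1
    result = 0
    for e in range(elements):
        element1 = (operand1 >> (e * wide)) & wide_mask
        element2 = (operand2 >> (e * wide)) & wide_mask
        diff = (element1 - element2) & wide_mask
        high = diff >> esize              # bits <2*esize-1:esize>
        result |= high << (e * esize)
    return result
```

For example, with 16-bit source elements, 0x1234 - 0x0035 = 0x11FF narrows to 0x11, and 0x0000 - 0x0001 = 0xFFFF narrows to 0xFF.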
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Dot product index form with signed and unsigned integers. This instruction performs the dot product of the four
signed 8-bit integer values in each 32-bit element of the first source register with the four unsigned 8-bit integer
values in an indexed 32-bit element of the second source register, accumulating the result into the corresponding
32-bit element of the destination vector.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.
Vector
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 1 1 1 1 H 0 Rn Rd
US
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 8B
1 16B
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the immediate index of a quadtuplet of four 8-bit elements in the range 0 to 3, encoded in the "H:L"
fields.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
for e = 0 to elements-1
    bits(32) res = Elem[operand3, e, 32];
    for b = 0 to 3
        integer element1 = Int(Elem[operand1, 4*e+b, 8], op1_unsigned);
        integer element2 = Int(Elem[operand2, 4*i+b, 8], op2_unsigned);
        res = res + element1 * element2;
    Elem[result, e, 32] = res;
V[d] = result;
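The mixed-sign accumulation can be sketched in Python. This is a hedged illustration: sudot_by_element is a hypothetical helper, registers are modelled as lists (32-bit accumulator lanes and raw byte values), the first source is treated as signed and the indexed group of the second source as unsigned, as the description states, and each accumulator is assumed to wrap to 32 bits.

```python
def sudot_by_element(acc, vn_bytes, vm_bytes, index):
    """acc: 32-bit lanes; vn_bytes: raw bytes of Vn; vm_bytes: 16 raw bytes of Vm."""
    def signed8(b):
        return b - 256 if b & 0x80 else b          # reinterpret a byte as signed

    group = vm_bytes[4 * index:4 * index + 4]      # the indexed unsigned bytes
    result = []
    for e, res in enumerate(acc):
        for b in range(4):
            res += signed8(vn_bytes[4 * e + b]) * group[b]
        result.append(res & 0xFFFFFFFF)            # wrap to 32 bits
    return result
```

The same four bytes of the second source are reused for every 32-bit lane, which is what distinguishes the indexed form from a plain vector dot product.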
Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector
elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the
destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2 = V[d];
bits(datasize) result;
integer op1;
integer op2;
boolean sat;
for e = 0 to elements-1
    op1 = Int(Elem[operand, e, esize], !unsigned);
    op2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], sat) = SatQ(op1 + op2, esize, unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
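The saturating accumulate can be sketched in Python. The helper suqadd is hypothetical; lanes are given as raw unsigned values in the range 0 to 2^esize - 1, the destination lanes are interpreted as signed and the source lanes as unsigned, and the returned flag stands in for FPSR.QC.

```python
def suqadd(vd_lanes, vn_lanes, esize):
    """Return (result_lanes, qc) for raw lane values (hypothetical helper)."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    mask = (1 << esize) - 1
    result, qc = [], False
    for d, n in zip(vd_lanes, vn_lanes):
        signed_d = d - (1 << esize) if d >> (esize - 1) else d   # destination is signed
        total = signed_d + (n & mask)                            # source is unsigned
        clamped = max(lo, min(hi, total))                        # signed saturation
        qc = qc or clamped != total
        result.append(clamped & mask)
    return result, qc
```

Because the source is unsigned, the sum can only exceed the signed range at the top, so in this sketch saturation always clamps to the largest positive value.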
Signed extend Long. This instruction duplicates each vector element in the lower or upper half of the source
SIMD&FP register into a vector, and writes the vector to the destination SIMD&FP register. The destination vector
elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
The SXTL instruction extracts the source vector from the lower half of the source register. The SXTL2 instruction
extracts the source vector from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
• The encodings in this description are named to match the encodings of SSHLL, SSHLL2.
• The description of SSHLL, SSHLL2 gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 0 0 0 1 0 1 0 0 1 Rn Rd
U immh immb
SXTL{2} <Vd>.<Ta>, <Vn>.<Tb> is equivalent to SSHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #0
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
The description of SSHLL, SSHLL2 gives the operational pseudocode for this instruction.
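As an informal illustration of the widening behaviour described for this instruction, the following Python sketch sign-extends each lane of one 64-bit half of the source. The helper sxtl is hypothetical, the source register is modelled as a 128-bit integer, and the upper flag stands in for the SXTL2 form.

```python
def sxtl(vn: int, esize: int, upper: bool = False) -> int:
    """Sign-extend each esize-bit lane of one 64-bit half to 2*esize bits."""
    half = (vn >> 64 if upper else vn) & ((1 << 64) - 1)   # SXTL2 reads the upper half
    wide_mask = (1 << (2 * esize)) - 1
    result = 0
    for e in range(64 // esize):
        lane = (half >> (e * esize)) & ((1 << esize) - 1)
        if lane >> (esize - 1):                            # negative: fill upper bits
            lane -= 1 << esize
        result |= (lane & wide_mask) << (e * 2 * esize)
    return result
```

With 8-bit lanes, the bytes 0x7F and 0x80 widen to 0x007F and 0xFF80 respectively.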
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP
register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source
table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP
register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is
used to describe the table, the first source register describes the lowest bytes of the table.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 Rm 0 len 0 0 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 8B
1 16B
<Vn> For the four register table, three register table and two register table variant: is the name of the first
SIMD&FP table register, encoded in the "Rn" field.
For the single register table variant: is the name of the SIMD&FP table register, encoded in the "Rn"
field.
<Vn+1> Is the name of the second SIMD&FP table register, encoded as "Rn" plus 1 modulo 32.
<Vn+2> Is the name of the third SIMD&FP table register, encoded as "Rn" plus 2 modulo 32.
<Vn+3> Is the name of the fourth SIMD&FP table register, encoded as "Rn" plus 3 modulo 32.
<Vm> Is the name of the SIMD&FP index register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) indices = V[m];
bits(128*regs) table = Zeros();
bits(datasize) result;
integer index;
V[d] = result;
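The lookup described for this instruction can be sketched in Python. This is a sketch under stated assumptions rather than the reference pseudocode: tbl is a hypothetical helper, each table register is modelled as a sequence of 16 byte values, and the index register as a sequence of 8 or 16 byte values.

```python
def tbl(table_regs, indices):
    """table_regs: one to four 16-byte sequences; indices: 8 or 16 index bytes."""
    # The first table register supplies the lowest bytes of the table.
    table = [byte for reg in table_regs for byte in reg]
    # An index beyond the end of the table produces 0 for that element.
    return [table[i] if i < len(table) else 0 for i in indices]
```

With a single table register holding the values 100 to 115, the indices 0, 15, 16, and 255 would give 100, 115, 0, and 0.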
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Table vector lookup extension. This instruction reads each value from the vector elements in the index source
SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four
source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination
SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination
register is left unchanged. If more than one source register is used to describe the table, the first source register
describes the lowest bytes of the table.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 Rm 0 len 1 0 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <Ta>
0 8B
1 16B
<Vn> For the four register table, three register table and two register table variant: is the name of the first
SIMD&FP table register, encoded in the "Rn" field.
For the single register table variant: is the name of the SIMD&FP table register, encoded in the "Rn"
field.
<Vn+1> Is the name of the second SIMD&FP table register, encoded as "Rn" plus 1 modulo 32.
<Vn+2> Is the name of the third SIMD&FP table register, encoded as "Rn" plus 2 modulo 32.
<Vn+3> Is the name of the fourth SIMD&FP table register, encoded as "Rn" plus 3 modulo 32.
<Vm> Is the name of the SIMD&FP index register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) indices = V[m];
bits(128*regs) table = Zeros();
bits(datasize) result;
integer index;
V[d] = result;
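The extension form differs from a plain table lookup only in what happens to out-of-range indices. A minimal Python sketch, assuming a hypothetical helper tbx that models the destination, table registers, and index register as sequences of byte values:

```python
def tbx(dest, table_regs, indices):
    """dest: existing destination bytes; table_regs: one to four 16-byte sequences."""
    # The first table register supplies the lowest bytes of the table.
    table = [byte for reg in table_regs for byte in reg]
    # An out-of-range index leaves the existing destination element unchanged.
    return [table[i] if i < len(table) else old for old, i in zip(dest, indices)]
```

This makes it possible to chain several lookups into one destination, each covering a different index range, because elements whose index misses the current table keep their earlier value.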
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two
source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the
vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-
numbered elements of the destination vector, starting at zero, while vector elements from the second source register
are placed into odd-numbered elements of the destination vector.
Note
[Figure: elements 3 to 0 of the source registers Vn and Vm and their placement in the destination register Vd.]
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 0 1 0 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
for p = 0 to pairs-1
Elem[result, 2*p+0, esize] = Elem[operand1, 2*p+part, esize];
Elem[result, 2*p+1, esize] = Elem[operand2, 2*p+part, esize];
V[d] = result;
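The element loop above can be transliterated into Python as a rough sketch. The function name `trn` and the list representation (element 0 first) are assumptions for illustration. `part` is taken to be 0 for this primary form, which is consistent with the even-numbered source elements named in the description.

```python
def trn(operand1, operand2, part):
    """Transliteration of the transpose loop on lists of elements."""
    elements = len(operand1)
    pairs = elements // 2
    result = [0] * elements
    for p in range(pairs):
        # Even-numbered destination elements come from the first source,
        # odd-numbered destination elements from the second source.
        result[2 * p + 0] = operand1[2 * p + part]
        result[2 * p + 1] = operand2[2 * p + part]
    return result
```

With `part` set to 0, `trn([10, 11, 12, 13], [20, 21, 22, 23], 0)` returns `[10, 20, 12, 22]`.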
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two
source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the
destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements
of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-
numbered elements of the destination vector.
Note
[Figure: elements 3 to 0 of the source registers Vn and Vm and their placement in the destination register Vd.]
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 0 1 1 0 1 0 Rn Rd
op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
for p = 0 to pairs-1
Elem[result, 2*p+0, esize] = Elem[operand1, 2*p+part, esize];
Elem[result, 2*p+1, esize] = Elem[operand2, 2*p+part, esize];
V[d] = result;
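As a hedged illustration of how the even-element and odd-element forms of this loop relate, the following Python sketch applies both to the same pair of inputs. The helper names `trn` and `transpose_2x2_blocks` are hypothetical. `part` is assumed to be 0 for the even-element form and 1 for the odd-element form described here.

```python
def trn(operand1, operand2, part):
    """part=0 selects even-numbered source elements, part=1 odd-numbered."""
    result = []
    for p in range(len(operand1) // 2):
        result.append(operand1[2 * p + part])
        result.append(operand2[2 * p + part])
    return result


def transpose_2x2_blocks(row0, row1):
    """Treat row0 and row1 as rows of adjacent 2 x 2 blocks and
    transpose each block using the two forms of the loop."""
    return trn(row0, row1, 0), trn(row0, row1, 1)
```

For rows `[1, 2, 5, 6]` and `[3, 4, 7, 8]`, the result is `([1, 3, 5, 7], [2, 4, 6, 8])`, so each 2 x 2 block has been transposed in place.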
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second
source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the
absolute values of the results into the elements of the vector of the destination SIMD&FP register.
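A minimal Python sketch of this accumulate behaviour follows. The name `uaba` and the list representation are assumptions for illustration. The sketch also assumes that the accumulation wraps modulo 2^esize, as fixed-width bitvector addition would.

```python
def uaba(acc, op1, op2, esize):
    """Accumulate |op1[i] - op2[i]| into acc[i] for unsigned elements."""
    mask = (1 << esize) - 1
    # abs() of the integer difference gives the unsigned absolute
    # difference; the sum is truncated to the element size.
    return [(a + abs(x - y)) & mask for a, x, y in zip(acc, op1, op2)]
```

For 8-bit elements, `uaba([250, 1], [10, 200], [20, 5], 8)` returns `[4, 196]`: the first element wraps because 250 + 10 exceeds 255.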
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 1 1 1 Rn Rd
U ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) absdiff;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or
upper half of the second source SIMD&FP register from the corresponding vector elements of the first source
SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in
this instruction are unsigned integer values.
The UABAL instruction extracts each source vector from the lower half of each source register. The UABAL2 instruction
extracts each source vector from the upper half of each source register.
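The selection of the lower or upper half and the widening accumulation can be sketched as below. The function name `uabal` and the `upper` flag (standing in for the difference between the two mnemonics) are assumptions for this example. Registers are modelled as Python lists with element 0 first.

```python
def uabal(acc, op1, op2, esize, upper=False):
    """Accumulate absolute differences of one half of the sources
    into destination elements that are 2*esize bits wide."""
    half = len(op1) // 2
    start = half if upper else 0
    mask = (1 << (2 * esize)) - 1
    src1 = op1[start:start + half]
    src2 = op2[start:start + half]
    return [(a + abs(x - y)) & mask for a, x, y in zip(acc, src1, src2)]
```

With sources `[1, 2, 3, 4]` and `[4, 2, 0, 10]` and an accumulator of `[1000, 2000]`, the lower-half form gives `[1003, 2000]` and the upper-half form gives `[1003, 2006]`.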
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 1 0 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) absdiff;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source
SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute
values of the results into a vector, and writes the vector to the destination SIMD&FP register.
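As a brief sketch, the element-wise operation can be modelled in Python as follows. The name `uabd` is an assumption for illustration, and elements are treated as unsigned integers held in lists.

```python
def uabd(op1, op2):
    """Unsigned absolute difference of corresponding elements."""
    # For unsigned inputs of a given width the absolute difference
    # always fits in the same width, so no truncation is needed.
    return [abs(x - y) for x, y in zip(op1, op2)]
```

For example, `uabd([10, 200, 0], [20, 5, 255])` returns `[10, 195, 255]`.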
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 1 0 1 Rn Rd
U ac
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) absdiff;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the
second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register,
places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements. All the values in this instruction are
unsigned integer values.
The UABDL instruction extracts each source vector from the lower half of each source register. The UABDL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 1 0 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) absdiff;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the
vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
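The pairwise add and accumulate step can be sketched in Python. The function name `uadalp` and the list representation are assumptions for this example. The accumulation is assumed to wrap modulo 2^(2*esize), the width of a destination element.

```python
def uadalp(acc, operand, esize):
    """Add adjacent pairs of unsigned source elements and accumulate
    each sum into a destination element of width 2*esize."""
    mask = (1 << (2 * esize)) - 1
    return [
        (acc[i] + operand[2 * i] + operand[2 * i + 1]) & mask
        for i in range(len(operand) // 2)
    ]
```

For 8-bit source elements, `uadalp([1000, 65535], [255, 255, 1, 0], 8)` returns `[1510, 0]`: the second destination element wraps at 16 bits.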
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) sum;
integer op1;
integer op2;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source
SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into
a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long
as the source vector elements. All the values in this instruction are unsigned integer values.
The UADDL instruction extracts each source vector from the lower half of each source register. The UADDL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
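The loop above can be read as the following Python sketch. The function name `add_long` is hypothetical, and the inputs are assumed to be the already selected lower or upper halves of the source registers, as lists of unsigned elements. The `sub_op` flag mirrors the variable of the same name in the pseudocode and is presumably false for the addition described here.

```python
def add_long(operand1, operand2, esize, sub_op=False):
    """Widening add (or subtract) of corresponding unsigned elements."""
    mask = (1 << (2 * esize)) - 1
    result = []
    for element1, element2 in zip(operand1, operand2):
        if sub_op:
            total = element1 - element2
        else:
            total = element1 + element2
        # Corresponds to sum<2*esize-1:0>: keep the low 2*esize bits.
        result.append(total & mask)
    return result
```

Because the destination elements are twice as wide as the sources, `add_long([255, 1], [255, 2], 8)` returns `[510, 3]` without overflow.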
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the
source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
The destination vector elements are twice as long as the source vector elements.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) sum;
integer op1;
integer op2;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register
together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as
the source vector elements. All the values in this instruction are unsigned integer values.
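A minimal sketch of this reduction in Python is shown below. The name `uaddlv` and the list representation are assumptions for illustration. The result is truncated to 2*esize bits, matching the stated width of the destination scalar.

```python
def uaddlv(operand, esize):
    """Sum all unsigned elements into a scalar of width 2*esize."""
    return sum(operand) & ((1 << (2 * esize)) - 1)
```

For sixteen 8-bit elements that all hold 255, `uaddlv([255] * 16, 8)` returns 4080, which fits in the 16-bit destination.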
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 H
01 S
10 D
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer sum;
V[d] = sum<2*esize-1:0>;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the
corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in
a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register
and the first source register are twice as long as the vector elements of the second source register. All the values in
this instruction are unsigned integer values.
The UADDW instruction extracts vector elements from the lower half of the second source register. The UADDW2
instruction extracts vector elements from the upper half of the second source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 1 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, 2*esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
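The wide form differs from the long form in that only the second operand is narrow. A rough Python model follows. The function name `add_wide` and the `upper` flag (standing in for the choice between the two mnemonics) are assumptions. `sub_op` mirrors the pseudocode variable and is presumably false for addition.

```python
def add_wide(operand1, operand2, esize, upper=False, sub_op=False):
    """operand1 holds elements of width 2*esize; operand2 is a full
    register of esize-wide elements, of which one half is used."""
    half = len(operand2) // 2
    narrow = operand2[half:] if upper else operand2[:half]
    mask = (1 << (2 * esize)) - 1
    result = []
    for wide_elem, narrow_elem in zip(operand1, narrow):
        total = wide_elem - narrow_elem if sub_op else wide_elem + narrow_elem
        result.append(total & mask)  # keep the low 2*esize bits
    return result
```

With `operand1 = [65535, 100]` and `operand2 = [1, 2, 3, 4]`, the lower-half form returns `[0, 102]` and the upper-half form returns `[2, 104]`. The first element wraps because it is already at the 16-bit maximum.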
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned fixed-point Convert to Floating-point (scalar). This instruction converts the unsigned value in the 32-bit or
64-bit general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR,
and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 0 0 0 0 1 1 scale Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00' fltsize = 32;
when '01' fltsize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
rounding = FPRoundingMode(FPCR[]);
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<fbits> For the 32-bit to double-precision, 32-bit to half-precision and 32-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 32, encoded as 64
minus "scale".
For the 64-bit to double-precision, 64-bit to half-precision and 64-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 64, encoded as 64
minus "scale".
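The relationship between the fixed-point source, `<fbits>`, and the "scale" field can be sketched in Python. The function names are hypothetical. The sketch assumes a double-precision destination and the Round to Nearest mode only; other FPCR rounding modes and floating-point exceptions are not modelled.

```python
import math


def ucvtf_fixed(intval, fbits):
    """Interpret intval as unsigned fixed-point with fbits fractional
    bits and convert it to a Python float (double precision)."""
    # float() on an int rounds to nearest, ties to even. Scaling by a
    # power of two with ldexp is then exact for this range of values.
    return math.ldexp(float(intval), -fbits)


def encode_scale(fbits):
    """The 'scale' field holds 64 minus the number of fractional bits."""
    return 64 - fbits
```

For example, the source value 0x18000 with 16 fractional bits converts to 1.5, and 16 fractional bits are encoded as a "scale" value of 48.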
Operation
CheckFPAdvSIMDEnabled64();
intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, fracbits, TRUE, fpcr, rounding);
V[d] = fltval;
Unsigned integer Convert to Floating-point (scalar). This instruction converts the unsigned integer value in the
general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and
writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 1 1 0 0 0 0 0 0 Rn Rd
rmode opcode
integer d = UInt(Rd);
integer n = UInt(Rn);
case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;
rounding = FPRoundingMode(FPCR[]);
Assembler Symbols
<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, 0, TRUE, fpcr, rounding);
V[d] = fltval;
Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-
point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:
immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:
immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
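The vector-variant tables for `<T>` and `<fbits>` can be combined into a small decoder, sketched here in Python. The function name `decode_fbits` is hypothetical. The element sizes are inferred from the arrangement table (001x for H, 01xx for S, 1xxx for D), and `immb` is assumed to be the 3-bit field that completes the 7-bit value immh:immb.

```python
def decode_fbits(immh, immb):
    """Return (esize, fbits) for the vector variant from immh and immb."""
    if immh == 0b0000:
        raise ValueError("immh == 0000 selects a different encoding")
    if immh == 0b0001:
        raise ValueError("immh == 0001 is reserved")
    imm = (immh << 3) | immb  # UInt(immh:immb)
    if immh & 0b1000:
        esize = 64            # 1xxx
    elif immh & 0b0100:
        esize = 32            # 01xx
    else:
        esize = 16            # 001x
    # Each row of the table computes 2*esize minus UInt(immh:immb).
    return esize, 2 * esize - imm
```

For instance, immh = 0010 with immb = 000 decodes to 16-bit elements with 16 fractional bits, and immh = 0111 with immb = 111 decodes to 32-bit elements with 1 fractional bit.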
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FixedToFP(element, fracbits, unsigned, fpcr, rounding);
V[d] = result;
Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an
unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the
result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
sz <V>
0 S
1 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
Q <T>
0 4H
1 8H
For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:
sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = result;
Dot Product unsigned arithmetic (vector, by element). This instruction performs the dot product of the four unsigned 8-bit
elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of an indexed 32-bit element in
the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
Note
Vector
(FEAT_DotProd)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 1 0 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(M:Rm);
integer index = UInt(H:L);
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 8B
1 16B
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the element index, encoded in the "H:L" fields.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) result = V[d];
for e = 0 to elements-1
    integer res = 0;
    integer element1, element2;
    for i = 0 to 3
        if signed then
            element1 = SInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = SInt(Elem[operand2, 4*index+i, esize DIV 4]);
        else
            element1 = UInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = UInt(Elem[operand2, 4*index+i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = Elem[result, e, esize] + res;
V[d] = result;
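A minimal model of the unsigned by-element operation above, with registers represented as Python lists (element 0 is the least significant), might look as follows. The function name and the list representation are assumptions made for illustration.

```python
def udot_by_element(acc, op1_bytes, op2_bytes, index):
    """acc: 32-bit lanes of Vd; op1_bytes: 4*len(acc) bytes of Vn;
    op2_bytes: all 16 bytes of Vm; index: selected 32-bit element of Vm (0 to 3)."""
    if len(op1_bytes) != 4 * len(acc) or len(op2_bytes) != 16:
        raise ValueError("unexpected operand length")
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    selected = op2_bytes[4 * index:4 * index + 4]   # the same four bytes for every lane
    result = []
    for e, lane in enumerate(acc):
        res = sum(op1_bytes[4 * e + i] * selected[i] for i in range(4))
        result.append((lane + res) & 0xFFFFFFFF)    # accumulate modulo 2**32
    return result
```

The accumulation wraps at 32 bits, so a lane holding 0xFFFFFFFF that receives a dot product of 5610 ends up holding 5609.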
Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit
elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding
32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the
destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
Note
Vector
(FEAT_DotProd)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 1 0 1 Rn Rd
U
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 8B
1 16B
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
result = V[d];
for e = 0 to elements-1
    integer res = 0;
    integer element1, element2;
    for i = 0 to 3
        if signed then
            element1 = SInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = SInt(Elem[operand2, 4*e+i, esize DIV 4]);
        else
            element1 = UInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = UInt(Elem[operand2, 4*e+i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = Elem[result, e, esize] + res;
V[d] = result;
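The byte extraction performed by Elem[operand, 4*e+i, esize DIV 4] can also be sketched on packed 32-bit lane values. The following example models the unsigned vector form only; udot_lane and udot are hypothetical helper names.

```python
def udot_lane(acc, a, b):
    """One 32-bit lane: acc plus the sum of the four unsigned byte products, modulo 2**32."""
    res = 0
    for i in range(4):
        byte_a = (a >> (8 * i)) & 0xFF   # byte i of the first source lane
        byte_b = (b >> (8 * i)) & 0xFF   # byte i of the second source lane
        res += byte_a * byte_b
    return (acc + res) & 0xFFFFFFFF

def udot(acc, vn, vm):
    """Apply udot_lane to corresponding 32-bit lanes of Vd, Vn and Vm."""
    return [udot_lane(d, a, b) for d, a, b in zip(acc, vn, vm)]
```

Because each product is at most 255 * 255, a single dot product never exceeds 260100, and only the accumulation into the destination lane can wrap.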
Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP
registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see URHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    sum = element1 + element2;
    Elem[result, e, esize] = sum<esize:1>;
V[d] = result;
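The slice sum<esize:1> takes the full (esize+1)-bit sum and discards bit 0, so the intermediate carry is never lost. A small sketch of the unsigned case, using Python lists of lane values and a hypothetical function name:

```python
def uhadd(op1, op2, esize):
    """Unsigned halving add: (a + b) >> 1 per lane, computed without intermediate overflow."""
    mask = (1 << esize) - 1
    result = []
    for a, b in zip(op1, op2):
        total = (a & mask) + (b & mask)       # up to esize+1 bits wide
        result.append((total >> 1) & mask)    # sum<esize:1>, truncating bit 0
    return result
```

For 8-bit lanes, 255 + 255 gives 255 rather than a wrapped value, and 1 + 2 gives 1 because the result is truncated rather than rounded.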
Operational information
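If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.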
Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register
from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places
each result into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 0 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    diff = element1 - element2;
    Elem[result, e, esize] = diff<esize:1>;
V[d] = result;
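When the second element is larger than the first, diff is negative and diff<esize:1> takes bits of its two's complement representation. The sketch below models this for the unsigned case; the function name is an assumption, and Python's arithmetic right shift on negative integers is used to mirror the bit slice.

```python
def uhsub(op1, op2, esize):
    """Unsigned halving subtract: bits <esize:1> of (a - b) per lane."""
    mask = (1 << esize) - 1
    result = []
    for a, b in zip(op1, op2):
        diff = (a & mask) - (b & mask)        # may be negative
        result.append((diff >> 1) & mask)     # floor shift, then keep esize bits
    return result
```

For 8-bit lanes, 1 - 2 produces 255 and 0 - 255 produces 128.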
Operational information
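If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.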
Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to
the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a
vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 0 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[concat, 2*e, esize], unsigned);
    element2 = Int(Elem[concat, (2*e)+1, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
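In the pseudocode above, concat = operand2:operand1 places the first source register in the low-order bits, so adjacent pairs from Vn fill the lower half of the result and pairs from Vm fill the upper half. The following sketch shows this ordering with registers as lists (element 0 first); pairwise_maxmin is a hypothetical name, and the minimum flag mirrors the variable of the same name in the pseudocode.

```python
def pairwise_maxmin(op1, op2, minimum=False):
    """Pairwise unsigned maximum (or minimum) over the concatenation of two vectors."""
    if len(op1) != len(op2) or len(op1) % 2:
        raise ValueError("operands must have the same, even number of lanes")
    concat = list(op1) + list(op2)            # op1 occupies the low-order elements
    pick = min if minimum else max
    return [pick(concat[2 * e], concat[2 * e + 1]) for e in range(len(op1))]
```

For Vn = [1, 5, 3, 2] and Vm = [9, 0, 4, 4], the maximum form returns [5, 3, 9, 4].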
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 0 1 0 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;
V[d] = maxmin<esize-1:0>;
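The reduction loop is not shown in the listing above. Based on the instruction description, the across-vector form reduces all lanes of the source to one scalar. A minimal sketch under that assumption follows; the function name and the minimum flag are hypothetical.

```python
def across_maxmin(lanes, esize, minimum=False):
    """Reduce all unsigned lanes of a vector to a single maximum (or minimum) value."""
    if not lanes:
        raise ValueError("at least one lane is required")
    mask = (1 << esize) - 1
    values = [x & mask for x in lanes]        # treat every lane as unsigned
    return min(values) if minimum else max(values)
```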
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP
registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the
destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 1 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into
a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 1 1 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
integer element1;
integer element2;
integer maxmin;
for e = 0 to elements-1
    element1 = Int(Elem[concat, 2*e, esize], unsigned);
    element2 = Int(Elem[concat, (2*e)+1, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 0 1 0 Rn Rd
U op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;
V[d] = maxmin<esize-1:0>;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination
vector elements are twice as long as the elements that are multiplied.
The UMLAL instruction extracts vector elements from the lower half of the first source register. The UMLAL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 0 1 0 H 0 Rn Rd
U o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
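The size-dependent tables above determine how the register number of <Vm>, the element specifier <Ts>, the arrangement <Ta> and <index> are obtained from the encoding fields. The sketch below collects them in one place. It assumes that "Rm" is the 4-bit field shown in the encoding diagram, and the function name decode_by_element is hypothetical.

```python
def decode_by_element(size, L, M, H, Rm):
    """Return (m, index, Ts, Ta) for the by-element long multiply encodings."""
    if not 0 <= Rm <= 0b1111:
        raise ValueError("Rm is a 4-bit field in this encoding")
    if size == 0b01:                          # 16-bit source elements
        m = Rm                                # 0:Rm, so only V0-V15 are reachable
        index = (H << 2) | (L << 1) | M       # H:L:M
        return m, index, "H", "4S"
    if size == 0b10:                          # 32-bit source elements
        m = (M << 4) | Rm                     # M:Rm
        index = (H << 1) | L                  # H:L
        return m, index, "S", "2D"
    raise ValueError("size values 00 and 11 are RESERVED")
```

The M bit therefore serves as the low bit of the index for halfword elements and as the high bit of the register number for word elements.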
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the
first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and
accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector
elements are twice as long as the elements that are multiplied.
The UMLAL instruction extracts vector elements from the lower half of the first source register. The UMLAL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
bits(2*esize) accum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    product = (element1*element2)<2*esize-1:0>;
    if sub_op then
        accum = Elem[operand3, e, 2*esize] - product;
    else
        accum = Elem[operand3, e, 2*esize] + product;
    Elem[result, e, 2*esize] = accum;
V[d] = result;
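The widening multiply-accumulate above can be sketched with registers as lists of unsigned lane values. In this model the upper flag plays the role of the "2" specifier (part = 1) and sub_op mirrors the pseudocode variable of the same name; the function name mlal_long and the list representation are assumptions made for illustration.

```python
def mlal_long(acc, vn, vm, esize, upper=False, sub_op=False):
    """acc: wide lanes of Vd (2*esize bits each); vn, vm: all narrow lanes of the
    128-bit source registers, so len(vn) == len(vm) == 2 * len(acc)."""
    if len(vn) != 2 * len(acc) or len(vm) != 2 * len(acc):
        raise ValueError("sources must hold twice as many lanes as the accumulator")
    narrow = (1 << esize) - 1
    wide = (1 << (2 * esize)) - 1
    base = len(acc) if upper else 0           # Vpart[..., 1] selects the upper half
    result = []
    for e, lane in enumerate(acc):
        product = (vn[base + e] & narrow) * (vm[base + e] & narrow)
        total = lane - product if sub_op else lane + product
        result.append(total & wide)           # wrap to 2*esize bits
    return result
```

The product of two esize-bit unsigned values always fits in 2*esize bits, so only the accumulation step can wrap.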
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register and subtracts the results from the vector elements of the destination SIMD&FP register. The destination
vector elements are twice as long as the elements that are multiplied.
The UMLSL instruction extracts vector elements from the lower half of the first source register. The UMLSL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 1 1 0 H 0 Rn Rd
U o2
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or
upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the
values in this instruction are unsigned integer values.
The UMLSL instruction extracts each source vector from the lower half of each source register. The UMLSL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 0 0 Rn Rd
U o1
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
bits(2*esize) accum;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    product = (element1*element2)<2*esize-1:0>;
    if sub_op then
        accum = Elem[operand3, e, 2*esize] - product;
    else
        accum = Elem[operand3, e, 2*esize] + product;
    Elem[result, e, 2*esize] = accum;
V[d] = result;
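As a rough illustration of the pseudocode above, the following Python sketch models the subtract path (sub_op true) on plain lists of integers. The function name `umlsl` and the list-based register model are assumptions made for this example and are not part of the architecture.

```python
def umlsl(acc, op1, op2, esize):
    """Model the multiply-subtract-long loop on Python lists (illustrative only).

    acc holds unsigned 2*esize-bit accumulator elements; op1 and op2 hold the
    unsigned esize-bit elements already extracted from the selected half of
    each source register.
    """
    wide_mask = (1 << (2 * esize)) - 1
    result = []
    for a, x, y in zip(acc, op1, op2):
        product = (x * y) & wide_mask             # (element1*element2)<2*esize-1:0>
        result.append((a - product) & wide_mask)  # wraps modulo 2^(2*esize), no saturation
    return result


print(umlsl([1000, 0], [10, 1], [20, 1], 8))
```

The second element shows that the subtraction wraps rather than saturates: 0 minus 1 yields the all-ones 16-bit value.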
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Unsigned 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer
values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The
resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the
destination vector. This is equivalent to performing an 8-way dot product per destination element.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.
Vector
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 1 0 0 Rm 1 0 1 0 0 1 Rn Rd
U B
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) addend = V[d];
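The matrix product described above can be sketched in Python as follows. This is a minimal model, assuming that the first source holds its two rows as two runs of eight bytes, that the second source holds its two columns as two runs of eight bytes, and that the accumulator is stored row by row. The function name `ummla` and this element ordering are assumptions for illustration.

```python
def ummla(acc, a, b):
    """Illustrative 2x8 by 8x2 unsigned 8-bit matrix multiply-accumulate.

    acc: four unsigned 32-bit values (assumed 2x2, row by row).
    a:   sixteen unsigned 8-bit values (assumed two rows of eight).
    b:   sixteen unsigned 8-bit values (assumed two columns of eight).
    """
    result = []
    for i in range(2):
        for j in range(2):
            total = acc[2 * i + j]
            for k in range(8):                   # 8-way dot product per destination element
                total += a[8 * i + k] * b[8 * j + k]
            result.append(total & 0xFFFFFFFF)    # keep 32 bits
    return result


rows = [1] * 8 + [2] * 8
cols = [1] * 8 + [3] * 8
print(ummla([0, 0, 0, 10], rows, cols))
```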
Unsigned Move vector element to general-purpose register. This instruction reads the unsigned integer from the
source SIMD&FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination
general-purpose register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (to general).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 1 1 1 1 Rn Rd
32-bit (Q == 0)
integer d = UInt(Rd);
integer n = UInt(Rn);
integer size;
case Q:imm5 of
    when '0xxxx1' size = 0; // UMOV Wd, Vn.B
    when '0xxx10' size = 1; // UMOV Wd, Vn.H
    when '0xx100' size = 2; // UMOV Wd, Vn.S
    when '1x1000' size = 3; // UMOV Xd, Vn.D
    otherwise UNDEFINED;
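The bit patterns in the case statement above can be checked with a small Python sketch. The helper name `umov_element_size` is hypothetical, and returning None stands in for the UNDEFINED outcome.

```python
def umov_element_size(q, imm5):
    """Return the 'size' value selected by Q:imm5, or None where the
    pseudocode reaches 'otherwise UNDEFINED' (illustrative helper)."""
    if q == 0 and (imm5 & 0b00001) == 0b00001:
        return 0  # '0xxxx1': UMOV Wd, Vn.B
    if q == 0 and (imm5 & 0b00011) == 0b00010:
        return 1  # '0xxx10': UMOV Wd, Vn.H
    if q == 0 and (imm5 & 0b00111) == 0b00100:
        return 2  # '0xx100': UMOV Wd, Vn.S
    if q == 1 and (imm5 & 0b01111) == 0b01000:
        return 3  # '1x1000': UMOV Xd, Vn.D
    return None


print(umov_element_size(0, 0b00110), umov_element_size(1, 0b11000))
```

Under this reading, Q == 1 is only valid with the D pattern, and Q == 0 with imm5 of the form x1000 or x0000 is UNDEFINED.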
Assembler Symbols
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<Ts> For the 32-bit variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
xx000 RESERVED
xxxx1 B
xxx10 H
xx100 S
For the 64-bit variant: is an element size specifier, encoded in “imm5”:
imm5 <Ts>
x0000 RESERVED
xxxx1 RESERVED
xxx10 RESERVED
xx100 RESERVED
x1000 D
<index> For the 32-bit variant: is the element index encoded in “imm5”:
Alias Conditions
Operation
CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half
of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, places
the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are
twice as long as the elements that are multiplied.
The UMULL instruction extracts vector elements from the lower half of the first source register. The UMULL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 0 1 0 H 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:
size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.
size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED
size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
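The <Vm> and <index> tables above can be read together as a single decode step. The Python sketch below is one way to express that, assuming the 4-bit "Rm" field and the single-bit H, L, and M fields are passed in as integers. The helper name `decode_by_element` is hypothetical.

```python
def decode_by_element(size, h, l, m, rm):
    """Return (vm_register_number, element_index) for the by-element form,
    or None for the RESERVED size values (illustrative helper)."""
    if size == 0b01:
        # H elements: Vm = 0:Rm (so only V0-V15), index = H:L:M
        return rm, (h << 2) | (l << 1) | m
    if size == 0b10:
        # S elements: Vm = M:Rm, index = H:L
        return (m << 4) | rm, (h << 1) | l
    return None


print(decode_by_element(0b01, 1, 0, 1, 0b1111))
print(decode_by_element(0b10, 1, 1, 1, 0b0011))
```

This also shows why the register is restricted to V0-V15 for H elements: the M bit is consumed by the index rather than the register number.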
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
V[d] = result;
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Multiply Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half
of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP
register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this
instruction are unsigned integer values.
The UMULL instruction extracts each source vector from the lower half of each source register. The UMULL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 1 0 0 0 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    Elem[result, e, 2*esize] = (element1*element2)<2*esize-1:0>;
V[d] = result;
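To illustrate the lower-half and upper-half selection together with the widening multiply, here is a minimal Python sketch. It models each 128-bit source register as a list of esize-bit unsigned elements, lowest element first. The function name `umull` and the `upper` flag (standing in for the "2" specifier) are assumptions for this example.

```python
def umull(vn, vm, esize, upper=False):
    """Widening unsigned multiply on one half of each source (illustrative).

    vn, vm: full source registers as lists of esize-bit unsigned elements.
    upper:  False models UMULL (lower half), True models UMULL2 (upper half).
    """
    half = len(vn) // 2
    start = half if upper else 0
    wide_mask = (1 << (2 * esize)) - 1
    return [(x * y) & wide_mask
            for x, y in zip(vn[start:start + half], vm[start:start + half])]


vn = list(range(16))
vm = [255] * 16
print(umull(vn, vm, 8))
print(umull(vn, vm, 8, upper=True))
```

Because the product of two unsigned esize-bit values always fits in 2*esize bits, the mask never discards information here.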
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP
registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;
boolean sat;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    sum = element1 + element2;
    (Elem[result, e, esize], sat) = SatQ(sum, esize, unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
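The saturating behavior in the loop above can be sketched in Python as follows. The helper `sat_q_unsigned` is a hypothetical stand-in for the unsigned case of SatQ, and the returned boolean models whether the instruction would set FPSR.QC. In the architecture the bit is only ever set by this operation, not cleared.

```python
def sat_q_unsigned(value, esize):
    """Clamp an integer to [0, 2^esize - 1]; return (clamped, saturated)."""
    top = (1 << esize) - 1
    clamped = max(0, min(value, top))
    return clamped, clamped != value


def uqadd(op1, op2, esize):
    """Element-wise unsigned saturating add on Python lists (illustrative)."""
    qc = False
    result = []
    for x, y in zip(op1, op2):
        r, sat = sat_q_unsigned(x + y, esize)
        qc = qc or sat          # models 'if sat then FPSR.QC = 1'
        result.append(r)
    return result, qc


print(uqadd([200, 10], [100, 20], 8))
```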
Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source
SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector
element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the
destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For
truncated results, see UQSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
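A per-element Python sketch of the saturating path of this loop may help to show how the signed shift byte selects between a left shift and a rounded right shift. The function name `uqrshl_element` is hypothetical. The negative-shift case is written out explicitly because Python, unlike the pseudocode, does not accept negative shift counts.

```python
def uqrshl_element(value, shift_byte, esize, rounding=True):
    """Shift one unsigned element by a signed byte, then saturate (illustrative).

    value:      unsigned esize-bit element from the first source.
    shift_byte: least significant byte (0..255) of the second source element.
    Returns (result, saturated).
    """
    # shift = SInt(byte): reinterpret the byte as a signed value in -128..127
    shift = shift_byte - 256 if shift_byte & 0x80 else shift_byte
    if shift >= 0:
        shifted = value << shift                    # rounding constant is 0 here
    else:
        n = -shift
        round_const = (1 << (n - 1)) if rounding else 0
        shifted = (value + round_const) >> n
    top = (1 << esize) - 1
    clamped = min(shifted, top)                     # unsigned input never goes below 0
    return clamped, clamped != shifted


print(uqrshl_element(0x80, 1, 8))       # left shift by 1 overflows 8 bits
print(uqrshl_element(5, 0xFF, 8))       # shift of -1: rounded right shift
```

Passing `rounding=False` gives the truncating variant of the same computation.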
Unsigned saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the
source SIMD&FP register, right shifts each element by an immediate value, saturates each shifted result to a value that
is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the
destination SIMD&FP register. All the values in this instruction are unsigned integer values. The results are rounded.
For truncated results, see UQSHRN.
The UQRSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the UQRSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 1 1 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
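For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:
immh <shift>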
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;
for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';
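The <shift> table and the rounding step above can be combined in a short Python sketch. Both helper names are hypothetical. `immh` and `immb` are taken as the 4-bit and 3-bit encoding fields, and None stands in for the RESERVED rows and the 0000 row of the table.

```python
def narrow_shift_amount(immh, immb):
    """Decode the right shift amount from immh:immb (illustrative helper)."""
    imm = (immh << 3) | immb            # UInt(immh:immb), a 7-bit value
    if immh == 0b0001:
        return 16 - imm
    if immh >> 1 == 0b001:
        return 32 - imm
    if immh >> 2 == 0b01:
        return 64 - imm
    return None                         # 0000 and 1xxx rows


def uqrshrn_element(value, shift, esize, rounding=True):
    """Right shift a 2*esize-bit unsigned value, then saturate to esize bits."""
    round_const = (1 << (shift - 1)) if rounding else 0
    shifted = (value + round_const) >> shift
    top = (1 << esize) - 1
    clamped = min(shifted, top)
    return clamped, clamped != shifted  # second item models setting FPSR.QC


print(narrow_shift_amount(0b0001, 0b111))
print(uqrshrn_element(0x01FF, 1, 8))
```

Rounding can itself cause saturation: 0x01FF shifted right by 1 truncates to 255, but the rounded value is 256, which no longer fits in 8 bits.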
Unsigned saturating Shift Left (immediate). This instruction takes each vector element in the source SIMD&FP
register, shifts it by an immediate value, places the results in a vector, and writes the vector to the destination
SIMD&FP register. The results are truncated. For rounded results, see UQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
boolean src_unsigned;
boolean dst_unsigned;
case op:U of
    when '00' UNDEFINED;
    when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
    when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
    when '11' src_unsigned = TRUE; dst_unsigned = TRUE;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 1 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
boolean src_unsigned;
boolean dst_unsigned;
case op:U of
    when '00' UNDEFINED;
    when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
    when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
    when '11' src_unsigned = TRUE; dst_unsigned = TRUE;
Assembler Symbols
immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the left shift amount, in the range 0 to the operand width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;
for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], src_unsigned) << shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, dst_unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
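The following Python sketch illustrates how the left shift amount and element size follow from immh:immb, and how the shifted value saturates when both source and destination are unsigned (the op:U == '11' row of the decode). The helper names are hypothetical.

```python
def left_shift_amount(immh, immb):
    """Return (shift, esize) decoded from immh:immb, or None when immh is 0000."""
    if immh == 0:
        return None
    esize = 8 << (immh.bit_length() - 1)   # highest set bit of immh selects 8/16/32/64
    return ((immh << 3) | immb) - esize, esize


def uqshl_imm_element(value, shift, esize):
    """Unsigned saturating left shift of one element; returns (result, saturated)."""
    shifted = value << shift
    top = (1 << esize) - 1
    clamped = min(shifted, top)
    return clamped, clamped != shifted


print(left_shift_amount(0b0001, 0b011))
print(uqshl_imm_element(0x40, 2, 8))
```

For each immh row the decoded shift stays within 0 to esize minus 1, matching the range stated for <shift>.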
Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source
SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the
second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP
register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For
rounded results, see UQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 1 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
Unsigned saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each element by an immediate value, saturates each shifted result to a value that is half
the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination
SIMD&FP register. All the values in this instruction are unsigned integer values. The results are truncated. For
rounded results, see UQRSHRN.
The UQSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the UQSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 0 1 0 1 Rn Rd
U immh op
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:
immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
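For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:
immh <shift>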
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;
for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';
Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register
from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and
writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 0 1 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;
boolean sat;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    diff = element1 - element2;
    (Elem[result, e, esize], sat) = SatQ(diff, esize, unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
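For unsigned subtraction the only way to leave the representable range is to go below zero, so saturation clamps at 0. A minimal Python sketch of the loop above, with the hypothetical name `uqsub` and a boolean modeling whether FPSR.QC would be set:

```python
def uqsub(op1, op2, esize):
    """Element-wise unsigned saturating subtract on Python lists (illustrative)."""
    top = (1 << esize) - 1
    qc = False
    result = []
    for x, y in zip(op1, op2):
        diff = x - y
        clamped = max(0, min(diff, top))   # for in-range inputs only the lower bound can trigger
        qc = qc or (clamped != diff)
        result.append(clamped)
    return result, qc


print(uqsub([5, 200], [10, 100], 8))
```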
Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register,
saturates each value to half the original width, places the result into a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are unsigned integer values.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
The UQXTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
UQXTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 1 0 0 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
size <Vb>
00 B
01 H
10 S
11 RESERVED
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
size <Va>
00 H
01 S
10 D
11 RESERVED
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;
boolean sat;
for e = 0 to elements-1
    element = Elem[operand, e, 2*esize];
    (Elem[result, e, esize], sat) = SatQ(Int(element, unsigned), esize, unsigned);
    if sat then FPSR.QC = '1';
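The narrowing behaviour described above can be sketched in Python as follows. This is a minimal model, not a definitive implementation: the function name uqxtn, the list representation (lowest element first), and the dest parameter used to distinguish UQXTN from UQXTN2 are assumptions for illustration.

```python
def uqxtn(src, esize, dest=None):
    """Narrow each 2*esize-bit unsigned element of src to esize bits with saturation.

    dest=None models UQXTN: the narrowed values go to the lower half and the
    upper half is cleared (shown here as zeros).
    Passing the existing lower-half elements as dest models UQXTN2: they are
    kept and the narrowed values are written to the upper half.
    Returns (destination_elements, qc), where qc mimics FPSR.QC being set.
    """
    max_val = (1 << esize) - 1
    narrowed = [min(x, max_val) for x in src]
    qc = any(x > max_val for x in src)
    if dest is None:
        return narrowed + [0] * len(narrowed), qc
    return list(dest) + narrowed, qc
```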
Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register,
calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector
to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
sz Q <T>
0 0 2S
0 1 4S
1 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(32) element;
for e = 0 to elements-1
    element = Elem[operand, e, 32];
    Elem[result, e, 32] = UnsignedRecipEstimate(element);
V[d] = result;
Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source
SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the
destination SIMD&FP register.
The results are rounded. For truncated results, see UHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 1 0 1 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    Elem[result, e, esize] = (element1+element2+1)<esize:1>;
V[d] = result;
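The expression (element1+element2+1)<esize:1> adds 1 before discarding bit 0, which rounds the halved sum upwards. A small Python sketch of the same arithmetic is given here; the function name urhadd and the list representation are assumptions for illustration.

```python
def urhadd(op1, op2, esize):
    """Rounded halving add of unsigned elements: bits <esize:1> of (a + b + 1)."""
    mask = (1 << esize) - 1
    # Python integers do not overflow, so the intermediate sum keeps its carry bit.
    return [((a + b + 1) >> 1) & mask for a, b in zip(op1, op2)]
```

Because the intermediate sum is held in an unbounded integer, adding two maximum-valued elements still yields the maximum value rather than wrapping.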
Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP
register, shifts the vector element by a value from the least significant byte of the corresponding element of the second
source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
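The following Python sketch illustrates the non-saturating, rounding path of the loop above for a single element. The helper names (to_signed8, urshl_element) are assumptions for this example; the shift is taken from the least significant byte of the second operand element and interpreted as a signed value.

```python
def to_signed8(byte):
    """Interpret the low 8 bits as a signed value, as SInt(...<7:0>) does."""
    byte &= 0xFF
    return byte - 256 if byte & 0x80 else byte


def urshl_element(value, shift_elem, esize):
    """Unsigned rounding shift of one element by a signed per-element shift."""
    shift = to_signed8(shift_elem)
    if shift >= 0:
        result = value << shift                   # left shift, rounding constant is 0
    else:
        n = -shift
        result = (value + (1 << (n - 1))) >> n    # add 2^(n-1), then shift right by n
    return result & ((1 << esize) - 1)            # keep the low esize bits
```

A shift byte of 0xFF is read as -1, so the value 5 becomes (5 + 1) >> 1 = 3, whereas a truncating shift would give 2.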
Unsigned Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are unsigned integer values. The results are rounded.
For truncated results, see USHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
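The vector-variant table above can be read as follows: the position of the highest set bit in immh selects the element size, and the shift amount is twice the element size minus the 7-bit value immh:immb. The Python sketch below is an illustration under those assumptions; the function name decode_vector_shift is hypothetical, and the RESERVED combinations involving Q are not checked.

```python
def decode_vector_shift(immh, immb):
    """Return (esize, shift) for the vector variant, following the immh table."""
    if immh == 0:
        raise ValueError("immh == 0000 selects Advanced SIMD modified immediate")
    esize = 8 << (immh.bit_length() - 1)        # 0001 -> 8, 001x -> 16, 01xx -> 32, 1xxx -> 64
    shift = 2 * esize - ((immh << 3) | immb)    # (2*esize) - UInt(immh:immb)
    return esize, shift
```

With this reading, every valid encoding yields a shift in the range 1 to the element width, which matches the range stated for <shift>.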
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
V[d] = result;
Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP
register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the
vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
sz Q <T>
0 0 2S
0 1 4S
1 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(32) element;
for e = 0 to elements-1
    element = Elem[operand, e, 32];
    Elem[result, e, 32] = UnsignedRSqrtEstimate(element);
V[d] = result;
Unsigned Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each result by an immediate value, and accumulates the final results with the vector
elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The
results are rounded. For truncated results, see USRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
V[d] = result;
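As a rough illustration of the rounding shift and accumulate described for this instruction, the Python sketch below applies the rounding constant 1 << (shift - 1) before shifting and then adds the result to the destination element. The function name ursra is hypothetical, and the wrap-around of the accumulation to the element width is an assumption, since the element loop is not shown in the Operation text here.

```python
def ursra(dest, src, shift, esize):
    """Rounding right shift of each src element, accumulated into dest.

    The accumulation is assumed to wrap modulo 2**esize (no saturation).
    """
    mask = (1 << esize) - 1
    round_const = 1 << (shift - 1)
    return [(d + ((s + round_const) >> shift)) & mask for d, s in zip(dest, src)]
```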
Dot Product index form with unsigned and signed integers. This instruction performs the dot product of the four
unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer
values in an indexed 32-bit element of the second source register, accumulating the result into the corresponding
32-bit element of the destination register.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.
Vector
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 1 1 1 1 H 0 Rn Rd
US
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 8B
1 16B
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the immediate index of a quadtuplet of four 8-bit elements in the range 0 to 3, encoded in the "H:L"
fields.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
for e = 0 to elements-1
    bits(32) res = Elem[operand3, e, 32];
    for b = 0 to 3
        integer element1 = Int(Elem[operand1, 4*e+b, 8], op1_unsigned);
        integer element2 = Int(Elem[operand2, 4*i+b, 8], op2_unsigned);
        res = res + element1 * element2;
    Elem[result, e, 32] = res;
V[d] = result;
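The indexed dot product above can be modelled in Python as a sketch. The function names and the byte-list representation (lowest byte first) are assumptions for illustration; the first source is read as unsigned bytes and the indexed group of four bytes from the second source is read as signed, as the instruction description states.

```python
def to_signed8(byte):
    """Interpret the low 8 bits of a value as a signed 8-bit integer."""
    byte &= 0xFF
    return byte - 256 if byte & 0x80 else byte


def usdot_by_element(acc, vn_bytes, vm_bytes, index):
    """acc: 32-bit accumulators; vn_bytes: four unsigned bytes per accumulator;
    vm_bytes: the 16 bytes of the second source; index: 0 to 3, selecting one
    group of four bytes that is reused for every accumulator."""
    result = []
    for e, res in enumerate(acc):
        for b in range(4):
            res += (vn_bytes[4 * e + b] & 0xFF) * to_signed8(vm_bytes[4 * index + b])
        result.append(res & 0xFFFFFFFF)   # keep the low 32 bits
    return result
```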
Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four
unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer
values in the corresponding 32-bit element of the second source register, accumulating the result into the
corresponding 32-bit element of the destination register.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.
Vector
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 0 Rm 1 0 0 1 1 1 Rn Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
Q <Ta>
0 2S
1 4S
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
Q <Tb>
0 8B
1 16B
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
for e = 0 to elements-1
    bits(32) res = Elem[operand3, e, 32];
    for b = 0 to 3
        integer element1 = UInt(Elem[operand1, 4*e+b, 8]);
        integer element2 = SInt(Elem[operand2, 4*e+b, 8]);
        res = res + element1 * element2;
    Elem[result, e, 32] = res;
V[d] = result;
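A minimal Python sketch of the loop above follows. The function name usdot and the byte-list representation (lowest byte first) are assumptions for illustration; each 32-bit accumulator receives the sum of four products of an unsigned byte from the first source and a signed byte from the corresponding position of the second source.

```python
def usdot(acc, vn_bytes, vm_bytes):
    """Mixed-sign dot product: unsigned bytes of vn times signed bytes of vm."""
    result = []
    for e, res in enumerate(acc):
        for b in range(4):
            u = vn_bytes[4 * e + b] & 0xFF        # UInt of the first source byte
            s = vm_bytes[4 * e + b] & 0xFF
            s = s - 256 if s >= 128 else s        # SInt of the second source byte
            res += u * s
        result.append(res & 0xFFFFFFFF)           # keep the low 32 bits
    return result
```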
Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register,
shifts each element by a value from the least significant byte of the corresponding element of the second source
SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a
rounding shift, see URSHL.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
Assembler Symbols
size <V>
0x RESERVED
10 RESERVED
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = 0;
integer shift;
integer element;
boolean sat;
for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;
V[d] = result;
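For this instruction the rounding and saturating paths of the loop above are not taken, so each element is shifted left, or right with truncation, by a signed amount. The Python sketch below models one element; the function name ushl_element is an assumption for illustration.

```python
def ushl_element(value, shift_elem, esize):
    """Unsigned shift of one element by the signed low byte of shift_elem."""
    shift = shift_elem & 0xFF
    if shift >= 0x80:
        shift -= 256                      # SInt of the least significant byte
    if shift >= 0:
        result = value << shift           # left shift
    else:
        result = value >> -shift          # truncating right shift
    return result & ((1 << esize) - 1)    # keep the low esize bits
```

Bits shifted beyond the element width are discarded, so shifting an 8-bit element left by 8 or more gives 0.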
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Shift Left Long (immediate). This instruction reads each vector element in the lower or upper half of the
source SIMD&FP register, shifts the unsigned integer value left by the specified number of bits, places the result into
a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long
as the source vector elements.
The USHLL instruction extracts vector elements from the lower half of the source register. The USHLL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias UXTL, UXTL2.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 1 0 0 1 Rn Rd
U immh
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> Is the left shift amount, in the range 0 to the source element width in bits minus 1, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx RESERVED
Alias Conditions
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(datasize*2) result;
integer element;
for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], unsigned) << shift;
    Elem[result, e, 2*esize] = element<2*esize-1:0>;
V[d] = result;
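The widening shift above can be sketched in Python as follows. The function name ushll, the upper flag used to model USHLL2, and the list representation (all source elements, lowest first) are assumptions for illustration.

```python
def ushll(src, shift, esize, upper=False):
    """Shift each esize-bit element of one half of src left into a 2*esize-bit result.

    upper=False models USHLL (lower half of the source);
    upper=True models USHLL2 (upper half of the source).
    """
    half = len(src) // 2
    part = src[half:] if upper else src[:half]
    mask = (1 << (2 * esize)) - 1
    return [(x << shift) & mask for x in part]
```

Since the shift amount is at most the source element width minus 1, no bits are lost in the doubled-width result. A shift of 0 simply zero-extends each element, which is the behaviour one would expect from the UXTL and UXTL2 aliases.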
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each result by an immediate value, writes the final result to a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are unsigned integer values. The results are truncated. For
rounded results, see URSHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
V[d] = result;
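The round_const expression above is 0 when rounding is not requested, which gives the truncating behaviour of this instruction. The Python sketch below contrasts the two cases; the function name shift_right and its rounding parameter are assumptions for illustration, and the element loop itself is not shown in the Operation text here.

```python
def shift_right(src, shift, esize, rounding=False):
    """Unsigned right shift of each element, truncating unless rounding is True."""
    round_const = (1 << (shift - 1)) if rounding else 0
    mask = (1 << esize) - 1
    return [((x + round_const) >> shift) & mask for x in src]
```

With rounding disabled, 255 shifted right by 1 gives 127; with rounding enabled, as for URSHR, it gives 128.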
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned
8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source
vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator
in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.
Vector
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 1 0 0 Rm 1 0 1 0 1 1 Rn Rd
U B
Assembler Symbols
<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) addend = V[d];
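The matrix multiply-accumulate described for this instruction can be sketched in Python as below. This is an illustration under stated assumptions, not the architectural pseudocode: the function names are hypothetical, and the layout is assumed to be two rows of eight unsigned bytes in the first source, two columns of eight signed bytes in the second source, and a row-major 2x2 matrix of 32-bit accumulators.

```python
def to_signed8(byte):
    """Interpret the low 8 bits of a value as a signed 8-bit integer."""
    byte &= 0xFF
    return byte - 256 if byte & 0x80 else byte


def usmmla(acc, vn_bytes, vm_bytes):
    """acc: 4 accumulators (assumed row-major 2x2); vn_bytes, vm_bytes: 16 bytes each."""
    result = []
    for i in range(2):            # row of the first source matrix
        for j in range(2):        # column of the second source matrix
            total = acc[2 * i + j]
            for k in range(8):    # 8-way dot product per destination element
                total += (vn_bytes[8 * i + k] & 0xFF) * to_signed8(vm_bytes[8 * j + k])
            result.append(total & 0xFFFFFFFF)   # keep the low 32 bits
    return result
```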
Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector
elements in the source SIMD&FP register to the corresponding unsigned integer values of the vector elements in the
destination SIMD&FP register, and writes the resulting unsigned integer values back to the vector elements of the
destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
for e = 0 to elements-1
    op1 = Int(Elem[operand, e, esize], !unsigned);
    op2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], sat) = SatQ(op1 + op2, esize, unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
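The mixed-sign saturating accumulate above can be illustrated with the following Python sketch. The function name usqadd and the list representation are assumptions for this example; the source elements are interpreted as signed, the destination elements as unsigned, and the returned flag stands in for FPSR.QC.

```python
def usqadd(dest, src, esize):
    """Add signed src elements to unsigned dest elements with unsigned saturation.

    Returns (result_elements, qc).
    """
    max_val = (1 << esize) - 1
    sign_bit = 1 << (esize - 1)
    result, qc = [], False
    for d, s in zip(dest, src):
        signed = s - (1 << esize) if s & sign_bit else s   # signed view of src
        total = d + signed
        clamped = min(max(total, 0), max_val)              # saturate to unsigned range
        qc = qc or clamped != total
        result.append(clamped)
    return result, qc
```

Saturation can occur in both directions: a large positive addend clamps to the maximum unsigned value, and a negative addend larger in magnitude than the destination clamps to 0.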
Unsigned Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each result by an immediate value, and accumulates the final results with the vector elements of
the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The results are
truncated. For rounded results, see URSRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector
Scalar
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Vector
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
immh <V>
0xxx RESERVED
1xxx D
immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:
immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))
For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:
immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
V[d] = result;
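As a rough illustration of the truncating shift and accumulate described for this instruction, the Python sketch below shifts each source element right and adds it to the destination element. The function name usra is hypothetical, and the wrap-around of the accumulation to the element width is an assumption, since the element loop is not shown in the Operation text here.

```python
def usra(dest, src, shift, esize):
    """Truncating right shift of each src element, accumulated into dest.

    The accumulation is assumed to wrap modulo 2**esize (no saturation).
    """
    mask = (1 << esize) - 1
    return [(d + (s >> shift)) & mask for d, s in zip(dest, src)]
```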
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second
source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the
result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are
unsigned integer values. The destination vector elements are twice as long as the source vector elements.
The USUBL instruction extracts each source vector from the lower half of each source register. The USUBL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29     28:24  23:22  21  20:16  15:14  13      12:10  9:5  4:0
Field  0   Q   1 (U)  01110  size   1   Rm     00     1 (o1)  000    Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ta> Is an arrangement specifier, encoded in “size”:
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Tb> Is an arrangement specifier, encoded in “size:Q”:
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
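A minimal Python sketch of the loop above, assuming each 128-bit source register is modelled as a list of unsigned esize-bit elements with index 0 as the least significant element. The name usubl and the upper flag (standing in for part, which Q selects) are illustrative assumptions. Only the subtracting case (sub_op true) is modelled.

```python
def usubl(vn, vm, esize, upper=False):
    """Model USUBL (upper=False) or USUBL2 (upper=True) on element lists."""
    if esize not in (8, 16, 32):
        raise ValueError("esize must be 8, 16 or 32")
    lanes = 64 // esize                      # elements in one 64-bit half
    if len(vn) != 2 * lanes or len(vm) != 2 * lanes:
        raise ValueError("each source must hold a full 128-bit register")
    base = lanes if upper else 0             # Vpart[n, part] selects a half
    mask = (1 << (2 * esize)) - 1            # keep sum<2*esize-1:0>
    return [(vn[base + e] - vm[base + e]) & mask for e in range(lanes)]


vn = list(range(16))                  # bytes 0..15
vm = [1] * 16
low = usubl(vn, vm, 8)                # like USUBL  Vd.8H, Vn.8B,  Vm.8B
high = usubl(vn, vm, 8, upper=True)   # like USUBL2 Vd.8H, Vn.16B, Vm.16B
```

Because the difference is truncated to 2*esize bits, a negative difference such as 0 - 1 wraps to 0xFFFF in the first element of low.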
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned Subtract Wide. This instruction subtracts each vector element of the second source SIMD&FP register from
the corresponding vector element in the lower or upper half of the first source SIMD&FP register, places the result in
a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned
integer values.
The vector elements of the destination register and the first source register are twice as long as the vector elements of
the second source register.
The USUBW instruction extracts vector elements from the lower half of the first source register. The USUBW2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29     28:24  23:22  21  20:16  15:14  13      12:10  9:5  4:0
Field  0   Q   1 (U)  01110  size   1   Rm     00     1 (o1)  100    Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ta> Is an arrangement specifier, encoded in “size”:
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;
for e = 0 to elements-1
element1 = Int(Elem[operand1, e, 2*esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
Elem[result, e, 2*esize] = sum<2*esize-1:0>;
V[d] = result;
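The loop above can be sketched in Python under the following assumptions: the first source is a list of unsigned 2*esize-bit elements, the second source is a list of unsigned esize-bit elements filling a 128-bit register, and index 0 is the least significant element. The name usubw and the upper flag (standing in for part) are hypothetical. Only the subtracting case is modelled.

```python
def usubw(vn_wide, vm_narrow, esize, upper=False):
    """Model USUBW (upper=False) or USUBW2 (upper=True) on element lists."""
    if esize not in (8, 16, 32):
        raise ValueError("esize must be 8, 16 or 32")
    lanes = 64 // esize                      # narrow elements in a 64-bit half
    if len(vn_wide) != lanes or len(vm_narrow) != 2 * lanes:
        raise ValueError("unexpected element count for a 128-bit register")
    base = lanes if upper else 0             # Vpart[m, part] selects a half
    mask = (1 << (2 * esize)) - 1            # keep sum<2*esize-1:0>
    return [(vn_wide[e] - vm_narrow[base + e]) & mask for e in range(lanes)]


wide = [100000, 5, 0, 70000]               # four 32-bit elements (4S)
narrow = [1, 10, 0, 65535, 2, 3, 4, 5]     # eight 16-bit elements (8H)
low = usubw(wide, narrow, 16)              # like USUBW  Vd.4S, Vn.4S, Vm.4H
high = usubw(wide, narrow, 16, upper=True) # like USUBW2 Vd.4S, Vn.4S, Vm.8H
```

Unlike the long form, only the second operand is widened, so the first operand can already hold values that do not fit in esize bits.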
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unsigned extend Long. This instruction copies each vector element from the lower or upper half of the source
SIMD&FP register into a vector, and writes the vector to the destination SIMD&FP register. The destination vector
elements are twice as long as the source vector elements.
The UXTL instruction extracts vector elements from the lower half of the source register. The UXTL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is an alias of USHLL, USHLL2. This means that:
• The encodings in this description are named to match the encodings of USHLL, USHLL2.
• The description of USHLL, USHLL2 gives the operational pseudocode for this instruction.
Bits   31  30  29     28:23   22:19         18:16       15:11  10  9:5  4:0
Field  0   Q   1 (U)  011110  immh != 0000  000 (immb)  10100  1   Rn   Rd
UXTL{2} <Vd>.<Ta>, <Vn>.<Tb>
is equivalent to
USHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #0
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Ta> Is an arrangement specifier, encoded in “immh”:
immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<Tb> Is an arrangement specifier, encoded in “immh:Q”:
immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED
The description of USHLL, USHLL2 gives the operational pseudocode for this instruction.
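As a rough illustration of the widening copy described for this instruction, the Python sketch below models the source and destination registers as 128-bit integers. It assumes a left shift of zero, which is how UXTL behaves as an alias of USHLL. The function name uxtl and the upper flag (standing in for the Q-selected half) are hypothetical.

```python
def uxtl(vn, esize, upper=False):
    """Zero-extend each esize-bit element of one 64-bit half of vn to 2*esize bits."""
    if esize not in (8, 16, 32):
        raise ValueError("esize must be 8, 16 or 32")
    half = (vn >> 64 if upper else vn) & ((1 << 64) - 1)
    result = 0
    for e in range(64 // esize):
        element = (half >> (e * esize)) & ((1 << esize) - 1)
        # Place the element in a slot twice as wide; upper bits stay zero.
        result |= element << (e * 2 * esize)
    return result


# Register whose byte at position i holds the value i.
vn = int.from_bytes(bytes(range(16)), "little")
low = uxtl(vn, 8)                # like UXTL  Vd.8H, Vn.8B
high = uxtl(vn, 8, upper=True)   # like UXTL2 Vd.8H, Vn.16B
```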
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source
SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the
lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a
vector, and writes the vector to the destination SIMD&FP register.
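Note
[Figure: for eight-element sources Vn = A7..A0 and Vm = B7..B0, UZP1 produces Vd = B6 B4 B2 B0 A6 A4 A2 A0 and UZP2 produces Vd = B7 B5 B3 B1 A7 A5 A3 A1 (most significant element first).]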
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29:24   23:22  21  20:16  15  14      13:10  9:5  4:0
Field  0   Q   001110  size   0   Rm     0   0 (op)  0110   Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operandl = V[n];
bits(datasize) operandh = V[m];
bits(datasize) result;
V[d] = result;
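The element selection described for this instruction can be sketched in Python as follows. Registers are modelled as lists of elements with index 0 as the least significant element. The function name uzp1 is an illustrative assumption, and the selection is written from the prose description rather than copied from the architecture pseudocode.

```python
def uzp1(vn, vm):
    """Return the even-numbered elements of vn followed by those of vm."""
    if len(vn) != len(vm) or len(vn) % 2 != 0:
        raise ValueError("sources must have the same, even element count")
    combined = list(vn) + list(vm)   # first source supplies the low elements
    return combined[0::2]            # elements 0, 2, 4, ... of the pair


a = ["A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7"]
b = ["B0", "B1", "B2", "B3", "B4", "B5", "B6", "B7"]
vd = uzp1(a, b)
```

Read from the most significant element down, vd is B6 B4 B2 B0 A6 A4 A2 A0.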
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source
SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a
vector, and the result from the second source register into consecutive elements in the upper half of a vector, and
writes the vector to the destination SIMD&FP register.
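Note
[Figure: for eight-element sources Vn = A7..A0 and Vm = B7..B0, UZP1 produces Vd = B6 B4 B2 B0 A6 A4 A2 A0 and UZP2 produces Vd = B7 B5 B3 B1 A7 A5 A3 A1 (most significant element first).]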
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29:24   23:22  21  20:16  15  14      13:10  9:5  4:0
Field  0   Q   001110  size   0   Rm     0   1 (op)  0110   Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operandl = V[n];
bits(datasize) operandh = V[m];
bits(datasize) result;
V[d] = result;
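One common use of the odd-element form is de-interleaving data together with its even-element counterpart. The Python sketch below assumes registers are modelled as lists with index 0 as the least significant element. The helper name unzip and its part argument (0 for even elements, 1 for odd elements) are hypothetical and are inferred from the prose, not taken from the architecture pseudocode.

```python
def unzip(vn, vm, part):
    """part=0 selects even-numbered elements, part=1 selects odd-numbered ones."""
    if part not in (0, 1):
        raise ValueError("part must be 0 or 1")
    if len(vn) != len(vm) or len(vn) % 2 != 0:
        raise ValueError("sources must have the same, even element count")
    combined = list(vn) + list(vm)   # first source supplies the low elements
    return [combined[2 * e + part] for e in range(len(vn))]


# Eight (x, y) pairs stored interleaved across two 8-element registers.
vn = ["x0", "y0", "x1", "y1", "x2", "y2", "x3", "y3"]
vm = ["x4", "y4", "x5", "y5", "x6", "y6", "x7", "y7"]
xs = unzip(vn, vm, 0)   # even elements, as UZP1 would select
ys = unzip(vn, vm, 1)   # odd elements, as UZP2 would select
```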
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Exclusive OR and Rotate performs a bitwise exclusive OR of the 128-bit vectors in the two source SIMD&FP registers,
rotates each 64-bit element of the resulting 128-bit vector right by the value specified by a 6-bit immediate value, and
writes the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.
Advanced SIMD
(FEAT_SHA3)
Bits   31:21        20:16  15:10  9:5  4:0
Field  11001110100  Rm     imm6   Rn   Rd
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<imm6> Is a rotation right, encoded in "imm6".
Operation
AArch64.CheckFPAdvSIMDEnabled();
bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) tmp;
tmp = Vn EOR Vm;
V[d] = ROR(tmp<127:64>, UInt(imm6)):ROR(tmp<63:0>, UInt(imm6));
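The two pseudocode statements above can be mirrored in Python by treating each register as a 128-bit integer. This is only a sketch: the names xar and ror64 are hypothetical helpers, and the FEAT_SHA3 and enable checks are not modelled.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def ror64(value, amount):
    """Rotate a 64-bit value right by amount bits (0 to 63)."""
    value &= MASK64
    return ((value >> amount) | (value << (64 - amount))) & MASK64


def xar(vn, vm, imm6):
    """EOR two 128-bit values, then rotate each 64-bit half right by imm6."""
    if not 0 <= imm6 <= 63:
        raise ValueError("imm6 is a 6-bit field")
    tmp = (vn ^ vm) & MASK128
    high = ror64(tmp >> 64, imm6)      # tmp<127:64>
    low = ror64(tmp & MASK64, imm6)    # tmp<63:0>
    return (high << 64) | low
```

Each 64-bit half rotates independently, so bits never cross from one half of the register to the other.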
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to
half the original width, places the result into a vector, and writes the vector to the lower or upper half of the
destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
The XTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
XTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29:24   23:22  21:10         9:5  4:0
Field  0   Q   001110  size   100001001010  Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
Assembler Symbols
2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:
Q 2
0 [absent]
1 [present]
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Tb> Is an arrangement specifier, encoded in “size:Q”:
size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<Ta> Is an arrangement specifier, encoded in “size”:
size <Ta>
00 8H
01 4S
10 2D
11 RESERVED
Operation
CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;
for e = 0 to elements-1
element = Elem[operand, e, 2*esize];
Elem[result, e, esize] = element<esize-1:0>;
Vpart[d, part] = result;
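A minimal Python sketch of the narrowing loop and the Vpart write above. It assumes the source is a list of unsigned 2*esize-bit elements and the destination is a list of esize-bit elements filling a 128-bit register, with index 0 as the least significant element. The name xtn and the upper flag (standing in for part) are hypothetical.

```python
def xtn(vd, vn_wide, esize, upper=False):
    """Model XTN (upper=False) or XTN2 (upper=True); returns the new Vd elements."""
    if esize not in (8, 16, 32):
        raise ValueError("esize must be 8, 16 or 32")
    lanes = 64 // esize                      # narrow elements in a 64-bit half
    if len(vn_wide) != lanes or len(vd) != 2 * lanes:
        raise ValueError("unexpected element count for a 128-bit register")
    mask = (1 << esize) - 1
    narrowed = [x & mask for x in vn_wide]   # element<esize-1:0>
    if upper:
        return list(vd[:lanes]) + narrowed   # XTN2: lower half of Vd preserved
    return narrowed + [0] * lanes            # XTN: upper half of Vd cleared


vd = [0xEE] * 16
vn = [0x1234, 0x00FF, 0xABCD, 0x0100, 1, 2, 3, 0xFFFF]   # eight halfwords
low = xtn(vd, vn, 8)                # like XTN  Vd.8B,  Vn.8H
high = xtn(vd, vn, 8, upper=True)   # like XTN2 Vd.16B, Vn.8H
```

The narrowing is a plain truncation with no saturation, so 0x0100 becomes 0x00.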
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP
registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination
SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with
subsequent pairs taken alternately from each source register.
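Note
[Figure: for eight-element sources Vn = A7..A0 and Vm = B7..B0, ZIP1 produces Vd = B3 A3 B2 A2 B1 A1 B0 A0 and ZIP2 produces Vd = B7 A7 B6 A6 B5 A5 B4 A4 (most significant element first).]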
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29:24   23:22  21  20:16  15  14      13:10  9:5  4:0
Field  0   Q   001110  size   0   Rm     0   0 (op)  1110   Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
for p = 0 to pairs-1
Elem[result, 2*p+0, esize] = Elem[operand1, base+p, esize];
Elem[result, 2*p+1, esize] = Elem[operand2, base+p, esize];
V[d] = result;
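The loop above can be sketched in Python with registers modelled as lists, index 0 being the least significant element. The function name zip1 is hypothetical. Taking base as 0 is an assumption that follows from the "lower half" wording, because the decode statements that define base and pairs are not shown above.

```python
def zip1(vn, vm):
    """Interleave the lower halves of vn and vm, starting with vn."""
    if len(vn) != len(vm) or len(vn) % 2 != 0:
        raise ValueError("sources must have the same, even element count")
    pairs = len(vn) // 2
    result = []
    for p in range(pairs):
        result.append(vn[p])   # Elem[result, 2*p+0] = Elem[operand1, base+p]
        result.append(vm[p])   # Elem[result, 2*p+1] = Elem[operand2, base+p]
    return result


a = ["A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7"]
b = ["B0", "B1", "B2", "B3", "B4", "B5", "B6", "B7"]
vd = zip1(a, b)
```

Read from the most significant element down, vd is B3 A3 B2 A2 B1 A1 B0 A0.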
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP
registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination
SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with
subsequent pairs taken alternately from each source register.
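Note
[Figure: for eight-element sources Vn = A7..A0 and Vm = B7..B0, ZIP1 produces Vd = B3 A3 B2 A2 B1 A1 B0 A0 and ZIP2 produces Vd = B7 A7 B6 A6 B5 A5 B4 A4 (most significant element first).]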
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29:24   23:22  21  20:16  15  14      13:10  9:5  4:0
Field  0   Q   001110  size   0   Rm     0   1 (op)  1110   Rn   Rd
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
Assembler Symbols
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<T> Is an arrangement specifier, encoded in “size:Q”:
size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
Operation
CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
for p = 0 to pairs-1
Elem[result, 2*p+0, esize] = Elem[operand1, base+p, esize];
Elem[result, 2*p+1, esize] = Elem[operand2, base+p, esize];
V[d] = result;
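To show how the upper-half form complements the lower-half form, the Python sketch below parameterises the loop above by a part value. Registers are lists with index 0 as the least significant element. The helper name zip_half is hypothetical, and base = part * pairs is an assumption inferred from the "upper half" wording, because the decode statements that define base are not shown above.

```python
def zip_half(vn, vm, part):
    """part=0 interleaves the lower halves, part=1 the upper halves."""
    if part not in (0, 1):
        raise ValueError("part must be 0 or 1")
    if len(vn) != len(vm) or len(vn) % 2 != 0:
        raise ValueError("sources must have the same, even element count")
    pairs = len(vn) // 2
    base = part * pairs        # assumed: 0 for lower half, pairs for upper half
    result = []
    for p in range(pairs):
        result.append(vn[base + p])
        result.append(vm[base + p])
    return result


a = ["A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7"]
b = ["B0", "B1", "B2", "B3", "B4", "B5", "B6", "B7"]
upper = zip_half(a, b, 1)                    # as ZIP2 would produce
full = zip_half(a, b, 0) + zip_half(a, b, 1) # complete interleave of a and b
```

Used as a pair, the two forms interleave every element of both sources across two destination registers.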
Operational information
If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
ASR (wide elements, predicated): Arithmetic shift right by 64-bit wide elements (predicated).
ASR (wide elements, unpredicated): Arithmetic shift right by 64-bit wide elements (unpredicated).
BIC (immediate): Bitwise clear bits using immediate (unpredicated): an alias of AND (immediate).
A64 -- SVE Instructions (alphabetic order)
BRKAS: Break after first true condition, setting the condition flags.
BRKBS: Break before first true condition, setting the condition flags.
BRKPA: Break after first true condition, propagating from previous partition.
BRKPAS: Break after first true condition, propagating from previous partition and setting the condition flags.
BRKPB: Break before first true condition, propagating from previous partition.
BRKPBS: Break before first true condition, propagating from previous partition and setting the condition flags.
CLASTA (SIMD&FP scalar): Conditionally extract element after last to SIMD&FP scalar register.
CLASTB (SIMD&FP scalar): Conditionally extract last element to SIMD&FP scalar register.
CMPLE (vectors): Compare signed less than or equal to vector, setting the condition flags: an alias of CMP<cc>
(vectors).
CMPLO (vectors): Compare unsigned lower than vector, setting the condition flags: an alias of CMP<cc> (vectors).
CMPLS (vectors): Compare unsigned lower or same as vector, setting the condition flags: an alias of CMP<cc>
(vectors).
CMPLT (vectors): Compare signed less than vector, setting the condition flags: an alias of CMP<cc> (vectors).
CNTB, CNTD, CNTH, CNTW: Set scalar to multiple of predicate constraint element count.
COMPACT: Shuffle active elements of vector to the right and fill with zero.
CPY (immediate, merging): Copy signed integer immediate to vector elements (merging).
CPY (immediate, zeroing): Copy signed integer immediate to vector elements (zeroing).
CPY (SIMD&FP scalar): Copy SIMD&FP scalar register to vector elements (predicated).
DECB, DECD, DECH, DECW (scalar): Decrement scalar by multiple of predicate constraint element count.
DECD, DECH, DECW (vector): Decrement vector by multiple of predicate constraint element count.
EON: Bitwise exclusive OR with inverted immediate (unpredicated): an alias of EOR (immediate).
FCMLE (vectors): Floating-point compare less than or equal to vector: an alias of FCM<cc> (vectors).
FCMLT (vectors): Floating-point compare less than vector: an alias of FCM<cc> (vectors).
FMAD: Floating-point fused multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm].
FMLA (indexed): Floating-point fused multiply-add by indexed elements (Zda = Zda + Zn * Zm[indexed]).
FMLA (vectors): Floating-point fused multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm].
FMLS (indexed): Floating-point fused multiply-subtract by indexed elements (Zda = Zda + -Zn * Zm[indexed]).
FMLS (vectors): Floating-point fused multiply-subtract vectors (predicated), writing addend [Zda = Zda + -Zn * Zm].
FMOV (immediate, predicated): Move 8-bit floating-point immediate to vector elements (predicated): an alias of FCPY.
FMOV (immediate, unpredicated): Move 8-bit floating-point immediate to vector elements (unpredicated): an alias of
FDUP.
FMOV (zero, predicated): Move floating-point +0.0 to vector elements (predicated): an alias of CPY (immediate,
merging).
FMOV (zero, unpredicated): Move floating-point +0.0 to vector elements (unpredicated): an alias of DUP (immediate).
FMSB: Floating-point fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za + -Zdn * Zm].
FNMAD: Floating-point negated fused multiply-add vectors (predicated), writing multiplicand [Zdn = -Za + -Zdn *
Zm].
FNMLA: Floating-point negated fused multiply-add vectors (predicated), writing addend [Zda = -Zda + -Zn * Zm].
FNMLS: Floating-point negated fused multiply-subtract vectors (predicated), writing addend [Zda = -Zda + Zn * Zm].
FNMSB: Floating-point negated fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = -Za + Zdn *
Zm].
INCB, INCD, INCH, INCW (scalar): Increment scalar by multiple of predicate constraint element count.
INCD, INCH, INCW (vector): Increment vector by multiple of predicate constraint element count.
INDEX (immediate, scalar): Create index starting from immediate and incremented by general-purpose register.
INDEX (scalar, immediate): Create index starting from general-purpose register and incremented by immediate.
INDEX (scalars): Create index starting from and incremented by general-purpose register.
LASTA (SIMD&FP scalar): Extract element after last to SIMD&FP scalar register.
LD1B (scalar plus immediate): Contiguous load unsigned bytes to vector (immediate index).
LD1B (scalar plus scalar): Contiguous load unsigned bytes to vector (scalar index).
LD1B (scalar plus vector): Gather load unsigned bytes to vector (vector index).
LD1B (vector plus immediate): Gather load unsigned bytes to vector (immediate index).
LD1D (scalar plus immediate): Contiguous load doublewords to vector (immediate index).
LD1D (scalar plus scalar): Contiguous load doublewords to vector (scalar index).
LD1D (scalar plus vector): Gather load doublewords to vector (vector index).
LD1D (vector plus immediate): Gather load doublewords to vector (immediate index).
LD1H (scalar plus immediate): Contiguous load unsigned halfwords to vector (immediate index).
LD1H (scalar plus scalar): Contiguous load unsigned halfwords to vector (scalar index).
LD1H (scalar plus vector): Gather load unsigned halfwords to vector (vector index).
LD1H (vector plus immediate): Gather load unsigned halfwords to vector (immediate index).
LD1ROB (scalar plus immediate): Contiguous load and replicate thirty-two bytes (immediate index).
LD1ROB (scalar plus scalar): Contiguous load and replicate thirty-two bytes (scalar index).
LD1ROD (scalar plus immediate): Contiguous load and replicate four doublewords (immediate index).
LD1ROD (scalar plus scalar): Contiguous load and replicate four doublewords (scalar index).
LD1ROH (scalar plus immediate): Contiguous load and replicate sixteen halfwords (immediate index).
LD1ROH (scalar plus scalar): Contiguous load and replicate sixteen halfwords (scalar index).
LD1ROW (scalar plus immediate): Contiguous load and replicate eight words (immediate index).
LD1ROW (scalar plus scalar): Contiguous load and replicate eight words (scalar index).
LD1RQB (scalar plus immediate): Contiguous load and replicate sixteen bytes (immediate index).
LD1RQB (scalar plus scalar): Contiguous load and replicate sixteen bytes (scalar index).
LD1RQD (scalar plus immediate): Contiguous load and replicate two doublewords (immediate index).
LD1RQD (scalar plus scalar): Contiguous load and replicate two doublewords (scalar index).
LD1RQH (scalar plus immediate): Contiguous load and replicate eight halfwords (immediate index).
LD1RQH (scalar plus scalar): Contiguous load and replicate eight halfwords (scalar index).
LD1RQW (scalar plus immediate): Contiguous load and replicate four words (immediate index).
LD1RQW (scalar plus scalar): Contiguous load and replicate four words (scalar index).
LD1SB (scalar plus immediate): Contiguous load signed bytes to vector (immediate index).
LD1SB (scalar plus scalar): Contiguous load signed bytes to vector (scalar index).
LD1SB (scalar plus vector): Gather load signed bytes to vector (vector index).
LD1SB (vector plus immediate): Gather load signed bytes to vector (immediate index).
LD1SH (scalar plus immediate): Contiguous load signed halfwords to vector (immediate index).
LD1SH (scalar plus scalar): Contiguous load signed halfwords to vector (scalar index).
LD1SH (scalar plus vector): Gather load signed halfwords to vector (vector index).
LD1SH (vector plus immediate): Gather load signed halfwords to vector (immediate index).
LD1SW (scalar plus immediate): Contiguous load signed words to vector (immediate index).
LD1SW (scalar plus scalar): Contiguous load signed words to vector (scalar index).
LD1SW (scalar plus vector): Gather load signed words to vector (vector index).
LD1SW (vector plus immediate): Gather load signed words to vector (immediate index).
LD1W (scalar plus immediate): Contiguous load unsigned words to vector (immediate index).
LD1W (scalar plus scalar): Contiguous load unsigned words to vector (scalar index).
LD1W (scalar plus vector): Gather load unsigned words to vector (vector index).
LD1W (vector plus immediate): Gather load unsigned words to vector (immediate index).
LD2B (scalar plus immediate): Contiguous load two-byte structures to two vectors (immediate index).
LD2B (scalar plus scalar): Contiguous load two-byte structures to two vectors (scalar index).
LD2D (scalar plus immediate): Contiguous load two-doubleword structures to two vectors (immediate index).
LD2D (scalar plus scalar): Contiguous load two-doubleword structures to two vectors (scalar index).
LD2H (scalar plus immediate): Contiguous load two-halfword structures to two vectors (immediate index).
LD2H (scalar plus scalar): Contiguous load two-halfword structures to two vectors (scalar index).
LD2W (scalar plus immediate): Contiguous load two-word structures to two vectors (immediate index).
LD2W (scalar plus scalar): Contiguous load two-word structures to two vectors (scalar index).
LD3B (scalar plus immediate): Contiguous load three-byte structures to three vectors (immediate index).
LD3B (scalar plus scalar): Contiguous load three-byte structures to three vectors (scalar index).
LD3D (scalar plus immediate): Contiguous load three-doubleword structures to three vectors (immediate index).
LD3D (scalar plus scalar): Contiguous load three-doubleword structures to three vectors (scalar index).
LD3H (scalar plus immediate): Contiguous load three-halfword structures to three vectors (immediate index).
LD3H (scalar plus scalar): Contiguous load three-halfword structures to three vectors (scalar index).
LD3W (scalar plus immediate): Contiguous load three-word structures to three vectors (immediate index).
LD3W (scalar plus scalar): Contiguous load three-word structures to three vectors (scalar index).
LD4B (scalar plus immediate): Contiguous load four-byte structures to four vectors (immediate index).
LD4B (scalar plus scalar): Contiguous load four-byte structures to four vectors (scalar index).
LD4D (scalar plus immediate): Contiguous load four-doubleword structures to four vectors (immediate index).
LD4D (scalar plus scalar): Contiguous load four-doubleword structures to four vectors (scalar index).
LD4H (scalar plus immediate): Contiguous load four-halfword structures to four vectors (immediate index).
LD4H (scalar plus scalar): Contiguous load four-halfword structures to four vectors (scalar index).
LD4W (scalar plus immediate): Contiguous load four-word structures to four vectors (immediate index).
LD4W (scalar plus scalar): Contiguous load four-word structures to four vectors (scalar index).
LDFF1B (scalar plus scalar): Contiguous load first-fault unsigned bytes to vector (scalar index).
LDFF1B (scalar plus vector): Gather load first-fault unsigned bytes to vector (vector index).
LDFF1B (vector plus immediate): Gather load first-fault unsigned bytes to vector (immediate index).
LDFF1D (scalar plus scalar): Contiguous load first-fault doublewords to vector (scalar index).
LDFF1D (scalar plus vector): Gather load first-fault doublewords to vector (vector index).
LDFF1D (vector plus immediate): Gather load first-fault doublewords to vector (immediate index).
LDFF1H (scalar plus scalar): Contiguous load first-fault unsigned halfwords to vector (scalar index).
LDFF1H (scalar plus vector): Gather load first-fault unsigned halfwords to vector (vector index).
LDFF1H (vector plus immediate): Gather load first-fault unsigned halfwords to vector (immediate index).
LDFF1SB (scalar plus scalar): Contiguous load first-fault signed bytes to vector (scalar index).
LDFF1SB (scalar plus vector): Gather load first-fault signed bytes to vector (vector index).
LDFF1SB (vector plus immediate): Gather load first-fault signed bytes to vector (immediate index).
LDFF1SH (scalar plus scalar): Contiguous load first-fault signed halfwords to vector (scalar index).
LDFF1SH (scalar plus vector): Gather load first-fault signed halfwords to vector (vector index).
LDFF1SH (vector plus immediate): Gather load first-fault signed halfwords to vector (immediate index).
LDFF1SW (scalar plus scalar): Contiguous load first-fault signed words to vector (scalar index).
LDFF1SW (scalar plus vector): Gather load first-fault signed words to vector (vector index).
LDFF1SW (vector plus immediate): Gather load first-fault signed words to vector (immediate index).
LDFF1W (scalar plus scalar): Contiguous load first-fault unsigned words to vector (scalar index).
LDFF1W (scalar plus vector): Gather load first-fault unsigned words to vector (vector index).
LDFF1W (vector plus immediate): Gather load first-fault unsigned words to vector (immediate index).
LDNT1B (scalar plus immediate): Contiguous load non-temporal bytes to vector (immediate index).
LDNT1B (scalar plus scalar): Contiguous load non-temporal bytes to vector (scalar index).
LDNT1D (scalar plus immediate): Contiguous load non-temporal doublewords to vector (immediate index).
LDNT1D (scalar plus scalar): Contiguous load non-temporal doublewords to vector (scalar index).
LDNT1H (scalar plus immediate): Contiguous load non-temporal halfwords to vector (immediate index).
LDNT1H (scalar plus scalar): Contiguous load non-temporal halfwords to vector (scalar index).
LDNT1W (scalar plus immediate): Contiguous load non-temporal words to vector (immediate index).
LDNT1W (scalar plus scalar): Contiguous load non-temporal words to vector (scalar index).
LSL (wide elements, predicated): Logical shift left by 64-bit wide elements (predicated).
LSL (wide elements, unpredicated): Logical shift left by 64-bit wide elements (unpredicated).
LSR (wide elements, predicated): Logical shift right by 64-bit wide elements (predicated).
LSR (wide elements, unpredicated): Logical shift right by 64-bit wide elements (unpredicated).
MOV (immediate, predicated, merging): Move signed integer immediate to vector elements (merging): an alias of CPY
(immediate, merging).
MOV (immediate, predicated, zeroing): Move signed integer immediate to vector elements (zeroing): an alias of CPY
(immediate, zeroing).
MOV (immediate, unpredicated): Move signed immediate to vector elements (unpredicated): an alias of DUP
(immediate).
MOV (predicate, predicated, merging): Move predicates (merging): an alias of SEL (predicates).
MOV (predicate, predicated, zeroing): Move predicates (zeroing): an alias of AND (predicates).
MOV (scalar, predicated): Move general-purpose register to vector elements (predicated): an alias of CPY (scalar).
MOV (scalar, unpredicated): Move general-purpose register to vector elements (unpredicated): an alias of DUP
(scalar).
MOV (SIMD&FP scalar, predicated): Move SIMD&FP scalar register to vector elements (predicated): an alias of CPY
(SIMD&FP scalar).
MOV (SIMD&FP scalar, unpredicated): Move indexed element or SIMD&FP scalar to vector (unpredicated): an alias of
DUP (indexed).
MOV (vector, predicated): Move vector elements (predicated): an alias of SEL (vectors).
MOV (vector, unpredicated): Move vector register (unpredicated): an alias of ORR (vectors, unpredicated).
MOVS (predicated): Move predicates (zeroing), setting the condition flags: an alias of ANDS.
MOVS (unpredicated): Move predicate (unpredicated), setting the condition flags: an alias of ORRS.
NOTS: Bitwise invert predicate, setting the condition flags: an alias of EORS.
ORN (immediate): Bitwise inclusive OR with inverted immediate (unpredicated): an alias of ORR (immediate).
PRFB (scalar plus vector): Gather prefetch bytes (scalar plus vector).
PRFB (vector plus immediate): Gather prefetch bytes (vector plus immediate).
PRFD (scalar plus vector): Gather prefetch doublewords (scalar plus vector).
PRFD (vector plus immediate): Gather prefetch doublewords (vector plus immediate).
PRFH (scalar plus vector): Gather prefetch halfwords (scalar plus vector).
PRFH (vector plus immediate): Gather prefetch halfwords (vector plus immediate).
PRFW (scalar plus vector): Gather prefetch words (scalar plus vector).
PRFW (vector plus immediate): Gather prefetch words (vector plus immediate).
PTRUES: Initialise predicate from named constraint and set the condition flags.
RDFFRS: Return predicate of successfully loaded elements, setting the condition flags.
REVB, REVH, REVW: Reverse bytes / halfwords / words within elements (predicated).
SQDECB: Signed saturating decrement scalar by multiple of 8-bit predicate constraint element count.
SQDECD (scalar): Signed saturating decrement scalar by multiple of 64-bit predicate constraint element count.
SQDECD (vector): Signed saturating decrement vector by multiple of 64-bit predicate constraint element count.
SQDECH (scalar): Signed saturating decrement scalar by multiple of 16-bit predicate constraint element count.
SQDECH (vector): Signed saturating decrement vector by multiple of 16-bit predicate constraint element count.
SQDECP (scalar): Signed saturating decrement scalar by count of true predicate elements.
SQDECP (vector): Signed saturating decrement vector by count of true predicate elements.
SQDECW (scalar): Signed saturating decrement scalar by multiple of 32-bit predicate constraint element count.
SQDECW (vector): Signed saturating decrement vector by multiple of 32-bit predicate constraint element count.
SQINCB: Signed saturating increment scalar by multiple of 8-bit predicate constraint element count.
SQINCD (scalar): Signed saturating increment scalar by multiple of 64-bit predicate constraint element count.
SQINCD (vector): Signed saturating increment vector by multiple of 64-bit predicate constraint element count.
SQINCH (scalar): Signed saturating increment scalar by multiple of 16-bit predicate constraint element count.
SQINCH (vector): Signed saturating increment vector by multiple of 16-bit predicate constraint element count.
SQINCP (scalar): Signed saturating increment scalar by count of true predicate elements.
SQINCP (vector): Signed saturating increment vector by count of true predicate elements.
SQINCW (scalar): Signed saturating increment scalar by multiple of 32-bit predicate constraint element count.
SQINCW (vector): Signed saturating increment vector by multiple of 32-bit predicate constraint element count.
ST1B (scalar plus immediate): Contiguous store bytes from vector (immediate index).
ST1B (scalar plus scalar): Contiguous store bytes from vector (scalar index).
ST1B (scalar plus vector): Scatter store bytes from a vector (vector index).
ST1B (vector plus immediate): Scatter store bytes from a vector (immediate index).
ST1D (scalar plus immediate): Contiguous store doublewords from vector (immediate index).
ST1D (scalar plus scalar): Contiguous store doublewords from vector (scalar index).
ST1D (scalar plus vector): Scatter store doublewords from a vector (vector index).
ST1D (vector plus immediate): Scatter store doublewords from a vector (immediate index).
ST1H (scalar plus immediate): Contiguous store halfwords from vector (immediate index).
ST1H (scalar plus scalar): Contiguous store halfwords from vector (scalar index).
ST1H (scalar plus vector): Scatter store halfwords from a vector (vector index).
ST1H (vector plus immediate): Scatter store halfwords from a vector (immediate index).
ST1W (scalar plus immediate): Contiguous store words from vector (immediate index).
ST1W (scalar plus scalar): Contiguous store words from vector (scalar index).
ST1W (scalar plus vector): Scatter store words from a vector (vector index).
ST1W (vector plus immediate): Scatter store words from a vector (immediate index).
ST2B (scalar plus immediate): Contiguous store two-byte structures from two vectors (immediate index).
ST2B (scalar plus scalar): Contiguous store two-byte structures from two vectors (scalar index).
ST2D (scalar plus immediate): Contiguous store two-doubleword structures from two vectors (immediate index).
ST2D (scalar plus scalar): Contiguous store two-doubleword structures from two vectors (scalar index).
ST2H (scalar plus immediate): Contiguous store two-halfword structures from two vectors (immediate index).
ST2H (scalar plus scalar): Contiguous store two-halfword structures from two vectors (scalar index).
ST2W (scalar plus immediate): Contiguous store two-word structures from two vectors (immediate index).
ST2W (scalar plus scalar): Contiguous store two-word structures from two vectors (scalar index).
ST3B (scalar plus immediate): Contiguous store three-byte structures from three vectors (immediate index).
ST3B (scalar plus scalar): Contiguous store three-byte structures from three vectors (scalar index).
ST3D (scalar plus immediate): Contiguous store three-doubleword structures from three vectors (immediate index).
ST3D (scalar plus scalar): Contiguous store three-doubleword structures from three vectors (scalar index).
ST3H (scalar plus immediate): Contiguous store three-halfword structures from three vectors (immediate index).
ST3H (scalar plus scalar): Contiguous store three-halfword structures from three vectors (scalar index).
ST3W (scalar plus immediate): Contiguous store three-word structures from three vectors (immediate index).
ST3W (scalar plus scalar): Contiguous store three-word structures from three vectors (scalar index).
ST4B (scalar plus immediate): Contiguous store four-byte structures from four vectors (immediate index).
ST4B (scalar plus scalar): Contiguous store four-byte structures from four vectors (scalar index).
ST4D (scalar plus immediate): Contiguous store four-doubleword structures from four vectors (immediate index).
ST4D (scalar plus scalar): Contiguous store four-doubleword structures from four vectors (scalar index).
ST4H (scalar plus immediate): Contiguous store four-halfword structures from four vectors (immediate index).
ST4H (scalar plus scalar): Contiguous store four-halfword structures from four vectors (scalar index).
ST4W (scalar plus immediate): Contiguous store four-word structures from four vectors (immediate index).
ST4W (scalar plus scalar): Contiguous store four-word structures from four vectors (scalar index).
STNT1B (scalar plus immediate): Contiguous store non-temporal bytes from vector (immediate index).
STNT1B (scalar plus scalar): Contiguous store non-temporal bytes from vector (scalar index).
STNT1D (scalar plus immediate): Contiguous store non-temporal doublewords from vector (immediate index).
STNT1D (scalar plus scalar): Contiguous store non-temporal doublewords from vector (scalar index).
STNT1H (scalar plus immediate): Contiguous store non-temporal halfwords from vector (immediate index).
STNT1H (scalar plus scalar): Contiguous store non-temporal halfwords from vector (scalar index).
STNT1W (scalar plus immediate): Contiguous store non-temporal words from vector (immediate index).
STNT1W (scalar plus scalar): Contiguous store non-temporal words from vector (scalar index).
TRN1, TRN2 (predicates): Interleave even or odd elements from two predicates.
TRN1, TRN2 (vectors): Interleave even or odd elements from two vectors.
UQDECB: Unsigned saturating decrement scalar by multiple of 8-bit predicate constraint element count.
UQDECD (scalar): Unsigned saturating decrement scalar by multiple of 64-bit predicate constraint element count.
UQDECD (vector): Unsigned saturating decrement vector by multiple of 64-bit predicate constraint element count.
UQDECH (scalar): Unsigned saturating decrement scalar by multiple of 16-bit predicate constraint element count.
UQDECH (vector): Unsigned saturating decrement vector by multiple of 16-bit predicate constraint element count.
UQDECP (scalar): Unsigned saturating decrement scalar by count of true predicate elements.
UQDECP (vector): Unsigned saturating decrement vector by count of true predicate elements.
UQDECW (scalar): Unsigned saturating decrement scalar by multiple of 32-bit predicate constraint element count.
UQDECW (vector): Unsigned saturating decrement vector by multiple of 32-bit predicate constraint element count.
UQINCB: Unsigned saturating increment scalar by multiple of 8-bit predicate constraint element count.
UQINCD (scalar): Unsigned saturating increment scalar by multiple of 64-bit predicate constraint element count.
UQINCD (vector): Unsigned saturating increment vector by multiple of 64-bit predicate constraint element count.
UQINCH (scalar): Unsigned saturating increment scalar by multiple of 16-bit predicate constraint element count.
UQINCH (vector): Unsigned saturating increment vector by multiple of 16-bit predicate constraint element count.
UQINCP (scalar): Unsigned saturating increment scalar by count of true predicate elements.
UQINCP (vector): Unsigned saturating increment vector by count of true predicate elements.
UQINCW (scalar): Unsigned saturating increment scalar by multiple of 32-bit predicate constraint element count.
UQINCW (vector): Unsigned saturating increment vector by multiple of 32-bit predicate constraint element count.
UZP1, UZP2 (predicates): Concatenate even or odd elements from two predicates.
UZP1, UZP2 (vectors): Concatenate even or odd elements from two vectors.
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
ABS
Compute the absolute value of the signed integer in each active element of the source vector, and place the results in
the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = SInt(Elem[operand, e, esize]);
        element = Abs(element);
        Elem[result, e, esize] = element<esize-1:0>;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
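As an illustration only, the Operation pseudocode for ABS can be modelled in a few lines of Python. The function name sve_abs and the data layout (one unsigned integer per vector element, one boolean per predicate element) are assumptions made for this sketch and are not part of the architecture.

```python
def sve_abs(zd, zn, pg, esize):
    """Hypothetical model of ABS with merging predication.

    zd, zn: lists of unsigned element bit patterns (assumed layout).
    pg:     list of booleans, one per element (assumed layout).
    """
    mask = (1 << esize) - 1
    sign = 1 << (esize - 1)
    result = list(zd)  # inactive elements keep the old destination value
    for e, active in enumerate(pg):
        if active:
            value = zn[e] & mask
            # SInt(): reinterpret the bit pattern as a signed integer
            signed = value - (1 << esize) if value & sign else value
            # element<esize-1:0>: truncate back to the element width
            result[e] = abs(signed) & mask
    return result
```

Because the result is truncated to the element width, the most negative value (0x80 for 8-bit elements) maps to itself in this model.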
ADD (immediate)
Add an unsigned immediate to each element of the source vector, and destructively place the results in the
corresponding elements of the source vector. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value, unless the immediate is 0 and the shift amount is 8, in which case it must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 0 1 1 sh imm8 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in "sh":
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = element1 + imm;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
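The immediate rules for ADD (immediate) can be sketched in Python as follows. This is a minimal model, not an assembler: the helper names encode_add_imm and sve_add_imm are hypothetical, and a vector is assumed to be a list of unsigned element values.

```python
def encode_add_imm(value, esize):
    """Return (imm8, sh) for an unsigned immediate, or None if not encodable.

    Follows the ranges stated in the description: 0 to 255 for any element
    size, or a multiple of 256 from 256 to 65280 for 16-bit or wider elements.
    """
    if 0 <= value <= 255:
        return value, 0
    if esize >= 16 and value % 256 == 0 and 256 <= value <= 65280:
        return value >> 8, 1
    return None


def sve_add_imm(zdn, imm8, sh, esize):
    """Add the (optionally shifted) immediate to every element, modulo 2**esize."""
    imm = imm8 << (8 if sh else 0)
    mask = (1 << esize) - 1
    return [(x + imm) & mask for x in zdn]
```

For example, the value 0x1200 with 16-bit elements would be encoded as imm8 = 0x12 with sh = 1 under these assumptions, while 256 is not encodable for byte elements.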
ADD (vectors, predicated)
Add active elements of the second source vector to corresponding elements of the first source vector and destructively
place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector
register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 0 0 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 + element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
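The merging behavior of the predicated ADD can be illustrated with a short Python sketch. The function name sve_add_predicated and the list-based representation of vectors and predicates are assumptions for illustration only.

```python
def sve_add_predicated(zdn, zm, pg, esize):
    """Hypothetical model: active elements become zdn + zm (modulo 2**esize),
    inactive elements keep the original zdn value."""
    mask = (1 << esize) - 1
    return [
        (a + b) & mask if active else a
        for a, b, active in zip(zdn, zm, pg)
    ]
```

With 8-bit elements, sve_add_predicated([250, 10, 3], [10, 5, 7], [True, False, True], 8) wraps the first element, leaves the second untouched, and adds the third.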
ADD (vectors, unpredicated)
Add all elements of the second source vector to corresponding elements of the first source vector and place the results
in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 0 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = element1 + element2;
Z[d] = result;
ADDPL
Add the current predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source
general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose
register or current stack pointer.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Rn 0 1 0 1 0 imm6 Rd
Assembler Symbols
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is the signed immediate operand, in the range -32 to 31, encoded in the "imm6" field.
Operation
CheckSVEEnabled();
bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(64) result = operand1 + (imm * (PL DIV 8));
if d == 31 then
    SP[] = result;
else
    X[d] = result;
ADDVL
Add the current vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source
general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose
register or current stack pointer.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 Rn 0 1 0 1 0 imm6 Rd
Assembler Symbols
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is the signed immediate operand, in the range -32 to 31, encoded in the "imm6" field.
Operation
CheckSVEEnabled();
bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(64) result = operand1 + (imm * (VL DIV 8));
if d == 31 then
    SP[] = result;
else
    X[d] = result;
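The address arithmetic performed by ADDVL, and by the closely related ADDPL, can be sketched in Python as shown here. The function names addvl and addpl and the vl_bits parameter are hypothetical. The sketch relies on the SVE rule that a predicate register holds one bit per vector byte, so PL equals VL DIV 8.

```python
MASK64 = (1 << 64) - 1


def _check_imm(imm):
    # The imm6 field holds a signed value in the range -32 to 31.
    if not -32 <= imm <= 31:
        raise ValueError("immediate out of range -32 to 31")


def addvl(xn, imm, vl_bits):
    """Add imm multiples of the vector length in bytes to a 64-bit value."""
    _check_imm(imm)
    return (xn + imm * (vl_bits // 8)) & MASK64


def addpl(xn, imm, vl_bits):
    """Add imm multiples of the predicate length in bytes to a 64-bit value."""
    _check_imm(imm)
    pl_bits = vl_bits // 8  # one predicate bit per vector byte
    return (xn + imm * (pl_bits // 8)) & MASK64
```

A typical use of this arithmetic is adjusting a stack pointer by a whole number of vector or predicate registers without knowing the vector length at assembly time.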
ADR
Optionally sign or zero-extend the least significant 32 bits of each element from a vector of offsets or indices in the
second source vector, scale each index by 2, 4 or 8, add to a vector of base addresses from the first source vector, and
place the resulting addresses in the destination vector. This instruction is unpredicated.
It has encodings from 3 classes: Packed offsets, Unpacked 32-bit signed offsets and Unpacked 32-bit unsigned offsets.
Packed offsets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 sz 1 Zm 1 0 1 0 msz Zn Zd
Unpacked 32-bit signed offsets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 Zm 1 0 1 0 msz Zn Zd
Unpacked 32-bit unsigned offsets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 1 0 1 0 msz Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in "sz":
sz <T>
0 S
1 D
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
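<mod> Is the index extend and shift specifier, encoded in "msz":
msz <mod>
00 [absent]
x1 LSL
10 LSL
<amount> Is the index shift amount, encoded in "msz":
msz <amount>
00 [absent]
01 #1
10 #2
11 #3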
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) base = Z[n];
bits(VL) offs = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) addr = Elem[base, e, esize];
    integer offset = Int(Elem[offs, e, esize]<osize-1:0>, unsigned);
    Elem[result, e, esize] = addr + (offset * mbytes);
Z[d] = result;
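The per-element address computation of ADR can be sketched in Python. The parameter names esize, osize, unsigned and mbytes mirror the variables used in the Operation pseudocode; how they are derived from the encoding is not modelled here, and the function name sve_adr and the list-based vector layout are assumptions.

```python
def sve_adr(zn, zm, esize, osize, unsigned, mbytes):
    """Hypothetical model of ADR.

    zn: base addresses, zm: offsets (unsigned element bit patterns).
    osize: number of low offset bits that are used (32 or esize).
    mbytes: scale factor applied to each offset (1, 2, 4 or 8).
    """
    mask = (1 << esize) - 1
    result = []
    for base, offs in zip(zn, zm):
        offset = offs & ((1 << osize) - 1)  # keep the low osize bits
        if not unsigned and offset & (1 << (osize - 1)):
            offset -= 1 << osize  # sign-extend
        result.append((base + offset * mbytes) & mask)
    return result
```

With 64-bit elements and a 32-bit offset of 0xFFFFFFFF scaled by 4, the signed form steps back 4 bytes from the base, whereas the unsigned form adds 0x3FFFFFFFC.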
AND (immediate)
Bitwise AND an immediate with each 64-bit element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This instruction is used by the pseudo-instruction BIC (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 0 0 0 0 imm13 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 64;
bits(VL) operand = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(64) element1 = Elem[operand, e, 64];
    Elem[result, e, 64] = element1 AND imm;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
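The constraint on the AND (immediate) constant can be illustrated with a small Python check. This sketch does not decode the "imm13" field; it only tests whether a 64-bit value has the described shape, namely a rotated run of ones replicated every 2, 4, 8, 16, 32 or 64 bits. The function names is_bitmask_immediate and sve_and_imm are hypothetical.

```python
MASK64 = (1 << 64) - 1


def is_bitmask_immediate(value):
    """True if value is a single rotated run of ones replicated across 64 bits."""
    value &= MASK64
    for width in (2, 4, 8, 16, 32, 64):
        field_mask = (1 << width) - 1
        field = value & field_mask
        # The low field must repeat across the whole 64-bit value.
        replicated = 0
        for shift in range(0, 64, width):
            replicated |= field << shift
        if replicated != value:
            continue
        # All-zeros and all-ones fields are not encodable.
        if field == 0 or field == field_mask:
            continue
        # Count positions where a run of ones starts, treating the field
        # as circular. A single rotated run has exactly one start.
        rotated = ((field << 1) | (field >> (width - 1))) & field_mask
        run_starts = bin(field & ~rotated & field_mask).count("1")
        if run_starts == 1:
            return True
    return False


def sve_and_imm(zdn64, imm):
    """AND each 64-bit element (list of ints, assumed layout) with imm."""
    return [x & imm for x in zdn64]
```

Under this check, 0x00FF00FF00FF00FF qualifies (an 8-bit run repeating every 16 bits), while 0x5 does not because it contains two separate runs.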
AND (predicates)
Bitwise AND active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Does not set the condition flags.
This instruction is used by the alias MOV (predicate, predicated, zeroing).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S (bit 22) = 0
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND element2;
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
AND (vectors, predicated)
Bitwise AND active elements of the second source vector with corresponding elements of the first source vector and
destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 0 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 AND element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
AND (vectors, unpredicated)
Bitwise AND all elements of the second source vector with corresponding elements of the first source vector and place
the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 Zm 0 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
Z[d] = operand1 AND operand2;
ANDS
Bitwise AND active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
This instruction is used by the alias MOVS (predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S (bit 22) = 1
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND element2;
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
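As a rough illustration of the loop above, the following Python sketch models each predicate register as a list of 0/1 flags, one per element, instead of a packed bits(PL) value. The function name and_pred_z and the list representation are assumptions made for this example only, and the condition flag update performed by PredTest is left out.

```python
def and_pred_z(mask, op1, op2):
    """Zeroing predicated AND over per-element predicate flags (0 or 1)."""
    result = []
    for active, a, b in zip(mask, op1, op2):
        # Active elements receive op1 AND op2; inactive elements are set to zero.
        result.append(a & b if active else 0)
    return result
```

For example, with a governing predicate of [1, 1, 0, 1], the third element of the result is zero regardless of the two sources.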
Bitwise AND horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Inactive elements in the source vector are treated as all ones.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) result = Ones(esize);
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        result = result AND Elem[operand, e, esize];
V[d] = result;
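The reduction above can be sketched in Python as follows. The vector is modeled as a list of unsigned esize-bit integers and the predicate as a list of 0/1 flags; the name andv and this representation are assumptions for illustration. Starting from all ones means that inactive elements have no effect on the result, which matches the statement that they are treated as all ones.

```python
def andv(mask, elements, esize):
    """AND-reduce the active elements; inactive ones behave as all ones."""
    result = (1 << esize) - 1  # Ones(esize), the identity value for AND
    for active, value in zip(mask, elements):
        if active:
            result &= value
    return result
```

With no active elements at all, the sketch returns the all-ones value for the element size.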
Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the
range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 0 0 0 1 0 0 Pg tszl imm3 Zdn
L U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) mask = P[g];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = ASR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
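A minimal Python sketch of the merging behavior described above, assuming elements are held as unsigned esize-bit integers in a list and the predicate as a list of 0/1 flags. The names sint and asr_imm_merging are hypothetical helpers for this example; decoding of the shift amount from "tsz:imm3" is not modeled.

```python
def sint(value, esize):
    """Interpret an esize-bit pattern as a two's-complement integer."""
    return value - (1 << esize) if value >> (esize - 1) else value


def asr_imm_merging(mask, elements, shift, esize):
    """Arithmetic shift right of active elements; inactive ones are kept."""
    out = []
    for active, value in zip(mask, elements):
        if active:
            # Python's >> on a negative int is an arithmetic shift.
            out.append((sint(value, esize) >> shift) & ((1 << esize) - 1))
        else:
            out.append(value)
    return out
```

Shifting by the full element size yields all ones for negative inputs and zero for non-negative inputs.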
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Shift right by immediate, preserving the sign bit, each element of the source vector, and place the results in the
corresponding elements of the destination vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 0 0 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = ASR(element1, shift);
Z[d] = result;
ASR (vectors)
Shift right, preserving the sign bit, active elements of the first source vector by corresponding elements of the second
source vector and destructively place the results in the corresponding elements of the first source vector. The shift
amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element
size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = ASR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
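The statement that the shift amount is "not used modulo the element size" corresponds to the Min(UInt(element2), esize) step above: oversized shift amounts saturate instead of wrapping. A small Python sketch of the per-element computation, with the function name asr_vectors_element chosen only for this example:

```python
def asr_vectors_element(element1, element2, esize):
    """One active element of the vector-shift form, on unsigned bit patterns."""
    shift = min(element2, esize)  # saturate the shift amount, do not wrap it
    signed = element1 - (1 << esize) if element1 >> (esize - 1) else element1
    return (signed >> shift) & ((1 << esize) - 1)
```

A shift amount of 200 applied to an 8-bit element therefore behaves like a shift by 8, not like a shift by 200 MOD 8.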
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Shift right, preserving the sign bit, active elements of the first source vector by corresponding overlapping 64-bit
elements of the second source vector and destructively place the results in the corresponding elements of the first
source vector. The shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant,
and not used modulo the destination element size. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 RESERVED
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = ASR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
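The phrase "corresponding overlapping 64-bit elements" refers to the index expression (e * esize) DIV 64: every narrow element takes its shift amount from the 64-bit element that covers the same bits of the vector. The Python sketch below models this under the assumption that the first source is a list of unsigned esize-bit integers and the second source a list of unsigned 64-bit integers; asr_wide_merging is a hypothetical name.

```python
def asr_wide_merging(mask, elements, wide_shifts, esize):
    """Merging ASR where shift amounts come from overlapping 64-bit elements."""
    out = []
    for e, (active, value) in enumerate(zip(mask, elements)):
        if not active:
            out.append(value)  # inactive elements remain unmodified
            continue
        # Index of the 64-bit element overlapping narrow element e.
        shift = min(wide_shifts[(e * esize) // 64], esize)
        signed = value - (1 << esize) if value >> (esize - 1) else value
        out.append((signed >> shift) & ((1 << esize) - 1))
    return out
```

With 32-bit elements, elements 0 and 1 share the first 64-bit shift amount and elements 2 and 3 share the second.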
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Shift right, preserving the sign bit, all elements of the first source vector by corresponding overlapping 64-bit
elements of the second source vector and place the results in the corresponding elements of the destination vector. The
shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo
the destination element size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 0 0 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 RESERVED
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
    integer shift = Min(UInt(element2), esize);
    Elem[result, e, esize] = ASR(element1, shift);
Z[d] = result;
Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The result rounds toward zero as in a signed division. The
immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 1 0 0 1 0 0 Pg tszl imm3 Zdn
L U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element1 = SInt(Elem[operand1, e, esize]);
        if element1 < 0 then
            element1 = element1 + ((1 << shift) - 1);
        Elem[result, e, esize] = (element1 >> shift)<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
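The rounding step in the pseudocode above is what separates this instruction from a plain arithmetic shift: negative inputs are biased by (1 << shift) - 1 before shifting, so the quotient rounds toward zero and not toward minus infinity. A Python sketch of one active element, with asrd_element as an illustrative name only:

```python
def asrd_element(value, shift, esize):
    """Signed divide by 2**shift with rounding toward zero, on a bit pattern."""
    signed = value - (1 << esize) if value >> (esize - 1) else value
    if signed < 0:
        signed += (1 << shift) - 1  # bias so that the shift rounds toward zero
    return (signed >> shift) & ((1 << esize) - 1)
```

For the 8-bit input 0xF9 (that is, -7) and a shift of 1, the sketch gives 0xFD (-3), whereas a plain arithmetic shift would give 0xFC (-4).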
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Reversed shift right, preserving the sign bit, active elements of the second source vector by corresponding elements of
the first source vector and destructively place the results in the corresponding elements of the first source vector. The
shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the
element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element1), esize);
        Elem[result, e, esize] = ASR(element2, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
Since the result type is smaller than the input type, the results are zero-extended to fill each destination element.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, 32) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, 32] == '1' then
        bits(32) element = Elem[operand, e, 32];
        Elem[result, 2*e, 16] = FPConvertBF(element, FPCR[]);
        Elem[result, 2*e+1, 16] = Zeros();
Z[d] = result;
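To give a feel for the narrowing step, the Python sketch below converts a single-precision value to a BFloat16 bit pattern. It is not a model of FPConvertBF: it assumes round-to-nearest-even, ignores the FPCR controls, and does not handle NaN inputs. The name f32_to_bf16_rne is hypothetical. The returned 16-bit value is what would occupy the low half of the 32-bit destination element, with the upper half zero.

```python
import struct


def f32_to_bf16_rne(x):
    """Round a float to a BFloat16 bit pattern (nearest-even, no NaN handling)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF
```

The value 1.00390625 lies exactly halfway between two BFloat16 values and rounds down to the even pattern 0x3F80 under this assumption.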
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the
results in the odd-numbered 16-bit elements of the destination vector, leaving the even-numbered elements
unchanged. Inactive elements in the destination vector register remain unmodified.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, 32) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, 32] == '1' then
        bits(32) element = Elem[operand, e, 32];
        Elem[result, 2*e+1, 16] = FPConvertBF(element, FPCR[]);
Z[d] = result;
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 i2 Zm 0 1 0 0 0 0 Zn Zda
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index, in the range 0 to 3, encoded in the "i2" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
integer eltspersegment = 128 DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(16) elt1_a = Elem[operand1, 2 * e + 0, 16];
    bits(16) elt1_b = Elem[operand1, 2 * e + 1, 16];
    bits(16) elt2_a = Elem[operand2, 2 * s + 0, 16];
    bits(16) elt2_b = Elem[operand2, 2 * s + 1, 16];
    bits(32) sum = Elem[operand3, e, 32];
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 Zm 1 0 0 0 0 0 Zn Zda
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    bits(16) elt1_a = Elem[operand1, 2 * e + 0, 16];
    bits(16) elt1_b = Elem[operand1, 2 * e + 1, 16];
    bits(16) elt2_a = Elem[operand2, 2 * e + 0, 16];
    bits(16) elt2_b = Elem[operand2, 2 * e + 1, 16];
    bits(32) sum = Elem[operand3, e, 32];
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
This BFloat16 floating-point multiply-add long instruction widens the even-numbered BFloat16 elements in the first
source vector and the indexed element from the corresponding 128-bit segment in the second source vector to single-
precision format and then destructively multiplies and adds these values without intermediate rounding to the single-
precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the first source
vector. This instruction is unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i3h Zm 0 1 0 0 i3l 0 Zn Zda
o2 op T
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
integer eltspersegment = 128 DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = 2 * segmentbase + index;
    bits(32) element1 = Elem[operand1, 2 * e + 0, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, s, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);
Z[da] = result;
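The "indexed element from the corresponding 128-bit segment" is located by the segmentbase and s computations above. The Python sketch below reproduces only that index arithmetic, to show which 16-bit element of the second source each 32-bit result element uses; the function name indexed_bf16_position is an assumption for this example.

```python
def indexed_bf16_position(e, index):
    """16-bit element number in the second source used for 32-bit element e."""
    eltspersegment = 128 // 32                 # four 32-bit results per segment
    segmentbase = e - (e % eltspersegment)     # first 32-bit element of the segment
    return 2 * segmentbase + index             # eight 16-bit elements per segment
```

With an index of 3, result elements 0 to 3 all use 16-bit element 3, and result elements 4 to 7 all use 16-bit element 11, which is element 3 of the second segment.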
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
This BFloat16 floating-point multiply-add long instruction widens the even-numbered BFloat16 elements in the first
source vector and the corresponding elements in the second source vector to single-precision format and then
destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the
destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is
unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 0 0 0 0 0 Zn Zda
o2 op T
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    bits(32) element1 = Elem[operand1, 2 * e + 0, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, 2 * e + 0, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);
Z[da] = result;
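The widening step written as "Elem[...] : Zeros(16)" places the BFloat16 bits in the upper half of a single-precision pattern, which is exact because BFloat16 has the same sign and exponent layout as single-precision. A Python sketch of that step alone follows; the name bf16_to_f32 is hypothetical, and the fused multiply-add performed by BFMulAdd is not modeled.

```python
import struct


def bf16_to_f32(bf16_bits):
    """Widen a BFloat16 bit pattern by appending 16 zero bits."""
    word = (bf16_bits & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", word))[0]
```

For instance, the pattern 0x4049 widens to 3.140625, the nearest BFloat16 value below pi.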
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
This BFloat16 floating-point multiply-add long instruction widens the odd-numbered BFloat16 elements in the first
source vector and the indexed element from the corresponding 128-bit segment in the second source vector to single-
precision format and then destructively multiplies and adds these values without intermediate rounding to the single-
precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the first source
vector. This instruction is unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i3h Zm 0 1 0 0 i3l 1 Zn Zda
o2 op T
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
integer eltspersegment = 128 DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = 2 * segmentbase + index;
    bits(32) element1 = Elem[operand1, 2 * e + 1, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, s, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
This BFloat16 floating-point multiply-add long instruction widens the odd-numbered BFloat16 elements in the first
source vector and the corresponding elements in the second source vector to single-precision format and then
destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the
destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is
unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 0 0 0 0 1 Zn Zda
o2 op T
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    bits(32) element1 = Elem[operand1, 2 * e + 1, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, 2 * e + 1, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
SVE
(FEAT_BF16)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 Zm 1 1 1 0 0 1 Zn Zda
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
bits(128) op1, op2;
bits(128) res, addend;
for s = 0 to segments-1
op1 = Elem[operand1, s, 128];
op2 = Elem[operand2, s, 128];
addend = Elem[operand3, s, 128];
res = BFMatMulAdd(addend, op1, op2);
Elem[result, s, 128] = res;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise clear bits using immediate with each 64-bit element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
• The encodings in this description are named to match the encodings of AND (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of AND (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 0 0 0 0 imm13 Zdn
BIC <Zdn>.<T>, <Zdn>.<T>, #<const>
is equivalent to
AND <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Operation
The description of AND (immediate) gives the operational pseudocode for this instruction.
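Because the operation is defined in terms of AND (immediate), clearing the bits named by an immediate amounts to ANDing with the bitwise inverse of that immediate. The Python sketch below models one 64-bit element under that reading. The names `and_imm` and `bic_imm` are hypothetical, and the sketch does not check that the immediate is a valid bitmask encoding.

```python
MASK64 = (1 << 64) - 1

def and_imm(element: int, imm: int) -> int:
    """Bitwise AND of one 64-bit element with an immediate."""
    return element & imm & MASK64

def bic_imm(element: int, imm: int) -> int:
    """Clear every bit of `element` that is set in `imm`."""
    # In two's complement, ~imm equals -imm - 1.
    return and_imm(element, ~imm)
```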
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise AND inverted active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND (NOT element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
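The element loop above can be sketched in Python by treating each predicate as a list of booleans, one entry per element. This is an illustrative model only: the name `bic_pred` is hypothetical, and the flag-setting path is not modelled.

```python
def bic_pred(mask, pn, pm):
    """Zeroing predication: pn AND NOT pm where mask is true, else False."""
    return [m and n and not q for m, n, q in zip(mask, pn, pm)]
```

Elements where the governing predicate is false always produce False, regardless of the source values.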
Bitwise AND inverted active elements of the second source vector with corresponding elements of the first source
vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in
the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 0 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 AND (NOT element2);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
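A minimal Python sketch of the merging behavior above, assuming each vector is a list of unsigned integers of `esize` bits and the predicate is a list of booleans. The name `bic_merge` is illustrative.

```python
def bic_merge(mask, zdn, zm, esize=8):
    """Active elements get zdn AND NOT zm; inactive elements keep zdn."""
    limit = (1 << esize) - 1
    return [(a & ~b & limit) if m else a for m, a, b in zip(mask, zdn, zm)]
```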
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise AND inverted all elements of the second source vector with corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 Zm 0 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
Z[d] = operand1 AND (NOT operand2);
Bitwise AND inverted active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND (NOT element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
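The description states that the flags report FIRST (N), NONE (Z) and !LAST (C), with V set to zero. The Python sketch below shows one plausible reading of that rule over boolean lists. The name `pred_test` is hypothetical, and the handling of the case with no active elements (N = 0, Z = 1, C = 1) is an assumption of this sketch rather than something stated in the text above.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) for a predicate result under a governing mask."""
    active = [r for m, r in zip(mask, result) if m]
    n = int(bool(active) and active[0])         # first active element is true
    z = int(not any(active))                    # no active element is true
    c = int(not (bool(active) and active[-1]))  # last active element is not true
    return n, z, c, 0
```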
Sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to
zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
B S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
M <ZM>
0 Z
1 M
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;
for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = if !break then '1' else '0';
        break = break || element;
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
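The break-after loop above can be sketched in Python over boolean lists. The name `brka` and the `merging` keyword are illustrative; flag setting is not modelled.

```python
def brka(mask, pn, pd_old, merging=False):
    """True up to and including the first active true element of pn."""
    result, seen_break = [], False
    for m, src, old in zip(mask, pn, pd_old):
        if m:
            result.append(not seen_break)   # written before the break is recorded
            seen_break = seen_break or src
        elif merging:
            result.append(old)              # inactive: keep the old destination
        else:
            result.append(False)            # inactive: zeroing
    return result
```

Because the result is written before the break condition is updated, the element that triggers the break is itself still true.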
Sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 1 Pg 0 Pn 0 Pd
B S M
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;
for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = if !break then '1' else '0';
        break = break || element;
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
Sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to
zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
B S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
M <ZM>
0 Z
1 M
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;
for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        break = break || element;
        ElemP[result, e, esize] = if !break then '1' else '0';
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
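A Python sketch of the break-before loop above, again using boolean lists. The name `brkb` is illustrative and flag setting is not modelled. The only difference from a break-after loop is that the break condition is recorded before the result element is written.

```python
def brkb(mask, pn, pd_old, merging=False):
    """True up to but not including the first active true element of pn."""
    result, seen_break = [], False
    for m, src, old in zip(mask, pn, pd_old):
        if m:
            seen_break = seen_break or src  # recorded before the write
            result.append(not seen_break)
        elif merging:
            result.append(old)
        else:
            result.append(False)
    return result
```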
Sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 0 1 Pg 0 Pn 0 Pd
B S M
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;
for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        break = break || element;
        ElemP[result, e, esize] = if !break then '1' else '0';
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
leaves the destination and second source predicate unchanged. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
S
Assembler Symbols
<Pdm> Is the name of the second source and destination scalable predicate register, encoded in the "Pdm"
field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[dm];
bits(PL) result;
if LastActive(mask, operand1, 8) == '1' then
    result = operand2;
else
    result = Zeros();
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(Ones(PL), result, 8);
P[dm] = result;
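The propagation rule in the description can be sketched in Python over boolean lists. The names `last_active` and `brkn` are hypothetical, and treating "no active elements" as a false last active element is an assumption of this sketch.

```python
def last_active(mask, pred):
    """Value of pred at the highest-numbered active element, else False."""
    for m, p in zip(reversed(mask), reversed(pred)):
        if m:
            return p
    return False

def brkn(mask, pn, pdm):
    """Keep pdm if the last active element of pn is true, else all-false."""
    return list(pdm) if last_active(mask, pn) else [False] * len(pdm)
```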
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
leaves the destination and second source predicate unchanged. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags
based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
S
Assembler Symbols
<Pdm> Is the name of the second source and destination scalable predicate register, encoded in the "Pdm"
field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[dm];
bits(PL) result;
if LastActive(mask, operand1, 8) == '1' then
    result = operand2;
else
    result = Zeros();
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(Ones(PL), result, 8);
P[dm] = result;
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 1 1 Pg 0 Pn 0 Pd
S B
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');
for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        ElemP[result, e, 8] = if last then '1' else '0';
        last = last && (ElemP[operand2, e, 8] == '0');
    else
        ElemP[result, e, 8] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
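The loop above combines propagation from the first source predicate with a break-after scan of the second. A Python sketch over boolean lists follows. The names `last_active` and `brkpa` are illustrative, and returning False from `last_active` when no element is active is an assumption of this sketch.

```python
def last_active(mask, pred):
    """Value of pred at the highest-numbered active element, else False."""
    for m, p in zip(reversed(mask), reversed(pred)):
        if m:
            return p
    return False

def brkpa(mask, pn, pm):
    """Break after the first active true element of pm, gated by pn."""
    last = last_active(mask, pn)
    result = []
    for m, src in zip(mask, pm):
        if m:
            result.append(last)          # written before the break is applied
            last = last and not src
        else:
            result.append(False)
    return result
```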
Break after first true condition, propagating from previous partition and setting the condition flags
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 1 1 Pg 0 Pn 0 Pd
S B
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');
for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        ElemP[result, e, 8] = if last then '1' else '0';
        last = last && (ElemP[operand2, e, 8] == '0');
    else
        ElemP[result, e, 8] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 1 1 Pg 0 Pn 1 Pd
S B
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');
for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        last = last && (ElemP[operand2, e, 8] == '0');
        ElemP[result, e, 8] = if last then '1' else '0';
    else
        ElemP[result, e, 8] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
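A Python sketch of the break-before variant of the propagating loop above, using boolean lists. The names `last_active` and `brkpb` are illustrative, and returning False from `last_active` when no element is active is an assumption of this sketch.

```python
def last_active(mask, pred):
    """Value of pred at the highest-numbered active element, else False."""
    for m, p in zip(reversed(mask), reversed(pred)):
        if m:
            return p
    return False

def brkpb(mask, pn, pm):
    """Break before the first active true element of pm, gated by pn."""
    last = last_active(mask, pn)
    result = []
    for m, src in zip(mask, pm):
        if m:
            last = last and not src      # break applied before the write
            result.append(last)
        else:
            result.append(False)
    return result
```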
Break before first true condition, propagating from previous partition and setting the condition flags
If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 1 1 Pg 0 Pn 1 Pd
S B
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');
for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        last = last && (ElemP[operand2, e, 8] == '0');
        ElemP[result, e, 8] = if last then '1' else '0';
    else
        ElemP[result, e, 8] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
From the source vector register extract the element after the last active element, or if the last active element is the
final element extract element zero, and then zero-extend that element to destructively place in the destination and
first source general-purpose register. If there are no active elements then destructively zero-extend the least
significant element-size bits of the destination and first source general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 0 1 0 1 Pg Zm Rdn
B
Assembler Symbols
size <R>
01 W
x0 W
11 X
<dn> Is the number [0-30] of the source and destination general-purpose register or the name ZR (31),
encoded in the "Rdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = X[dn];
bits(VL) operand2 = Z[m];
bits(csize) result;
integer last = LastActiveElement(mask, esize);
X[dn] = result;
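The selection rule in the description (take the element after the last active element, wrapping to element zero, or fall back to the low bits of the scalar when nothing is active) can be sketched in Python. The names `last_active_element` and `clasta_scalar` are hypothetical, and the sketch assumes the vector elements are already unsigned `esize`-bit values.

```python
def last_active_element(mask):
    """Index of the highest-numbered true element, or -1 if none."""
    for e in range(len(mask) - 1, -1, -1):
        if mask[e]:
            return e
    return -1

def clasta_scalar(mask, rdn, zm, esize=8):
    """Element after the last active one, zero-extended into the scalar."""
    last = last_active_element(mask)
    if last < 0:
        return rdn & ((1 << esize) - 1)   # keep only the low esize bits
    return zm[(last + 1) % len(zm)]       # wrap to element zero at the end
```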
From the source vector register extract the element after the last active element, or if the last active element is the
final element extract element zero, and then zero-extend that element to destructively place in the destination and
first source SIMD & floating-point scalar register. If there are no active elements then destructively zero-extend the
least significant element-size bits of the destination and first source SIMD & floating-point scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 Pg Zm Vdn
B
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<dn> Is the number [0-31] of the source and destination SIMD&FP register, encoded in the "Vdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = V[dn];
bits(VL) operand2 = Z[m];
bits(esize) result;
integer last = LastActiveElement(mask, esize);
V[dn] = result;
From the second source vector register extract the element after the last active element, or if the last active element
is the final element extract element zero, and then replicate that element to destructively fill the destination and first
source vector.
If there are no active elements then leave the destination and source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 Pg Zm Zdn
B
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
integer last = LastActiveElement(mask, esize);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
From the source vector register extract the last active element, and then zero-extend that element to destructively
place in the destination and first source general-purpose register. If there are no active elements then destructively
zero-extend the least significant element-size bits of the destination and first source general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 1 1 0 1 Pg Zm Rdn
B
Assembler Symbols
size <R>
01 W
x0 W
11 X
<dn> Is the number [0-30] of the source and destination general-purpose register or the name ZR (31),
encoded in the "Rdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = X[dn];
bits(VL) operand2 = Z[m];
bits(csize) result;
integer last = LastActiveElement(mask, esize);
X[dn] = result;
From the source vector register extract the last active element, and then zero-extend that element to destructively
place in the destination and first source SIMD & floating-point scalar register. If there are no active elements then
destructively zero-extend the least significant element-size bits of the destination and first source SIMD & floating-
point scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 Pg Zm Vdn
B
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<dn> Is the number [0-31] of the source and destination SIMD&FP register, encoded in the "Vdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = V[dn];
bits(VL) operand2 = Z[m];
bits(esize) result;
integer last = LastActiveElement(mask, esize);
V[dn] = result;
From the second source vector register extract the last active element, and then replicate that element to
destructively fill the destination and first source vector.
If there are no active elements then leave the destination and source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 Pg Zm Zdn
B
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
integer last = LastActiveElement(mask, esize);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
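The vector form described above can be sketched in Python as follows. This is a minimal model under the assumption that vectors are lists of element values and the predicate is a list of booleans; the function name is hypothetical.

```python
def clastb_vector(pred, zdn, zm):
    """Replicate the last active element of zm across the destination (illustrative only)."""
    active = [i for i, p in enumerate(pred) if p]
    if not active:
        # No active elements: the destination is left unmodified.
        return list(zdn)
    # Fill every destination element with the last active source element.
    return [zm[active[-1]]] * len(zdn)
```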
Count leading sign bits in each active element of the source vector, and place the results in the corresponding
elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = CountLeadingSignBits(element)<esize-1:0>;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
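The per-element operation above can be illustrated with a small Python model. It assumes that "leading sign bits" means the number of consecutive bits below the most significant bit that are equal to it; the helper names and the list-based vector representation are assumptions for this sketch.

```python
def count_leading_sign_bits(x, esize):
    """Count the bits directly below the sign bit that match the sign bit."""
    x &= (1 << esize) - 1
    sign = (x >> (esize - 1)) & 1
    count = 0
    for bit in range(esize - 2, -1, -1):
        if (x >> bit) & 1 != sign:
            break
        count += 1
    return count


def cls_predicated(pred, zn, zd, esize):
    """Active elements receive the count; inactive elements keep their old value."""
    return [count_leading_sign_bits(n, esize) if p else d
            for p, n, d in zip(pred, zn, zd)]
```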
Count leading zero bits in each active element of the source vector, and place the results in the corresponding
elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = CountLeadingZeroBits(element)<esize-1:0>;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
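A rough Python model of the merging behavior described above is shown here. The function names and the representation of vectors and predicates as Python lists are assumptions for this sketch.

```python
def count_leading_zero_bits(x, esize):
    """Number of zero bits above the highest set bit in an esize-bit value."""
    x &= (1 << esize) - 1
    return esize - x.bit_length()


def clz_predicated(pred, zn, zd, esize):
    """Active elements receive the count; inactive elements keep their old value."""
    return [count_leading_zero_bits(n, esize) if p else d
            for p, n, d in zip(pred, zn, zd)]
```

Note that an all-zero element yields `esize`, which still fits in the element because the result is truncated to `esize` bits in the pseudocode.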
Compare active integer elements in the source vector with an immediate, and place the boolean results of the
specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS, LE, LO, LS, LT or NE.
It has encodings from 10 classes: Equal, Greater than, Greater than or equal, Higher, Higher or same, Less than,
Less than or equal, Lower, Lower or same and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 1 0 0 Pg Zn 0 Pd
ne
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 0 Pg Zn 1 Pd
lt ne
Greater than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 0 Pg Zn 0 Pd
lt ne
Higher
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 0 Pg Zn 1 Pd
lt ne
Higher or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 0 Pg Zn 0 Pd
lt ne
Less than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 1 Pg Zn 0 Pd
lt ne
Less than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 1 Pg Zn 1 Pd
lt ne
Lower
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 1 Pg Zn 0 Pd
lt ne
Lower or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 1 Pg Zn 1 Pd
lt ne
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 1 0 0 Pg Zn 1 Pd
ne
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<imm> For the equal, greater than, greater than or equal, less than, less than or equal and not equal variant: is
the signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
For the higher, higher or same, lower and lower or same variant: is the unsigned immediate operand, in
the range 0 to 127, encoded in the "imm7" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(PL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        boolean cond;
        case op of
            when Cmp_EQ cond = element1 == imm;
            when Cmp_NE cond = element1 != imm;
            when Cmp_GE cond = element1 >= imm;
            when Cmp_LT cond = element1 < imm;
            when Cmp_GT cond = element1 > imm;
            when Cmp_LE cond = element1 <= imm;
        ElemP[result, e, esize] = if cond then '1' else '0';
    else
        ElemP[result, e, esize] = '0';
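The comparison and the FIRST (N), NONE (Z), !LAST (C) flag setting described above can be sketched in Python. This is an illustrative model only: the function names, the list-based predicate and vector representation, and the treatment of HI, HS, LO and LS as the unsigned comparisons are assumptions of the sketch. When no element is active, the model reports N clear, Z set and C set.

```python
import operator

SIGNED = {"EQ": operator.eq, "NE": operator.ne, "GE": operator.ge,
          "GT": operator.gt, "LE": operator.le, "LT": operator.lt}
UNSIGNED = {"HI": operator.gt, "HS": operator.ge,
            "LO": operator.lt, "LS": operator.le}


def to_signed(x, esize):
    """Interpret the low esize bits of x as a two's complement integer."""
    x &= (1 << esize) - 1
    return x - (1 << esize) if x >> (esize - 1) else x


def cmp_imm(cc, pred, zn, imm, esize):
    """Return (result predicate, flags) for a compare-with-immediate."""
    mask = (1 << esize) - 1
    if cc in UNSIGNED:
        op, conv = UNSIGNED[cc], (lambda x: x & mask)
    else:
        op, conv = SIGNED[cc], (lambda x: to_signed(x, esize))
    # Inactive elements of the result are always false.
    result = [bool(p) and op(conv(x), imm) for p, x in zip(pred, zn)]
    active = [r for p, r in zip(pred, result) if p]
    flags = {
        "N": active[0] if active else False,          # FIRST
        "Z": not any(active),                         # NONE
        "C": not (active[-1] if active else False),   # !LAST
        "V": False,
    }
    return result, flags
```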
Compare vectors
Compare active integer elements in the first source vector with corresponding elements in the second source vector,
and place the boolean results of the specified comparison in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS or NE.
This instruction is used by the pseudo-instructions CMPLE (vectors), CMPLO (vectors), CMPLS (vectors), and CMPLT
(vectors).
It has encodings from 6 classes: Equal, Greater than, Greater than or equal, Higher, Higher or same and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 1 Pg Zn 0 Pd
ne
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 1 Pd
ne
Greater than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 0 Pd
ne
Higher
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 1 Pd
ne
Higher or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 0 Pd
ne
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 1 Pg Zn 1 Pd
ne
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(PL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        boolean cond;
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        case op of
            when Cmp_EQ cond = element1 == element2;
            when Cmp_NE cond = element1 != element2;
            when Cmp_GE cond = element1 >= element2;
            when Cmp_LT cond = element1 < element2;
            when Cmp_GT cond = element1 > element2;
            when Cmp_LE cond = element1 <= element2;
        ElemP[result, e, esize] = if cond then '1' else '0';
    else
        ElemP[result, e, esize] = '0';
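To illustrate why the signed (GE, GT) and unsigned (HS, HI) conditions are distinct, the following Python sketch compares the same bit patterns under both interpretations. The function names and the list-based representation are assumptions for the example; flag setting is omitted.

```python
import operator

OPS = {"EQ": operator.eq, "NE": operator.ne, "GE": operator.ge,
       "GT": operator.gt, "HS": operator.ge, "HI": operator.gt}


def to_signed(x, esize):
    """Interpret the low esize bits of x as a two's complement integer."""
    x &= (1 << esize) - 1
    return x - (1 << esize) if x >> (esize - 1) else x


def cmp_vectors(cc, pred, zn, zm, esize):
    """Element-wise compare; inactive elements of the result are false."""
    mask = (1 << esize) - 1
    if cc in ("HI", "HS"):
        conv = lambda x: x & mask             # unsigned interpretation
    else:
        conv = lambda x: to_signed(x, esize)  # signed interpretation
    return [bool(p) and OPS[cc](conv(a), conv(b))
            for p, a, b in zip(pred, zn, zm)]
```

With 8-bit elements, `0xFF` is -1 when signed and 255 when unsigned, so `GT` and `HI` give different answers when it is compared with 1.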
Compare active integer elements in the first source vector with overlapping 64-bit doubleword elements in the second
source vector, and place the boolean results of the specified comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE
(Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS, LE, LO, LS, LT or NE.
It has encodings from 10 classes: Equal, Greater than, Greater than or equal, Higher, Higher or same, Less than,
Less than or equal, Lower, Lower or same and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 1 Pg Zn 0 Pd
ne
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn 1 Pd
U lt ne
Greater than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn 0 Pd
U lt ne
Higher
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 0 Pg Zn 1 Pd
U lt ne
Higher or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 0 Pg Zn 0 Pd
U lt ne
Less than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn 0 Pd
U lt ne
Less than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn 1 Pd
U lt ne
Lower
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 1 Pg Zn 0 Pd
U lt ne
Lower or same
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 1 Pg Zn 1 Pd
U lt ne
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 1 Pg Zn 1 Pd
ne
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 RESERVED
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(PL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        boolean cond;
        integer element2 = Int(Elem[operand2, (e * esize) DIV 64, 64], unsigned);
        case op of
            when Cmp_EQ cond = element1 == element2;
            when Cmp_NE cond = element1 != element2;
            when Cmp_GE cond = element1 >= element2;
            when Cmp_LT cond = element1 < element2;
            when Cmp_GT cond = element1 > element2;
            when Cmp_LE cond = element1 <= element2;
        ElemP[result, e, esize] = if cond then '1' else '0';
    else
        ElemP[result, e, esize] = '0';
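The index expression `(e * esize) DIV 64` selects the 64-bit doubleword of the second source that overlaps element `e` of the first source. The Python sketch below models this for a signed less-than comparison; the function names and the list-based representation (narrow elements in `zn`, doublewords in `zm64`) are assumptions for illustration.

```python
def to_signed(x, bits):
    """Interpret the low 'bits' bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def wide_index(e, esize):
    """Index of the 64-bit doubleword that overlaps narrow element e."""
    return (e * esize) // 64


def cmplt_wide(pred, zn, zm64, esize):
    """Signed 'less than' of narrow elements against overlapping doublewords."""
    out = []
    for e, (p, x) in enumerate(zip(pred, zn)):
        wide = zm64[wide_index(e, esize)]
        out.append(bool(p) and to_signed(x, esize) < to_signed(wide, 64))
    return out
```

With 32-bit elements, elements 0 and 1 are compared with doubleword 0 and elements 2 and 3 with doubleword 1.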
Compare signed less than or equal to vector, setting the condition flags
Compare active signed integer elements in the first source vector being less than or equal to corresponding signed
elements in the second source vector, and place the boolean results of the comparison in the corresponding elements
of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 0 Pd
ne
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
Compare active unsigned integer elements in the first source vector being lower than corresponding unsigned
elements in the second source vector, and place the boolean results of the comparison in the corresponding elements
of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 1 Pd
ne
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
Compare active unsigned integer elements in the first source vector being lower than or same as corresponding
unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding
elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the
FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 0 Pd
ne
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
Compare active signed integer elements in the first source vector being less than corresponding signed elements in
the second source vector, and place the boolean results of the comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE
(Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 1 Pd
ne
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
Logically invert the boolean value in each active element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
Boolean TRUE is any non-zero value in a source, and one in a result element. Boolean FALSE is always zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = ZeroExtend(IsZeroBit(element), esize);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
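The boolean inversion described above can be modeled in a few lines of Python. The function name and the list-based representation of vectors and predicates are assumptions for this sketch.

```python
def cnot(pred, zn, zd):
    """Active elements become 1 if the source is zero, else 0; inactive elements are kept."""
    return [(1 if n == 0 else 0) if p else d
            for p, n, d in zip(pred, zn, zd)]
```

Any non-zero source value counts as TRUE and so produces 0, while a zero source produces exactly 1.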
Count non-zero bits in each active element of the source vector, and place the results in the corresponding elements of
the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = BitCount(element)<esize-1:0>;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
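A minimal Python model of the predicated bit count described above follows. The function name and the list-based representation are assumptions for illustration.

```python
def cnt_predicated(pred, zn, zd, esize):
    """Active elements receive the number of set bits; inactive elements are kept."""
    mask = (1 << esize) - 1
    return [bin(n & mask).count("1") if p else d
            for p, n, d in zip(pred, zn, zd)]
```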
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then places the result in the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 4 classes: Byte, Doubleword, Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>
Assembler Symbols
<Xd> Is the 64-bit name of the destination general-purpose register, encoded in the "Rd" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
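The element count implied by each named constraint, and its scaling by the immediate, can be sketched in Python. This is an illustrative model, not the architecture's `DecodePredCount`: the function names, the use of pattern names as strings, and the assumption that a fixed count (VL1 to VL256) larger than the number of available elements yields zero are all assumptions of the sketch.

```python
FIXED = {"VL1": 1, "VL2": 2, "VL3": 3, "VL4": 4, "VL5": 5, "VL6": 6,
         "VL7": 7, "VL8": 8, "VL16": 16, "VL32": 32, "VL64": 64,
         "VL128": 128, "VL256": 256}


def pred_count(pattern, vl_bits, esize):
    """Number of elements implied by a named predicate constraint."""
    elements = vl_bits // esize
    if elements < 1:
        return 0
    if pattern == "POW2":
        return 1 << (elements.bit_length() - 1)   # largest power of two
    if pattern in FIXED:
        n = FIXED[pattern]
        return n if elements >= n else 0          # assumption: zero if it does not fit
    if pattern == "MUL4":
        return elements - elements % 4
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "ALL":
        return elements
    return 0                                      # unspecified encodings give zero


def cnt_scaled(pattern, vl_bits, esize, imm=1):
    """Element count multiplied by the immediate (1 to 16)."""
    if not 1 <= imm <= 16:
        raise ValueError("multiplier must be in the range 1 to 16")
    return pred_count(pattern, vl_bits, esize) * imm
```

For a hypothetical 384-bit vector length with byte elements there are 48 elements, so POW2 gives 32 while MUL3, MUL4 and ALL all give 48.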
Counts the number of active and true elements in the source predicate and places the scalar result in the destination
general-purpose register. Inactive predicate elements are not counted.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 0 1 0 Pg 0 Pn Rd
Assembler Symbols
<Xd> Is the 64-bit name of the destination general-purpose register, encoded in the "Rd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(64) sum = Zeros();
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' && ElemP[operand, e, esize] == '1' then
        sum = sum + 1;
X[d] = sum;
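The loop above is equivalent to counting the positions where both the governing predicate and the source predicate are true. A short Python sketch, assuming predicates are modeled as lists of booleans with one entry per element (the function name is hypothetical):

```python
def cntp(pg, pn):
    """Count elements that are both active in pg and true in pn."""
    return sum(1 for g, n in zip(pg, pn) if g and n)
```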
Shuffle active elements of vector to the right and fill with zero
Read the active elements from the source vector and pack them into the lowest-numbered elements of the destination
vector. Then set any remaining elements of the destination vector to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 1 1 0 0 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Zeros();
integer x = 0;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand1, e, esize];
        Elem[result, x, esize] = element;
        x = x + 1;
Z[d] = result;
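The packing behavior of the pseudocode above can be modeled in Python as follows. The function name and the list-based representation of the vector and predicate are assumptions for this sketch.

```python
def compact(pred, zn):
    """Pack active elements into the lowest-numbered positions and zero the rest."""
    packed = [x for p, x in zip(pred, zn) if p]
    return packed + [0] * (len(zn) - len(packed))
```

The relative order of the active elements is preserved, which matches the single pass with the running index `x` in the pseudocode.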
Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register remain unmodified.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
This instruction is used by the alias MOV (immediate, predicated, merging).
This instruction is used by the pseudo-instruction FMOV (zero, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 sh imm8 Zd
M
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) dest = Z[d];
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = imm<esize-1:0>;
    elsif merging then
        Elem[result, e, esize] = Elem[dest, e, esize];
    else
        Elem[result, e, esize] = Zeros();
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register are set to zero.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
This instruction is used by the alias MOV (immediate, predicated, zeroing).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 0 sh imm8 Zd
M
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) dest = Z[d];
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = imm<esize-1:0>;
    elsif merging then
        Elem[result, e, esize] = Elem[dest, e, esize];
    else
        Elem[result, e, esize] = Zeros();
Z[d] = result;
Copy the general-purpose scalar source register to each active element in the destination vector. Inactive elements in
the destination vector register remain unmodified.
This instruction is used by the alias MOV (scalar, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 1 Pg Rn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
size <R>
01 W
x0 W
11 X
<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) result = Z[d];
if AnyActiveElement(mask, esize) then
    bits(64) operand1;
    if n == 31 then
        operand1 = SP[];
    else
        operand1 = X[n];
    for e = 0 to elements-1
        if ElemP[mask, e, esize] == '1' then
            Elem[result, e, esize] = operand1<esize-1:0>;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Copy the SIMD & floating-point scalar source register to each active element in the destination vector. Inactive
elements in the destination vector register remain unmodified.
This instruction is used by the alias MOV (SIMD&FP scalar, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 0 Pg Vn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
size <V>
00 B
01 H
10 S
11 D
<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = if AnyActiveElement(mask, esize) then V[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = operand1;
Z[d] = result;
Operational information
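This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.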
Detect termination conditions in serialized vector loops. Tests whether the comparison between the scalar source
operands holds true and if not tests the state of the !LAST condition flag (C) which indicates whether the previous flag-
setting predicate instruction selected the last element of the vector partition.
The Z and C condition flags are preserved by this instruction. The N and V condition flags are set as a pair to generate
one of the following conditions for a subsequent conditional instruction:
* GE (N=0 & V=0): continue loop (compare failed and last element not selected);
* LT (N=0 & V=1): terminate loop (last element selected);
* LT (N=1 & V=0): terminate loop (compare succeeded);
The scalar source operands are 32-bit or 64-bit general-purpose registers of the same size.
It has encodings from 2 classes: Equal and Not equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 sz 1 Rm 0 0 1 0 0 0 Rn 0 0 0 0 0
ne
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 sz 1 Rm 0 0 1 0 0 0 Rn 1 0 0 0 0
ne
Assembler Symbols
sz <R>
0 W
1 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
bits(esize) operand1 = X[n];
bits(esize) operand2 = X[m];
integer element1 = UInt(operand1);
integer element2 = UInt(operand2);
boolean term;
case op of
    when Cmp_EQ term = element1 == element2;
    when Cmp_NE term = element1 != element2;
if term then
    PSTATE.N = '1';
    PSTATE.V = '0';
else
    PSTATE.N = '0';
    PSTATE.V = (NOT PSTATE.C);
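The way the N and V flags combine into the GE and LT conditions can be illustrated with a small model. This is a sketch only: the string labels "EQ" and "NE" and the function names are assumptions standing in for Cmp_EQ and Cmp_NE, and the flags are passed as plain integers.

```python
def cterm(op, operand1, operand2, carry, esize=64):
    """Return the (N, V) pair; Z and C are left untouched by the instruction.

    op is "EQ" or "NE" (hypothetical labels), carry is the current C flag (0 or 1).
    """
    lane_mask = (1 << esize) - 1
    a = operand1 & lane_mask      # UInt(operand1)
    b = operand2 & lane_mask      # UInt(operand2)
    term = (a == b) if op == "EQ" else (a != b)
    if term:
        return 1, 0               # compare succeeded: N=1, V=0
    return 0, 1 - carry           # compare failed: N=0, V = NOT C


def lt(n, v):
    """The LT condition holds when N and V differ; GE holds otherwise."""
    return n != v
```

With C set (last element not selected) and a failed compare, the result is N=0, V=0, so a following B.LT falls through and the loop continues. In the other two cases N and V differ and LT is taken.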
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 4 classes: Byte, Doubleword, Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(64) operand1 = X[dn];
X[dn] = operand1 - (count * imm);
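How a pattern encoding turns into an element count, and how that count decrements the scalar, can be sketched as below. The Python names are hypothetical and the vector length is passed in explicitly. The mapping follows the pattern table and the constraint list given above, with unspecified encodings yielding a count of zero.

```python
def decode_pred_count(pattern: int, elements: int) -> int:
    """Element count implied by a 5-bit pattern for a vector of `elements` lanes."""
    fixed = {0b00001: 1, 0b00010: 2, 0b00011: 3, 0b00100: 4,
             0b00101: 5, 0b00110: 6, 0b00111: 7, 0b01000: 8,
             0b01001: 16, 0b01010: 32, 0b01011: 64,
             0b01100: 128, 0b01101: 256}
    if pattern == 0b00000:                    # POW2: largest power of two
        return 1 << (elements.bit_length() - 1) if elements > 0 else 0
    if pattern in fixed:                      # VL1 .. VL256: all or nothing
        return fixed[pattern] if elements >= fixed[pattern] else 0
    if pattern == 0b11101:                    # MUL4
        return elements - elements % 4
    if pattern == 0b11110:                    # MUL3
        return elements - elements % 3
    if pattern == 0b11111:                    # ALL
        return elements
    return 0                                  # unspecified encodings


def dec_scalar(xdn: int, pattern: int, imm: int, vl_bits: int, esize: int) -> int:
    """Decrement a 64-bit scalar by count * imm, wrapping modulo 2**64."""
    count = decode_pred_count(pattern, vl_bits // esize)
    return (xdn - count * imm) & 0xFFFFFFFFFFFFFFFF
```

For a 256-bit vector and byte elements there are 32 elements, so the ALL pattern with a multiplier of 2 subtracts 64 from the register.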
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 3 classes: Doubleword, Halfword and Word
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] - (count * imm);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Counts the number of true elements in the source predicate and then uses the result to decrement the scalar
destination.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 1 1 0 0 0 1 0 0 Pm Rdn
D
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) operand1 = X[dn];
bits(PL) operand2 = P[m];
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
X[dn] = operand1 - count;
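Counting true predicate elements and applying the result to the scalar can be modeled as follows. This sketch assumes the predicate is held as a Python integer with one bit per vector byte, so that the bit governing element e sits at position e * (esize / 8). The helper names are hypothetical.

```python
def elem_p(pred: int, e: int, esize: int) -> int:
    """Read the predicate bit governing element e (one predicate bit per vector byte)."""
    return (pred >> (e * (esize // 8))) & 1


def decp_scalar(xdn: int, pred: int, vl_bits: int, esize: int) -> int:
    """Subtract the number of true predicate elements from a 64-bit scalar."""
    elements = vl_bits // esize
    count = sum(elem_p(pred, e, esize) for e in range(elements))
    return (xdn - count) & 0xFFFFFFFFFFFFFFFF
```

The same predicate value can give different counts for different element sizes, because only the lowest predicate bit of each element is examined.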
Counts the number of true elements in the source predicate and then uses the result to decrement all destination
vector elements.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 1 1 0 0 0 0 0 0 Pm Zdn
D
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] - count;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is
unpredicated.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
This instruction is used by the alias MOV (immediate, unpredicated).
This instruction is used by the pseudo-instruction FMOV (zero, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 sh imm8 Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
bits(VL) result = Replicate(imm<esize-1:0>);
Z[d] = result;
Unconditionally broadcast the indexed source vector element into each element of the destination vector. This
instruction is unpredicated.
The immediate element index is in the range of 0 to 63 (bytes), 31 (halfwords), 15 (words), 7 (doublewords) or 3
(quadwords). Selecting an element beyond the accessible vector length causes the destination vector to be set to zero.
This instruction is used by the alias MOV (SIMD&FP scalar, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 imm2 1 tsz 0 0 1 0 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
tsz <T>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<imm> Is the immediate index, in the range 0 to one less than the number of elements in 512 bits, encoded in
"imm2:tsz".
Alias Conditions
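Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;
bits(esize) element;
if index >= elements then
    element = Zeros();
else
    element = Elem[operand1, index, esize];
result = Replicate(element);
Z[d] = result;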
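The out-of-range rule stated in the description can be sketched in a few lines. The function name and the list representation of the vector are assumptions. The length of the list stands for the number of elements in the accessible vector.

```python
def dup_indexed(zn, index):
    """Broadcast zn[index] to every element; an index past the vector end yields zeros."""
    element = zn[index] if index < len(zn) else 0
    return [element] * len(zn)
```

With four elements, index 2 replicates the third element, while index 7 is encodable but lies beyond the vector, so every element of the result is zero.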
Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This
instruction is unpredicated.
This instruction is used by the alias MOV (scalar, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) operand;
if n == 31 then
    operand = SP[];
else
    operand = X[n];
bits(VL) result;
for e = 0 to elements-1
    Elem[result, e, esize] = operand<esize-1:0>;
Z[d] = result;
Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction
is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16,
32 or 64 bits.
This instruction is used by the alias MOV.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 1 0 0 0 0 imm13 Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Alias Conditions
Operation
CheckSVEEnabled();
bits(VL) result = Replicate(imm);
Z[d] = result;
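Which 64-bit values qualify as a bitmask immediate can be checked with a short model of the rule in the description: a field of 2, 4, 8, 16, 32 or 64 bits holding a rotated run of ones, replicated across 64 bits. This is an illustrative sketch with hypothetical names. It tests the value itself and does not model the "imm13" encoding.

```python
def is_rotated_run(pattern: int, width: int) -> bool:
    """True if pattern is a single run of ones, possibly rotated, within width bits."""
    full = (1 << width) - 1
    if pattern == 0 or pattern == full:
        return False                      # all-zeros and all-ones are not encodable
    rotated = ((pattern << 1) | (pattern >> (width - 1))) & full
    # A single cyclic run has exactly two boundaries between 0 and 1.
    return bin(pattern ^ rotated).count("1") == 2


def is_bitmask_immediate(value: int) -> bool:
    """True if value is a replicated field that contains one rotated run of ones."""
    for width in (2, 4, 8, 16, 32, 64):
        field = value & ((1 << width) - 1)
        replicated = 0
        for shift in range(0, 64, width):
            replicated |= field << shift
        if replicated == value and is_rotated_run(field, width):
            return True
    return False


def dupm(value: int, vl_bits: int) -> list:
    """Broadcast a valid bitmask immediate into every 64-bit lane."""
    if not is_bitmask_immediate(value):
        raise ValueError("not a logical bitmask immediate")
    return [value] * (vl_bits // 64)
```

For instance 0x00FF00FF00FF00FF is accepted as a 16-bit field replicated four times, whereas 0x5 is rejected because its set bits do not form one contiguous run.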
Bitwise exclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of
ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This is a pseudo-instruction of EOR (immediate). This means:
• The encodings in this description are named to match the encodings of EOR (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of EOR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 1 0 0 0 0 imm13 Zdn
is equivalent to
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Operation
The description of EOR (immediate) gives the operational pseudocode for this instruction.
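Because the operation is an exclusive OR with the inverted immediate, it can be expressed through the EOR (immediate) operation with the bitwise complement of the constant. The following is a minimal sketch with hypothetical function names, modeling the vector as a list of 64-bit lanes.

```python
MASK64 = (1 << 64) - 1


def eor_imm(zdn, imm):
    """Exclusive OR each 64-bit lane with the immediate."""
    return [(lane ^ imm) & MASK64 for lane in zdn]


def eon_imm(zdn, const):
    """Exclusive OR each lane with the inverted constant, using eor_imm."""
    return eor_imm(zdn, ~const & MASK64)
```

In two's complement arithmetic the complement of a constant equals the negated constant minus one, so either form names the same 64-bit value.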
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise exclusive OR an immediate with each 64-bit element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This instruction is used by the pseudo-instruction EON.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 1 0 0 0 0 imm13 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 64;
bits(VL) operand = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(64) element1 = Elem[operand, e, 64];
    Elem[result, e, 64] = element1 EOR imm;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
This instruction is used by the alias NOT (predicate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 EOR element2;
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
Bitwise exclusive OR active elements of the second source vector with corresponding elements of the first source
vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in
the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 0 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 EOR element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise exclusive OR all elements of the second source vector with corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 Zm 0 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
Z[d] = operand1 EOR operand2;
Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
This instruction is used by the alias NOTS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 EOR element2;
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
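The flag setting described above (FIRST, NONE, !LAST, and V cleared) can be modeled on per-element booleans. This is a sketch under stated assumptions: predicates are Python lists with one boolean per element, and pred_test is a hypothetical stand-in for the PredTest pseudocode function, written from the flag names in the description.

```python
def eor_predicates(mask, pn, pm):
    """Zeroing predicated exclusive OR over per-element booleans."""
    return [(a != b) if g else False for g, a, b in zip(mask, pn, pm)]


def pred_test(mask, result):
    """Return (N, Z, C, V) as 0/1 integers."""
    active = [r for g, r in zip(mask, result) if g]
    n = int(bool(active) and active[0])          # FIRST: first active element is true
    z = int(not any(active))                     # NONE: no active element is true
    c = int(not (bool(active) and active[-1]))   # !LAST: last active element is not true
    return n, z, c, 0                            # V is always zero
```

When no element is active, the model returns N=0, Z=1, C=1, V=0.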
Bitwise exclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) result = Zeros(esize);
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        result = result EOR Elem[operand, e, esize];
V[d] = result;
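The horizontal reduction can be written as a fold over the active elements. This is a small sketch with a hypothetical function name. Skipping inactive elements has the same effect as treating them as zero, since zero is the identity for exclusive OR.

```python
from functools import reduce
from operator import xor


def eorv(mask, zn):
    """Exclusive OR of all active elements; mask holds booleans, zn holds element values."""
    return reduce(xor, (value for active, value in zip(mask, zn) if active), 0)
```

With no active elements the result is zero.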
Copy the indexed byte up to the last byte of the first source vector to the bottom of the result vector, then fill the
remainder of the result starting from the first byte of the second source vector. The result is placed destructively in the
first source vector. This instruction is unpredicated.
An index that is greater than or equal to the vector length in bytes is treated as zero, leaving the destination and first
source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 imm8h 0 0 0 imm8l Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<imm> Is the unsigned immediate operand, in the range 0 to 255, encoded in the "imm8h:imm8l" fields.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
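The byte extraction described for this instruction can be sketched in Python as follows. This is a model under stated assumptions, not the architectural pseudocode: vectors are lists of byte values with index 0 holding the lowest-numbered byte, and the function name `ext` is hypothetical.

```python
def ext(zdn, zm, imm):
    """Extract len(zdn) bytes starting at byte `imm` of the pair zdn:zm."""
    vl_bytes = len(zdn)
    if len(zm) != vl_bytes:
        raise ValueError("source vectors must have the same length")
    # An index at or beyond the vector length in bytes is treated as zero.
    position = imm if imm < vl_bytes else 0
    # Bytes from `position` to the end of zdn, then the leading bytes of zm.
    return (zdn + zm)[position:position + vl_bytes]
```

With 4-byte vectors, `ext([0, 1, 2, 3], [4, 5, 6, 7], 1)` selects bytes 1 to 3 of the first source followed by byte 0 of the second.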
Compute the absolute difference of active floating-point elements of the second source vector and corresponding
floating-point elements of the first source vector and destructively place the result in the corresponding elements of
the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 0 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPAbs(FPSub(element1, element2, FPCR[]));
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
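The merging behavior of this predicated absolute difference can be sketched in Python as follows. The sketch assumes double-precision elements (Python floats) and does not model FPCR rounding modes or floating-point exceptions. The function name `fabd` is hypothetical.

```python
def fabd(pred, zdn, zm):
    """Active lanes: |zdn - zm|. Inactive lanes keep the value from zdn."""
    return [abs(a - b) if p else a for p, a, b in zip(pred, zdn, zm)]
```

The inactive lanes pass the first source through unchanged, which is what "remain unmodified" means for a destructive destination.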
Take the absolute value of each active floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. This clears the sign bit and cannot signal a floating-point exception.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 0 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPAbs(element);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
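Because the operation only clears the sign bit, it can be modelled on raw bit patterns. The Python sketch below assumes elements are supplied as unsigned integers holding the floating-point encoding, and it ignores any FPCR-dependent special handling of NaN inputs. The names `fabs_bits`, `fabs` and `f32_bits` are hypothetical helpers.

```python
import struct


def fabs_bits(bits, esize=32):
    """Clear the top (sign) bit of an esize-bit floating-point encoding."""
    return bits & ((1 << (esize - 1)) - 1)


def fabs(pred, zd, zn, esize=32):
    """Active lanes take |zn|; inactive lanes keep the old value of zd."""
    return [fabs_bits(n, esize) if p else d for p, d, n in zip(pred, zd, zn)]


def f32_bits(x):
    """Return the IEEE 754 single-precision encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

No arithmetic is performed on the value, which is consistent with the statement that the instruction cannot signal a floating-point exception.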
Compare active absolute values of floating-point elements in the first source vector with corresponding absolute
values of elements in the second source vector, and place the boolean results of the specified comparison in the
corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to
zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: GE, GT, LE, or LT.
This instruction is used by the pseudo-instructions FACLE and FACLT.
It has encodings from 2 classes: Greater than and Greater than or equal
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 1 Pg Zn 1 Pd
Greater than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 1 Pd
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(PL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        boolean res;
        case op of
            when Cmp_GE res = FPCompareGE(FPAbs(element1), FPAbs(element2), FPCR[]);
            when Cmp_GT res = FPCompareGT(FPAbs(element1), FPAbs(element2), FPCR[]);
        ElemP[result, e, esize] = if res then '1' else '0';
    else
        ElemP[result, e, esize] = '0';
P[d] = result;
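The absolute compare can be sketched in Python as follows. The sketch assumes double-precision elements, represents the destination predicate as a list of booleans, and does not model floating-point exceptions. The function name `fac` and the string condition names are hypothetical.

```python
def fac(op, pred, zn, zm):
    """Compare |zn| with |zm| per lane; inactive lanes produce False."""
    compare = {
        "GE": lambda a, b: a >= b,
        "GT": lambda a, b: a > b,
    }[op]
    # Python comparisons involving NaN evaluate to False, so an unordered
    # pair yields False in this model.
    return [bool(p) and compare(abs(a), abs(b)) for p, a, b in zip(pred, zn, zm)]
```

Signs are discarded before comparing, so `-2.0` and `2.0` compare as equal under `GE`.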
Compare active absolute values of floating-point elements in the first source vector being less than or equal to
corresponding absolute values of elements in the second source vector, and place the boolean results of the
comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate
register are set to zero. Does not set the condition flags.
• The encodings in this description are named to match the encodings of FAC<cc>.
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of FAC<cc> gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 1 Pd
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of FAC<cc> gives the operational pseudocode for this instruction.
Compare active absolute values of floating-point elements in the first source vector being less than corresponding
absolute values of elements in the second source vector, and place the boolean results of the comparison in the
corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to
zero. Does not set the condition flags.
• The encodings in this description are named to match the encodings of FAC<cc>.
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of FAC<cc> gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 1 Pg Zn 1 Pd
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of FAC<cc> gives the operational pseudocode for this instruction.
Add an immediate to each active floating-point element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 0 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
i1 <const>
0 #0.5
1 #1.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPAdd(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
Add active floating-point elements of the second source vector to corresponding floating-point elements of the first
source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPAdd(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Add all floating-point elements of the second source vector to corresponding elements of the first source vector and
place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPAdd(element1, element2, FPCR[]);
Z[d] = result;
Floating-point add a SIMD&FP scalar source and all active lanes of the vector source and place the result
destructively in the SIMD&FP scalar source register. Vector elements are processed strictly in order from low to high,
with the scalar source providing the initial value. Inactive elements in the source vector are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 0 0 0 1 Pg Zm Vdn
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 D
<dn> Is the number [0-31] of the source and destination SIMD&FP register, encoded in the "Vdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = V[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(esize) result = operand1;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand2, e, esize];
        result = FPAdd(result, element, FPCR[]);
V[dn] = result;
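The strictly ordered accumulation can be sketched in Python as follows. The sketch assumes double-precision elements and the default round-to-nearest behavior of Python floats. The function name `fadda` is hypothetical.

```python
def fadda(pred, vdn, zm):
    """Add the active lanes of zm to the scalar vdn, lowest lane first."""
    result = vdn
    for p, x in zip(pred, zm):
        if p:
            # Each addition rounds before the next lane is considered,
            # so the lane order is part of the result.
            result = result + x
    return result
```

Floating-point addition is not associative, so this in-order form can give a different answer from a tree-shaped reduction over the same lanes.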
Floating-point add horizontally over all lanes of a vector using a recursive pairwise reduction, and place the result in
the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +0.0.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPZero('0');
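A recursive pairwise reduction of the kind named in the description can be sketched in Python as follows. The shape of the tree is an assumption of this sketch: the lane list is padded with +0.0 to a power of two and then split into low and high halves at each level. The names `faddv` and `_pairwise` are hypothetical, and elements are assumed to be double-precision.

```python
def _pairwise(lanes):
    """Sum a power-of-two-length list by recursively adding its halves."""
    if len(lanes) == 1:
        return lanes[0]
    half = len(lanes) // 2
    return _pairwise(lanes[:half]) + _pairwise(lanes[half:])


def faddv(pred, zn):
    """Pairwise sum of zn in which inactive lanes are treated as +0.0."""
    lanes = [x if p else 0.0 for p, x in zip(pred, zn)]
    # Assumption: pad with the identity so every split has equal halves.
    size = 1
    while size < len(lanes):
        size *= 2
    lanes += [0.0] * (size - len(lanes))
    return _pairwise(lanes)
```

Compared with an in-order sum, the tree form groups the additions differently, which can change the rounding of the final result.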
Add the real and imaginary components of the active floating-point complex numbers from the first source vector to
the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the
direction from the positive real axis towards the positive imaginary axis, when considered in polar representation,
equivalent to multiplying the complex numbers in the second source vector by ± J beforehand. Destructively place the
results in the corresponding elements of the first source vector. Inactive elements in the destination vector register
remain unmodified.
Each complex number is represented in a vector register as an even/odd pair of elements with the real part in the
even-numbered element and the imaginary part in the odd-numbered element.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 0 0 0 0 0 rot 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
rot <const>
0 #90
1 #270
Operation
CheckSVEEnabled();
integer pairs = VL DIV (2 * esize);
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for p = 0 to pairs-1
    acc_r = Elem[operand1, 2 * p + 0, esize];
    acc_i = Elem[operand1, 2 * p + 1, esize];
    if ElemP[mask, 2 * p + 0, esize] == '1' then
        elt2_i = Elem[operand2, 2 * p + 1, esize];
        if sub_i then elt2_i = FPNeg(elt2_i);
        acc_r = FPAdd(acc_r, elt2_i, FPCR[]);
    if ElemP[mask, 2 * p + 1, esize] == '1' then
        elt2_r = Elem[operand2, 2 * p + 0, esize];
        if sub_r then elt2_r = FPNeg(elt2_r);
        acc_i = FPAdd(acc_i, elt2_r, FPCR[]);
    Elem[result, 2 * p + 0, esize] = acc_r;
    Elem[result, 2 * p + 1, esize] = acc_i;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
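The complex add with rotation can be sketched in Python as follows. The mapping from the rotation to the `sub_i` and `sub_r` flags is an assumption of this sketch, chosen so that a rotation of 90 adds j times the second operand and a rotation of 270 subtracts it. Vectors are lists of floats in real/imaginary order, and the function name `fcadd` is hypothetical.

```python
def fcadd(pred, zdn, zm, rot):
    """Add zm, rotated by 90 or 270 degrees, to zdn, pair by pair."""
    if rot not in (90, 270):
        raise ValueError("rot must be 90 or 270")
    sub_i = rot == 90   # assumed decode: negate the imaginary input
    sub_r = rot == 270  # assumed decode: negate the real input
    result = list(zdn)
    for p in range(len(zdn) // 2):
        re, im = 2 * p, 2 * p + 1
        if pred[re]:
            # The real part of the sum uses the imaginary part of zm.
            b_i = -zm[im] if sub_i else zm[im]
            result[re] = zdn[re] + b_i
        if pred[im]:
            # The imaginary part of the sum uses the real part of zm.
            b_r = -zm[re] if sub_r else zm[re]
            result[im] = zdn[im] + b_r
    return result
```

For a = 1+2j and b = 3+4j, a rotation of 90 models a + j*b and a rotation of 270 models a - j*b. The real and imaginary lanes are predicated independently.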
Compare active floating-point elements in the first source vector with corresponding elements in the second source
vector, and place the boolean results of the specified comparison in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, or NE, with the addition of UO for
an unordered comparison.
This instruction is used by the pseudo-instructions FCMLE (vectors) and FCMLT (vectors).
It has encodings from 5 classes: Equal, Greater than, Greater than or equal, Not equal and Unordered
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 1 Pg Zn 0 Pd
cmph cmpl
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 1 Pd
cmph cmpl
Greater than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 0 Pd
cmph cmpl
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 1 Pg Zn 1 Pd
cmph cmpl
Unordered
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 0 Pd
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(PL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        boolean res;
        case op of
            when Cmp_EQ res = FPCompareEQ(element1, element2, FPCR[]);
            when Cmp_GE res = FPCompareGE(element1, element2, FPCR[]);
            when Cmp_GT res = FPCompareGT(element1, element2, FPCR[]);
            when Cmp_UN res = FPCompareUN(element1, element2, FPCR[]);
            when Cmp_NE res = FPCompareNE(element1, element2, FPCR[]);
            when Cmp_LT res = FPCompareGT(element2, element1, FPCR[]);
            when Cmp_LE res = FPCompareGE(element2, element1, FPCR[]);
        ElemP[result, e, esize] = if res then '1' else '0';
    else
        ElemP[result, e, esize] = '0';
P[d] = result;
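The comparison cases above can be sketched in Python as follows, including the operand swap used for the less-than forms. The sketch assumes double-precision elements, represents predicates as lists of booleans, and does not model floating-point exceptions. The function name `fcm` and the string condition names are hypothetical.

```python
import math


def fcm(op, pred, zn, zm):
    """Per-lane floating-point compare; inactive lanes produce False."""
    compare = {
        "EQ": lambda a, b: a == b,
        "GE": lambda a, b: a >= b,
        "GT": lambda a, b: a > b,
        "NE": lambda a, b: a != b,
        "UO": lambda a, b: math.isnan(a) or math.isnan(b),
        # The less-than forms reuse GT and GE with the operands swapped.
        "LT": lambda a, b: b > a,
        "LE": lambda a, b: b >= a,
    }[op]
    return [bool(p) and compare(a, b) for p, a, b in zip(pred, zn, zm)]
```

In this model a NaN operand makes `EQ`, `GE`, `GT`, `LT` and `LE` false and makes `NE` and `UO` true, following the behavior of Python float comparisons.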
Compare active floating-point elements in the source vector with zero, and place the boolean results of the specified
comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate
register are set to zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, LE, LT, or NE.
It has encodings from 6 classes: Equal, Greater than, Greater than or equal, Less than, Less than or equal and Not
equal
Equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 1 0 0 0 1 Pg Zn 0 Pd
eq lt ne
Greater than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 0 0 0 1 Pg Zn 1 Pd
eq lt ne
Greater than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 0 0 0 1 Pg Zn 0 Pd
eq lt ne
Less than
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 1 0 0 1 Pg Zn 0 Pd
eq lt ne
Less than or equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 1 0 0 1 Pg Zn 1 Pd
eq lt ne
Not equal
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 1 1 0 0 1 Pg Zn 0 Pd
eq lt ne
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(PL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        boolean res;
        case op of
            when Cmp_EQ res = FPCompareEQ(element, 0<esize-1:0>, FPCR[]);
            when Cmp_GE res = FPCompareGE(element, 0<esize-1:0>, FPCR[]);
            when Cmp_GT res = FPCompareGT(element, 0<esize-1:0>, FPCR[]);
            when Cmp_NE res = FPCompareNE(element, 0<esize-1:0>, FPCR[]);
            when Cmp_LT res = FPCompareGT(0<esize-1:0>, element, FPCR[]);
            when Cmp_LE res = FPCompareGE(0<esize-1:0>, element, FPCR[]);
        ElemP[result, e, esize] = if res then '1' else '0';
    else
        ElemP[result, e, esize] = '0';
P[d] = result;
Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of
the floating-point complex numbers in each 128-bit segment of the first source vector by the specified complex number
in the corresponding second source vector segment rotated by 0, 90, 180 or 270 degrees in the direction from the
positive real axis towards the positive imaginary axis, when considered in polar representation.
Then destructively add the products to the corresponding components of the complex numbers in the addend and
destination vector, without intermediate rounding.
These transformations permit the creation of a variety of multiply-add and multiply-subtract operations on complex
numbers by combining two of these instructions with the same vector operands but with rotations that are 90 degrees
apart.
Each complex number is represented in a vector register as an even/odd pair of elements with the real part in the
even-numbered element and the imaginary part in the odd-numbered element.
The complex numbers within the second source vector are specified using an immediate index which selects the same
complex number position within each 128-bit vector segment. The index range is from 0 to one less than the number
of complex numbers per 128-bit segment, encoded in 1 to 2 bits depending on the size of the complex number. This
instruction is unpredicated.
It has encodings from 2 classes: Half-precision and Single-precision
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 rot Zn Zda
size<1>size<0>
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 1 rot Zn Zda
size<1>size<0>
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the half-precision variant: is the name of the second source scalable vector register Z0-Z7, encoded
in the "Zm" field.
For the single-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the index of a Real and Imaginary pair, in the range 0 to 3, encoded in
the "i2" field.
For the single-precision variant: is the index of a Real and Imaginary pair, in the range 0 to 1, encoded
in the "i1" field.
rot <const>
00 #0
01 #90
10 #180
11 #270
Operation
CheckSVEEnabled();
integer pairs = VL DIV (2 * esize);
integer pairspersegment = 128 DIV (2 * esize);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for p = 0 to pairs-1
    segmentbase = p - (p MOD pairspersegment);
    s = segmentbase + index;
    addend_r = Elem[operand3, 2 * p + 0, esize];
    addend_i = Elem[operand3, 2 * p + 1, esize];
    elt1_a = Elem[operand1, 2 * p + sel_a, esize];
    elt2_a = Elem[operand2, 2 * s + sel_a, esize];
    elt2_b = Elem[operand2, 2 * s + sel_b, esize];
    if neg_r then elt2_a = FPNeg(elt2_a);
    if neg_i then elt2_b = FPNeg(elt2_b);
    addend_r = FPMulAdd(addend_r, elt1_a, elt2_a, FPCR[]);
    addend_i = FPMulAdd(addend_i, elt1_a, elt2_b, FPCR[]);
    Elem[result, 2 * p + 0, esize] = addend_r;
    Elem[result, 2 * p + 1, esize] = addend_i;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
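The indexed form can be sketched in Python to show how one complex number per 128-bit segment of the second source is reused for every pair in that segment. Several points are assumptions of this sketch rather than statements from the text above: the mapping from the rotation to `sel_a`, `sel_b`, `neg_r` and `neg_i`, the use of a separate multiply and add (which rounds twice, unlike a fused multiply-add), and the function name `fcmla_indexed`. Vectors are lists of floats in real/imaginary order.

```python
def fcmla_indexed(zda, zn, zm, index, rot, esize=32):
    """One rotation step of an indexed complex multiply-add."""
    if rot not in (0, 90, 180, 270):
        raise ValueError("rot must be 0, 90, 180 or 270")
    sel_a = (rot // 90) & 1      # assumed decode: 1 for rotations 90 and 270
    sel_b = 1 - sel_a
    neg_r = rot in (90, 180)     # assumed decode
    neg_i = rot in (180, 270)    # assumed decode
    pairs_per_segment = 128 // (2 * esize)
    result = list(zda)
    for p in range(len(zda) // 2):
        # Select the indexed pair within the same 128-bit segment as pair p.
        s = p - (p % pairs_per_segment) + index
        elt1_a = zn[2 * p + sel_a]
        elt2_a = -zm[2 * s + sel_a] if neg_r else zm[2 * s + sel_a]
        elt2_b = -zm[2 * s + sel_b] if neg_i else zm[2 * s + sel_b]
        result[2 * p] = zda[2 * p] + elt1_a * elt2_a
        result[2 * p + 1] = zda[2 * p + 1] + elt1_a * elt2_b
    return result
```

Applying the function with rotation 0 and then, on the result, with rotation 90 accumulates the full complex product of each pair of `zn` with the selected pair of `zm`.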
Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of
the floating-point complex numbers in the first source vector by the corresponding complex number in the second
source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive
imaginary axis, when considered in polar representation.
Then destructively add the products to the corresponding components of the complex numbers in the addend and
destination vector, without intermediate rounding.
These transformations permit the creation of a variety of multiply-add and multiply-subtract operations on complex
numbers by combining two of these instructions with the same vector operands but with rotations that are 90 degrees
apart.
Each complex number is represented in a vector register as an even/odd pair of elements with the real part in the
even-numbered element and the imaginary part in the odd-numbered element. Inactive elements in the destination
vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 0 Zm 0 rot Pg Zn Zda
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<const> Is the const specifier, encoded in "rot":
    rot  <const>
    00   #0
    01   #90
    10   #180
    11   #270
Operation
CheckSVEEnabled();
integer pairs = VL DIV (2 * esize);
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;
for p = 0 to pairs-1
    addend_r = Elem[operand3, 2 * p + 0, esize];
    addend_i = Elem[operand3, 2 * p + 1, esize];
    if ElemP[mask, 2 * p + 0, esize] == '1' then
        bits(esize) elt1_a = Elem[operand1, 2 * p + sel_a, esize];
        bits(esize) elt2_a = Elem[operand2, 2 * p + sel_a, esize];
        if neg_r then elt2_a = FPNeg(elt2_a);
        addend_r = FPMulAdd(addend_r, elt1_a, elt2_a, FPCR[]);
    if ElemP[mask, 2 * p + 1, esize] == '1' then
        bits(esize) elt1_a = Elem[operand1, 2 * p + sel_a, esize];
        bits(esize) elt2_b = Elem[operand2, 2 * p + sel_b, esize];
        if neg_i then elt2_b = FPNeg(elt2_b);
        addend_i = FPMulAdd(addend_i, elt1_a, elt2_b, FPCR[]);
    Elem[result, 2 * p + 0, esize] = addend_r;
    Elem[result, 2 * p + 1, esize] = addend_i;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
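The rotation behavior of the predicated complex multiply-add described above can be sketched in Python. This is a minimal model under stated assumptions, not the architectural definition: the mapping from the rotation to sel_a, sel_b, neg_r and neg_i is not shown in this section and is assumed here, the function name fcmla is hypothetical, and ordinary Python arithmetic rounds the product before the add, whereas the instruction adds without intermediate rounding.

```python
def fcmla(zda, zn, zm, rot, pg=None):
    """Model one pass over flat [re0, im0, re1, im1, ...] lists."""
    if rot not in (0, 90, 180, 270):
        raise ValueError("rot must be 0, 90, 180 or 270")
    r = rot // 90
    # Assumed decode of the 2-bit rot field (not given in this section).
    sel_a = r & 1                 # 0 selects real parts of Zn, 1 selects imaginary parts
    sel_b = 1 - sel_a
    neg_i = bool(r & 2)
    neg_r = bool(r & 1) != bool(r & 2)
    if pg is None:
        pg = [True] * len(zda)    # treat every element as active
    result = list(zda)
    for p in range(len(zda) // 2):
        elt1_a = zn[2 * p + sel_a]
        elt2_a = zm[2 * p + sel_a]
        elt2_b = zm[2 * p + sel_b]
        if neg_r:
            elt2_a = -elt2_a
        if neg_i:
            elt2_b = -elt2_b
        if pg[2 * p + 0]:
            result[2 * p + 0] += elt1_a * elt2_a   # unfused in this sketch
        if pg[2 * p + 1]:
            result[2 * p + 1] += elt1_a * elt2_b
    return result


# Two passes with rotations 90 degrees apart form a complex multiply-accumulate.
acc = fcmla([0.0, 0.0], [1.0, 2.0], [3.0, 4.0], rot=0)
acc = fcmla(acc, [1.0, 2.0], [3.0, 4.0], rot=90)

# Rotations 0 and 270 multiply by the conjugate of the first operand instead.
conj = fcmla([0.0, 0.0], [1.0, 2.0], [3.0, 4.0], rot=0)
conj = fcmla(conj, [1.0, 2.0], [3.0, 4.0], rot=270)
```

With these inputs, acc holds the real and imaginary parts of (1+2i)(3+4i), and conj holds those of (1-2i)(3+4i).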
Compare active floating-point elements in the first source vector being less than or equal to corresponding elements in
the second source vector, and place the boolean results of the comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.
• The encodings in this description are named to match the encodings of FCM<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 0 Pd
cmph cmpl
FCMLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
Compare active floating-point elements in the first source vector being less than corresponding elements in the second
source vector, and place the boolean results of the comparison in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
• The encodings in this description are named to match the encodings of FCM<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 1 Pd
cmph cmpl
FCMLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>
is equivalent to
FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
Operation
The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
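Because both of these comparisons are described as reusing the FCM<cc> (vectors) encodings, one way to picture them is as the greater-than forms with the two source operands exchanged. The Python sketch below illustrates that reading only; the function names are hypothetical, predicates are modelled as lists of booleans, and exception flags are ignored.

```python
import math


def fcmge(pg, zn, zm):
    """Zeroing-predicated 'greater than or equal'; inactive lanes give False."""
    return [bool(active) and a >= b for active, a, b in zip(pg, zn, zm)]


def fcmgt(pg, zn, zm):
    """Zeroing-predicated 'greater than'; inactive lanes give False."""
    return [bool(active) and a > b for active, a, b in zip(pg, zn, zm)]


def fcmle(pg, first, second):
    """first <= second, expressed as second >= first."""
    return fcmge(pg, second, first)


def fcmlt(pg, first, second):
    """first < second, expressed as second > first."""
    return fcmgt(pg, second, first)


pg = [True, True, True, False]
first = [1.0, 0.5, math.nan, 0.0]
second = [1.0, 1.0, 5.0, 9.0]
le = fcmle(pg, first, second)
lt = fcmlt(pg, first, second)
```

The third lane shows that a NaN operand makes both comparisons false, and the fourth lane shows the inactive element being set to zero even though 0.0 is less than 9.0.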
Copy a floating-point immediate into each active element in the destination vector. Inactive elements in the destination
vector register remain unmodified.
This instruction is used by the alias FMOV (immediate, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 1 1 0 imm8 Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<const> Is a floating-point immediate value expressible as ±n÷16×2^r, where n and r are integers such that 16
≤ n ≤ 31 and -3 ≤ r ≤ 4, i.e. a normalized binary floating-point encoding with 1 sign bit, 3-bit exponent,
and 4-bit fractional part, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = imm;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
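The set of immediates that <const> can take follows directly from the formula ±n÷16×2^r given in the Assembler Symbols above. As a minimal sketch, the Python below enumerates that set and checks membership. The function names are hypothetical, and the sketch says nothing about how a value maps onto the bits of "imm8".

```python
def encodable_constants():
    """All values +/- n/16 * 2**r with 16 <= n <= 31 and -3 <= r <= 4."""
    values = set()
    for sign in (1.0, -1.0):
        for n in range(16, 32):
            for r in range(-3, 5):
                values.add(sign * (n / 16) * 2.0 ** r)
    return values


def is_encodable(value):
    """True if value can be written as one of the 8-bit immediates."""
    return value in encodable_constants()


CONSTANTS = encodable_constants()
smallest = min(abs(v) for v in CONSTANTS)
largest = max(CONSTANTS)
```

Every combination of sign, n and r gives a distinct value, so the set has exactly 256 members, matching the 8 bits available. Zero is not among them.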
Convert the size and precision of each active floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
Because the input and result types differ in size, the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 6 classes: Half-precision to single-precision, Half-precision to double-precision,
Single-precision to half-precision, Single-precision to double-precision, Double-precision to half-precision and
Double-precision to single-precision.
Half-precision to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 1 Pg Zn Zd
Half-precision to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 1 0 1 Pg Zn Zd
Single-precision to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 0 1 0 1 Pg Zn Zd
Single-precision to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 Pg Zn Zd
Double-precision to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 0 1 0 1 Pg Zn Zd
Double-precision to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) res = FPConvertSVE(element<s_esize-1:0>, FPCR[]);
        Elem[result, e, esize] = ZeroExtend(res);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
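The unpacked layout described above, with the smaller type held in the low bits of a larger element, can be illustrated for the half-precision and single-precision pair using 32-bit containers. This Python sketch is an illustration under assumptions rather than a model of FPConvertSVE: the function names are hypothetical, the FPCR rounding mode and exception behavior are ignored, and Python's struct module raises OverflowError for values too large for half precision instead of producing an IEEE result.

```python
import struct


def fcvt_h_to_s(zd, zn, pg):
    """Half to single in 32-bit containers, with merging predication."""
    result = list(zd)
    for e, active in enumerate(pg):
        if active:
            half_bits = zn[e] & 0xFFFF            # upper 16 bits are ignored
            value = struct.unpack("<e", struct.pack("<H", half_bits))[0]
            result[e] = struct.unpack("<I", struct.pack("<f", value))[0]
    return result


def fcvt_s_to_h(zd, zn, pg):
    """Single to half; the 16-bit result is zero-extended to 32 bits."""
    result = list(zd)
    for e, active in enumerate(pg):
        if active:
            value = struct.unpack("<f", struct.pack("<I", zn[e] & 0xFFFFFFFF))[0]
            result[e] = struct.unpack("<H", struct.pack("<e", value))[0]
    return result


# 0x3C00 is 1.0 in half precision; the 0xABCD above it is ignored.
widened = fcvt_h_to_s([0, 0x11111111], [0xABCD3C00, 0x0000C000], [True, False])
# 0x3FC00000 is 1.5 in single precision; 0x3E00 is 1.5 in half precision.
narrowed = fcvt_s_to_h([0xFFFFFFFF], [0x3FC00000], [True])
```

The second element of widened keeps its old destination value because its predicate bit is clear, and narrowed shows the upper half of the container being filled with zeros.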
Convert to the signed integer nearer to zero from each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are sign-extended to fill each destination element.
It has encodings from 7 classes: Half-precision to 16-bit, Half-precision to 32-bit, Half-precision to 64-bit,
Single-precision to 32-bit, Single-precision to 64-bit, Double-precision to 32-bit and Double-precision to 64-bit.
Half-precision to 16-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 Pg Zn Zd
int_U
Half-precision to 32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U
Half-precision to 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 Pg Zn Zd
int_U
Single-precision to 32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U
Single-precision to 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U
Double-precision to 32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 0 1 0 1 Pg Zn Zd
int_U
Double-precision to 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 Pg Zn Zd
int_U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) res = FPToFixed(element<s_esize-1:0>, 0, unsigned, FPCR[], rounding);
        Elem[result, e, esize] = Extend(res, unsigned);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
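The conversion and extension steps in the pseudocode above can be sketched in Python for a single element. FPToFixed is not defined in this section, so two behaviors are assumptions here and are marked in the code: out-of-range inputs saturate to the destination range, and NaN converts to zero. The function names are hypothetical and exception flags are not modelled.

```python
import math


def fp_to_int_toward_zero(value, bits, unsigned):
    """Truncate toward zero, then clamp to the destination integer range."""
    if math.isnan(value):
        return 0                                  # assumption: NaN gives zero
    if unsigned:
        lo, hi = 0, (1 << bits) - 1
    else:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isinf(value):
        return hi if value > 0 else lo            # assumption: saturation
    return max(lo, min(hi, math.trunc(value)))    # assumption: saturation


def extend(res, res_bits, container_bits, unsigned):
    """Bit pattern of res after zero- or sign-extension to the container."""
    pattern = res & ((1 << res_bits) - 1)
    if not unsigned and res < 0:
        # Fill the bits above res_bits with copies of the sign bit.
        pattern |= ((1 << container_bits) - 1) ^ ((1 << res_bits) - 1)
    return pattern


signed_res = fp_to_int_toward_zero(-2.7, 32, unsigned=False)
in_container = extend(signed_res, 32, 64, unsigned=False)
```

Here a double-precision -2.7 becomes the 32-bit integer -2, which is then sign-extended so that it fills the 64-bit destination element.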
Convert to the unsigned integer nearer to zero from each active floating-point element of the source vector, and place
the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 7 classes: Half-precision to 16-bit, Half-precision to 32-bit, Half-precision to 64-bit,
Single-precision to 32-bit, Single-precision to 64-bit, Double-precision to 32-bit and Double-precision to 64-bit.
Half-precision to 16-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 Pg Zn Zd
int_U
Half-precision to 32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U
Half-precision to 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 Pg Zn Zd
int_U
Single-precision to 32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U
Single-precision to 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U
Double-precision to 32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 1 0 1 Pg Zn Zd
int_U
Double-precision to 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 0 1 Pg Zn Zd
int_U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) res = FPToFixed(element<s_esize-1:0>, 0, unsigned, FPCR[], rounding);
        Elem[result, e, esize] = Extend(res, unsigned);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Divide active floating-point elements of the first source vector by corresponding floating-point elements of the second
source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 1 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPDiv(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Reversed divide active floating-point elements of the second source vector by corresponding floating-point elements of
the first source vector and destructively place the quotient in the corresponding elements of the first source vector.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPDiv(element2, element1, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
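The only difference between the divide and the reversed divide described above is which operand is the dividend. Both write the first source register and leave inactive elements alone. A minimal Python sketch of the two forms follows; the function name and the reverse flag are hypothetical. Python raises ZeroDivisionError where FPDiv would return an infinity or NaN, so the sketch is only meaningful for non-zero divisors.

```python
def fdiv(zdn, zm, pg, reverse=False):
    """Merging-predicated divide; reverse=True swaps dividend and divisor."""
    result = []
    for element1, element2, active in zip(zdn, zm, pg):
        if active:
            if reverse:
                result.append(element2 / element1)   # reversed form: Zm / Zdn
            else:
                result.append(element1 / element2)   # plain form: Zdn / Zm
        else:
            result.append(element1)                  # inactive: keep Zdn
    return result


plain = fdiv([8.0, 9.0], [2.0, 3.0], [True, False])
reversed_form = fdiv([8.0, 9.0], [2.0, 3.0], [True, False], reverse=True)
```

The reversed form is useful when the value that must be preserved in another register is the dividend, since the destination always overwrites the first source.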
Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is
unpredicated.
This instruction is used by the alias FMOV (immediate, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 1 1 1 0 imm8 Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<const> Is a floating-point immediate value expressible as ±n÷16×2^r, where n and r are integers such that 16
≤ n ≤ 31 and -3 ≤ r ≤ 4, i.e. a normalized binary floating-point encoding with 1 sign bit, 3-bit exponent,
and 4-bit fractional part, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) result;
for e = 0 to elements-1
    Elem[result, e, esize] = imm;
Z[d] = result;
The FEXPA instruction accelerates the polynomial series calculation of the EXP(X) function.
The double-precision variant copies the low 52 bits of an entry from a hard-wired table of 64-bit coefficients, indexed
by the low 6 bits of each element of the source vector, and prepends to that the next 11 bits of the source element
(src<16:6>), setting the sign bit to zero.
The single-precision variant copies the low 23 bits of an entry from a hard-wired table of 32-bit coefficients, indexed by
the low 6 bits of each element of the source vector, and prepends to that the next 8 bits of the source element
(src<13:6>), setting the sign bit to zero.
The half-precision variant copies the low 10 bits of an entry from a hard-wired table of 16-bit coefficients, indexed by the
low 5 bits of each element of the source vector, and prepends to that the next 5 bits of the source element (src<9:5>),
setting the sign bit to zero.
A coefficient table entry with index m holds the floating-point value 2^(m/64), or for the half-precision variant 2^(m/32).
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand = Z[n];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPExpA(element);
Z[d] = result;
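The bit assembly described for the double-precision variant of FEXPA can be sketched in Python for one element. This is an approximation under assumptions: the function name is hypothetical, and the coefficient is computed at run time as 2.0 ** (m / 64) rather than read from the hard-wired table, so its low fraction bits might differ from the architected table entry.

```python
import struct


def fexpa_double(src):
    """Build a double from the low 17 bits of a 64-bit source element."""
    index = src & 0x3F                     # low 6 bits select the table entry
    exponent_field = (src >> 6) & 0x7FF    # src<16:6> becomes the exponent field
    # Stand-in for the table entry with index m, which holds 2^(m/64).
    coeff = 2.0 ** (index / 64)
    coeff_bits = struct.unpack("<Q", struct.pack("<d", coeff))[0]
    fraction = coeff_bits & ((1 << 52) - 1)          # low 52 bits of the entry
    bits = (exponent_field << 52) | fraction         # sign bit stays zero
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


one = fexpa_double(1023 << 6)              # biased exponent 1023, index 0
two_sqrt2 = fexpa_double((1024 << 6) | 32) # 2^1 scaled by 2^(32/64)
```

The caller is expected to arrange the source bits so that the exponent field and the table index together represent the integer and fractional parts of the power of two being computed.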
Floating-point fused multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first
source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 0 0 Pg Zm Zdn
N op
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;
for e = 0 to elements-1
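    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
        Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];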
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
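The phrase "without intermediate rounding" in the description above is what separates a fused multiply-add from a multiply followed by an add. The Python sketch below illustrates the effect on double-precision values by computing the exact result with rationals and rounding once. The function names are hypothetical, only finite inputs are handled, and the default round-to-nearest mode is assumed in place of FPCR.

```python
from fractions import Fraction


def fused_multiply_add(addend, op1, op2):
    """Compute addend + op1 * op2 exactly, then round once to a double."""
    return float(Fraction(addend) + Fraction(op1) * Fraction(op2))


def fmad(zdn, zm, za, pg):
    """Zdn = Za + Zdn * Zm on active elements; inactive elements keep Zdn."""
    return [fused_multiply_add(a, d, m) if active else d
            for d, m, a, active in zip(zdn, zm, za, pg)]


x = 1.0 + 2.0 ** -30
addend = -(1.0 + 2.0 ** -29)
fused = fmad([x, x], [x, x], [addend, addend], [True, False])
unfused = x * x + addend    # the product is rounded before the add
```

The exact product x * x is 1 + 2^-29 + 2^-60. Rounding it to a double first discards the 2^-60 term, so the unfused result is zero, while the fused result keeps it.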
Determine the maximum of an immediate and each active floating-point element of the source vector, and
destructively place the results in the corresponding elements of the source vector. The immediate may take the value
+0.0 or +1.0 only. If the element value is NaN then the result is NaN. Inactive elements in the destination vector
register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 1 0 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the floating-point immediate value, encoded in "i1":
    i1  <const>
    0   #0.0
    1   #1.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMax(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
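UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.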
Determine the maximum of active floating-point elements of the second source vector and corresponding floating-point
elements of the first source vector and destructively place the results in the corresponding elements of the first source
vector. If either element value is NaN then the result is NaN. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
    size  <T>
    00    RESERVED
    01    H
    10    S
    11    D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Determine the maximum number value of an immediate and each active floating-point element of the source vector,
and destructively place the results in the corresponding elements of the source vector. The immediate may take the
value +0.0 or +1.0 only. If the element value is a quiet NaN, then the result is the immediate. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 0 0 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
i1 <const>
0 #0.0
1 #1.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMaxNum(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
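The difference between a "maximum number" operation and a plain maximum lies in the NaN handling described above. The following Python sketch shows it for the immediate form, assuming every NaN is quiet (Python floats do not distinguish signalling NaNs). The names fp_max_num and fmaxnm_imm are hypothetical, and FPCR effects are not modelled.

```python
import math

def fp_max_num(a, b):
    """maxNum-style maximum: a quiet NaN paired with a number yields the number."""
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b:
        # Only signed zeros compare equal with different bit patterns.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b

def fmaxnm_imm(zdn, pg, imm):
    """Apply fp_max_num against +0.0 or +1.0 in active lanes only."""
    if imm not in (0.0, 1.0) or math.copysign(1.0, imm) < 0:
        raise ValueError("immediate must be +0.0 or +1.0")
    return [fp_max_num(x, imm) if active else x for x, active in zip(zdn, pg)]

out = fmaxnm_imm([math.nan, -2.0, 1.5, math.nan], [True, True, True, False], 1.0)
print(out)
```

The NaN in the first lane is active and is replaced by the immediate, whereas the NaN in the last lane is inactive and is left unchanged.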
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Determine the maximum number value of active floating-point elements of the second source vector and
corresponding floating-point elements of the first source vector and destructively place the results in the
corresponding elements of the first source vector. If one element value is NaN then the result is the numeric value.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Floating-point maximum number horizontally over all lanes of a vector using a recursive pairwise reduction, and place
the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as the default
NaN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPDefaultNaN();
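The reduction described above can be sketched in Python as follows. The reduction helper is not shown in this section, so the tree shape used here (padding to a power of two with the identity, then combining the low and high halves recursively) is an assumption, as are the names fp_max_num, reduce_pairwise and fmaxnmv. All NaNs are treated as quiet.

```python
import math

def fp_max_num(a, b):
    """maxNum-style maximum: a NaN paired with a number yields the number."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b

def reduce_pairwise(op, lanes):
    """Recursively combine the low half and the high half of the lanes."""
    if len(lanes) == 1:
        return lanes[0]
    half = len(lanes) // 2
    return op(reduce_pairwise(op, lanes[:half]), reduce_pairwise(op, lanes[half:]))

def fmaxnmv(zn, pg):
    identity = math.nan  # inactive lanes are treated as the default NaN
    lanes = [x if active else identity for x, active in zip(zn, pg)]
    size = 1
    while size < len(lanes):
        size *= 2
    lanes += [identity] * (size - len(lanes))  # pad to a power of two
    return reduce_pairwise(fp_max_num, lanes)

print(fmaxnmv([4.0, 9.0, -1.0, 7.0], [True, False, True, True]))
```

Because the identity is a NaN and the combining operation prefers numbers over NaNs, inactive lanes never influence the result unless every lane is inactive, in which case the result is NaN.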
Floating-point maximum horizontally over all lanes of a vector using a recursive pairwise reduction, and place the
result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as -Infinity.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPInfinity('1');
Determine the minimum of an immediate and each active floating-point element of the source vector, and destructively
place the results in the corresponding elements of the source vector. The immediate may take the value +0.0 or +1.0
only. If the element value is NaN then the result is NaN. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 1 1 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
i1 <const>
0 #0.0
1 #1.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMin(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Determine the minimum of active floating-point elements of the second source vector and corresponding floating-point
elements of the first source vector and destructively place the results in the corresponding elements of the first source
vector. If either element value is NaN then the result is NaN. Inactive elements in the destination
vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Determine the minimum number value of an immediate and each active floating-point element of the source vector,
and destructively place the results in the corresponding elements of the source vector. The immediate may take the
value +0.0 or +1.0 only. If one element value is numeric and the other is a quiet NaN, then the result is the numeric
value. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 0 1 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
i1 <const>
0 #0.0
1 #1.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMinNum(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Determine the minimum number value of active floating-point elements of the second source vector and corresponding
floating-point elements of the first source vector and destructively place the results in the corresponding elements of
the first source vector. If one element value is numeric and the other is a quiet NaN, then the result is the numeric
value. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 1 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Floating-point minimum number horizontally over all lanes of a vector using a recursive pairwise reduction, and place
the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as the default
NaN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 1 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPDefaultNaN();
Floating-point minimum horizontally over all lanes of a vector using a recursive pairwise reduction, and place the
result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +Infinity.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 RESERVED
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPInfinity('0');
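As an illustration of why +Infinity serves as the identity for a minimum reduction, the following Python sketch substitutes it for inactive lanes before reducing. The reduction helper is not shown in this section, so the tree shape (power-of-two padding, low half combined with high half) is an assumption, and the names fp_min, reduce_pairwise and fminv are hypothetical.

```python
import math

def fp_min(a, b):
    """NaN-propagating minimum; -0.0 is treated as less than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Equal values differ only for signed zeros: prefer the negative one.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

def reduce_pairwise(op, lanes):
    """Recursively combine the low half and the high half of the lanes."""
    if len(lanes) == 1:
        return lanes[0]
    half = len(lanes) // 2
    return op(reduce_pairwise(op, lanes[:half]), reduce_pairwise(op, lanes[half:]))

def fminv(zn, pg):
    identity = math.inf  # inactive lanes are treated as +Infinity
    lanes = [x if active else identity for x, active in zip(zn, pg)]
    size = 1
    while size < len(lanes):
        size *= 2
    lanes += [identity] * (size - len(lanes))  # pad to a power of two
    return reduce_pairwise(fp_min, lanes)

print(fminv([4.0, 9.0, -1.0, 7.0], [True, True, False, True]))
```

The inactive lane holding -1.0 is ignored, so the result is 4.0. With no active lanes the result is +Infinity, and a NaN in any active lane propagates to the result.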
Multiply all floating-point elements within each 128-bit segment of the first source vector by the specified element in
the corresponding second source vector segment. The products are then destructively added without intermediate
rounding to the corresponding elements of the addend and destination vector.
The elements within the second source vector are specified using an immediate index which selects the same element
position within each 128-bit vector segment. The index range is from 0 to one less than the number of elements per
128-bit segment, encoded in 1 to 3 bits depending on the size of the element. This instruction is unpredicated.
It has encodings from 3 classes: Half-precision, Single-precision and Double-precision
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 0 0 0 0 Zn Zda
op
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> op
Double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the half-precision and single-precision variant: is the name of the second source scalable vector
register Z0-Z7, encoded in the "Zm" field.
For the double-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
For the single-precision variant: is the immediate index, in the range 0 to 3, encoded in the "i2" field.
For the double-precision variant: is the immediate index, in the range 0 to 1, encoded in the "i1" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Z[da];
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, s, esize];
    bits(esize) element3 = Elem[result, e, esize];
    if op1_neg then element1 = FPNeg(element1);
    if op3_neg then element3 = FPNeg(element3);
    Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
Z[da] = result;
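The segment indexing in the pseudocode above can be sketched in Python as follows. The names fused_mul_add and fmla_indexed are hypothetical. Exact rational arithmetic followed by one conversion to float models the single rounding of a fused operation, but only at double precision and only for finite inputs, so the esize parameter here affects the segment layout and not the rounding.

```python
from fractions import Fraction

def fused_mul_add(addend, a, b):
    """addend + a*b computed exactly, then rounded once (finite inputs only)."""
    return float(Fraction(addend) + Fraction(a) * Fraction(b))

def fmla_indexed(zda, zn, zm, index, esize=64):
    """Multiply each zn lane by the indexed zm lane of the same 128-bit segment."""
    elts_per_segment = 128 // esize
    if not 0 <= index < elts_per_segment:
        raise ValueError("index out of range for this element size")
    result = list(zda)
    for e in range(len(zda)):
        segment_base = e - (e % elts_per_segment)
        s = segment_base + index  # same position within every segment
        result[e] = fused_mul_add(zda[e], zn[e], zm[s])
    return result

# Eight 32-bit lanes, that is two 128-bit segments, with index 2.
zda = [1.0] * 8
zn = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
zm = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
print(fmla_indexed(zda, zn, zm, index=2, esize=32))
```

Lanes 0 to 3 are multiplied by zm[2] (30.0) and lanes 4 to 7 by zm[6] (70.0), which shows that the index selects the same element position within each segment.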
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Floating-point fused multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and
third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 0 0 Pg Zn Zda
N op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
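    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
        if op1_neg then element1 = FPNeg(element1);
        if op3_neg then element3 = FPNeg(element3);
        Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = Elem[operand3, e, esize];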
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
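The phrase "without intermediate rounding" in the description of this instruction has an observable effect, which the following Python sketch demonstrates at double precision. The names fused_mul_add and fmla_predicated are hypothetical, the exact-rational model handles finite inputs only, and FPCR effects are not modelled.

```python
from fractions import Fraction

def fused_mul_add(addend, a, b):
    """addend + a*b computed exactly, then rounded once (finite inputs only)."""
    return float(Fraction(addend) + Fraction(a) * Fraction(b))

def fmla_predicated(zda, pg, zn, zm):
    """Zda = Zda + Zn * Zm in active lanes; inactive lanes keep Zda."""
    return [fused_mul_add(c, a, b) if active else c
            for c, active, a, b in zip(zda, pg, zn, zm)]

a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
# The exact product is 1 - 2**-54, which is not representable as a double.
fused = fmla_predicated([-1.0, -1.0], [True, False], [a, a], [b, b])
unfused = a * b + (-1.0)  # the product is rounded to 1.0 before the add
print(fused, unfused)
```

The fused lane yields -2^-54, whereas the separately rounded computation yields 0.0. The second lane is inactive and keeps its addend value of -1.0.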
Multiply all floating-point elements within each 128-bit segment of the first source vector by the specified element in
the corresponding second source vector segment. The products are then destructively subtracted without intermediate
rounding from the corresponding elements of the addend and destination vector.
The elements within the second source vector are specified using an immediate index which selects the same element
position within each 128-bit vector segment. The index range is from 0 to one less than the number of elements per
128-bit segment, encoded in 1 to 3 bits depending on the size of the element. This instruction is unpredicated.
It has encodings from 3 classes: Half-precision, Single-precision and Double-precision
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 0 0 0 1 Zn Zda
op
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> op
Double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the half-precision and single-precision variant: is the name of the second source scalable vector
register Z0-Z7, encoded in the "Zm" field.
For the double-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
For the single-precision variant: is the immediate index, in the range 0 to 3, encoded in the "i2" field.
For the double-precision variant: is the immediate index, in the range 0 to 1, encoded in the "i1" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Z[da];
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, s, esize];
    bits(esize) element3 = Elem[result, e, esize];
    if op1_neg then element1 = FPNeg(element1);
    if op3_neg then element3 = FPNeg(element3);
    Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
Z[da] = result;
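The pseudocode above is shared with the multiply-add form and differs only in the negation flags. The following Python sketch assumes that the subtracting form sets op1_neg to TRUE and op3_neg to FALSE; the decode that sets these flags is not shown in this section, so this is inferred from the description. The names fp_mul_add and fmls_indexed are hypothetical, and the exact-rational model covers finite double-precision values only.

```python
from fractions import Fraction

def fp_mul_add(element3, element1, element2, op1_neg=False, op3_neg=False):
    """element3 + element1*element2 with optional operand negation, rounded once."""
    if op1_neg:
        element1 = -element1
    if op3_neg:
        element3 = -element3
    return float(Fraction(element3) + Fraction(element1) * Fraction(element2))

def fmls_indexed(zda, zn, zm, index, esize=64):
    """Zda = Zda + (-Zn) * Zm[indexed], per 128-bit segment."""
    elts_per_segment = 128 // esize
    if not 0 <= index < elts_per_segment:
        raise ValueError("index out of range for this element size")
    result = list(zda)
    for e in range(len(zda)):
        s = e - (e % elts_per_segment) + index
        result[e] = fp_mul_add(zda[e], zn[e], zm[s], op1_neg=True)
    return result

# Four 64-bit lanes, that is two 128-bit segments, with index 1.
print(fmls_indexed([100.0] * 4, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], index=1))
```

With index 1, the first segment multiplies by 20.0 and the second by 40.0, and each product is subtracted from the addend.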
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Floating-point fused multiply-subtract vectors (predicated), writing addend [Zda = Zda + -Zn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the
destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 0 1 Pg Zn Zda
N op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
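    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
        if op1_neg then element1 = FPNeg(element1);
        if op3_neg then element3 = FPNeg(element3);
        Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = Elem[operand3, e, esize];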
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in
a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of
the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2
matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend
and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction
is unpredicated. The single-precision variant is vector length agnostic. The double-precision variant requires that the
current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the
trailing bits are set to zero.
ID_AA64ZFR0_EL1.F32MM indicates whether the single-precision variant is implemented.
ID_AA64ZFR0_EL1.F64MM indicates whether the double-precision variant is implemented.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
(FEAT_F32MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 Zm 1 1 1 0 0 1 Zn Zda
64-bit element
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 1 1 0 0 1 Zn Zda
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
if VL < esize * 4 then UNDEFINED;
integer segments = VL DIV (4 * esize);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(4*esize) op1, op2;
bits(4*esize) res, addend;
for s = 0 to segments-1
    op1 = Elem[operand1, s, 4*esize];
    op2 = Elem[operand2, s, 4*esize];
    addend = Elem[operand3, s, 4*esize];
    res = FPMatMulAdd(addend, op1, op2, esize, FPCR[]);
    Elem[result, s, 4*esize] = res;
Z[da] = result;
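The segment loop above can be sketched in Python as follows. FPMatMulAdd is not defined in this section, so the element layout used here is an assumption: each 2×2 matrix is stored row by row, and destination element [i][j] accumulates the 2-way dot product of row i of the first matrix with row j of the second matrix as stored. The products and sums are rounded separately, which is also an assumption. The names fp_mat_mul_add and fmmla are hypothetical.

```python
def fp_mat_mul_add(addend, op1, op2):
    """One 2x2 segment: result[i][j] = addend[i][j] + dot(op1 row i, op2 row j)."""
    result = [0.0] * 4
    for i in range(2):
        for j in range(2):
            prod0 = op1[2 * i] * op2[2 * j]
            prod1 = op1[2 * i + 1] * op2[2 * j + 1]
            result[2 * i + j] = addend[2 * i + j] + (prod0 + prod1)
    return result

def fmmla(zda, zn, zm):
    """Process whole four-element segments; trailing lanes are set to zero."""
    segments = len(zda) // 4
    if segments == 0:
        raise ValueError("vector shorter than one 2x2 segment")
    result = [0.0] * len(zda)
    for s in range(segments):
        lo, hi = 4 * s, 4 * s + 4
        result[lo:hi] = fp_mat_mul_add(zda[lo:hi], zn[lo:hi], zm[lo:hi])
    return result

# Six lanes model a vector length that is not a multiple of one segment.
print(fmmla([1.0] * 6, [1.0, 2.0, 3.0, 4.0, 9.0, 9.0], [5.0, 6.0, 7.0, 8.0, 9.0, 9.0]))
```

The first four lanes hold the accumulated 2×2 product, and the two trailing lanes that do not form a complete segment are zero, matching the initialization of result to Zeros() in the pseudocode.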
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FMOV (immediate, predicated)
Move a floating-point immediate into each active element in the destination vector. Inactive elements in the
destination vector register remain unmodified.
• The encodings in this description are named to match the encodings of FCPY.
• The description of FCPY gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 1 1 0 imm8 Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<const> Is a floating-point immediate value expressible as ±n÷16×2^r, where n and r are integers such that 16
≤ n ≤ 31 and -3 ≤ r ≤ 4, i.e. a normalized binary floating-point encoding with 1 sign bit, 3-bit exponent,
and 4-bit fractional part, encoded in the "imm8" field.
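The set of values that <const> can take follows directly from the formula above and can be enumerated with a short Python sketch. The names fmov_immediates and is_fmov_immediate are hypothetical, the check accepts finite values only, and the mapping from each value to its "imm8" bit pattern is not modelled.

```python
from fractions import Fraction

def fmov_immediates():
    """All values of the form ±n/16 × 2^r with 16 <= n <= 31 and -3 <= r <= 4."""
    values = set()
    for sign in (1, -1):
        for n in range(16, 32):
            for r in range(-3, 5):
                values.add(sign * Fraction(n, 16) * Fraction(2) ** r)
    return values

IMMEDIATES = fmov_immediates()

def is_fmov_immediate(x):
    """True if the finite value x is exactly one of the encodable constants."""
    return Fraction(x) in IMMEDIATES

print(len(IMMEDIATES), float(min(v for v in IMMEDIATES if v > 0)), float(max(IMMEDIATES)))
```

The enumeration yields 256 distinct values, consistent with an 8-bit field, ranging in magnitude from 0.125 to 31.0. Values such as 0.0 and 0.1 are not in the set.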
Operation
The description of FCPY gives the operational pseudocode for this instruction.
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FMOV (immediate, unpredicated)
Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is
unpredicated.
• The encodings in this description are named to match the encodings of FDUP.
• The description of FDUP gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 1 1 1 0 imm8 Zd
FMOV <Zd>.<T>, #<const>
is equivalent to
FDUP <Zd>.<T>, #<const>
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<const> Is a floating-point immediate value expressible as ±n÷16×2^r, where n and r are integers such that
16 ≤ n ≤ 31 and -3 ≤ r ≤ 4, i.e. a normalized binary floating-point encoding with 1 sign bit, a 3-bit
exponent, and a 4-bit fractional part, encoded in the "imm8" field.
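To make the representability rule for <const> concrete, the following Python sketch checks whether a value can be written as ±n÷16×2^r within the stated ranges and, if so, packs it into 8 bits. The function name encode_fp_imm8 is hypothetical, and the field layout (sign, then 3-bit exponent, then 4-bit fraction) is an assumption for this example rather than a statement of the FDUP encoding rules.

```python
import math

def encode_fp_imm8(value: float):
    """Return an 8-bit immediate for value = +/- n/16 * 2**r, or None if not representable."""
    if value == 0.0 or not math.isfinite(value):
        return None
    sign = 1 if value < 0 else 0
    m, e = math.frexp(abs(value))     # abs(value) == m * 2**e with 0.5 <= m < 1
    r = e - 1                         # rescale so the significand lies in [1, 2)
    n = m * 32.0                      # n/16 is that significand; n must be an integer 16..31
    if n != int(n) or not -3 <= r <= 4:
        return None
    # Assumed field layout: sign | 3-bit exponent | 4-bit fraction.
    exp3 = (r - 1) & 0x7              # r in 1..4 -> 0..3, r in -3..0 -> 4..7
    return (sign << 7) | (exp3 << 4) | (int(n) - 16)
```

In this model values such as 0.1 or 32.0 return None, because they need either more than four fraction bits or an exponent outside -3..4.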
Operation
The description of FDUP gives the operational pseudocode for this instruction.
FMOV (zero, predicated)
Move floating-point constant +0.0 to each active element in the destination vector. Inactive elements in the
destination vector register remain unmodified.
• The encodings in this description are named to match the encodings of CPY (immediate, merging).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 0 0 0 0 0 0 0 0 0 Zd
M sh imm8
FMOV <Zd>.<T>, <Pg>/M, #0.0
is equivalent to
CPY <Zd>.<T>, <Pg>/M, #0
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
Operation
The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FMOV (zero, unpredicated)
Unconditionally broadcast the floating-point constant +0.0 into each element of the destination vector. This instruction
is unpredicated.
• The encodings in this description are named to match the encodings of DUP (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of DUP (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 Zd
sh imm8
FMOV <Zd>.<T>, #0.0
is equivalent to
DUP <Zd>.<T>, #0
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
Operation
The description of DUP (immediate) gives the operational pseudocode for this instruction.
FMSB
Floating-point fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za + -Zdn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination
and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 0 1 Pg Zm Zdn
N op
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
Z[dn] = result;
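The phrase "without intermediate rounding" can be illustrated with a small Python sketch for one 64-bit element. It computes Zdn = Za + -Zdn * Zm exactly using rational arithmetic and rounds once at the end, then compares that with a version that rounds the product first. The function names are hypothetical, only finite inputs are handled, and signed zeros, NaNs, infinities and FPCR controls are not modelled.

```python
from fractions import Fraction

def fmsb_element(zdn: float, zm: float, za: float) -> float:
    """Za + (-Zdn * Zm) with a single final rounding (finite inputs only)."""
    exact = Fraction(za) - Fraction(zdn) * Fraction(zm)   # exact rational arithmetic
    return float(exact)                                    # one correctly rounded conversion

def fmsb_unfused(zdn: float, zm: float, za: float) -> float:
    """Same formula, but the product is rounded before the addition."""
    return za + (-(zdn * zm))
```

With zdn = 1 + 2**-30, zm = 1 - 2**-30 and za = 1.0, the exact product is 1 - 2**-60. The unfused version rounds that product to 1.0 and returns 0.0, while the single-rounding version returns 2**-60.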
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FMUL (immediate)
Multiply by an immediate each active floating-point element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate may take the value +0.5 or +2.0 only. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 1 0 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
i1 <const>
0 #0.5
1 #2.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMul(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
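The merging behavior in the pseudocode above can be sketched in Python as follows. This is a simplified model, not the architectural definition: the vector is a plain list of floats, the predicate is a list of booleans, the function name fmul_immediate is hypothetical, and FPCR effects are ignored.

```python
def fmul_immediate(zdn, pg, i1):
    """Multiply active elements by 0.5 (i1 == 0) or 2.0 (i1 == 1); keep inactive elements."""
    imm = 2.0 if i1 else 0.5
    return [x * imm if active else x for x, active in zip(zdn, pg)]
```

For example, with zdn = [1.0, 3.0, -4.0, 8.0], pg = [True, False, True, False] and i1 = 0, only the first and third elements are halved.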
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
FMUL (indexed)
Multiply all floating-point elements within each 128-bit segment of the first source vector by the specified element in
the corresponding second source vector segment. The results are placed in the corresponding elements of the
destination vector.
The elements within the second source vector are specified using an immediate index which selects the same element
position within each 128-bit vector segment. The index range is from 0 to one less than the number of elements per
128-bit segment, encoded in 1 to 3 bits depending on the size of the element. This instruction is unpredicated.
It has encodings from 3 classes: Half-precision, Single-precision and Double-precision
Half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 1 0 0 0 Zn Zd
Single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 1 0 0 0 Zn Zd
size<1>size<0>
Double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 1 0 0 0 Zn Zd
size<1>size<0>
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, s, esize];
    Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);
Z[d] = result;
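The per-segment indexing in the pseudocode above can be sketched in Python. This is a simplified model in which vectors are lists of floats covering a whole number of 128-bit segments. The function name fmul_indexed and the esize parameter default are assumptions for this example, and FPCR effects are ignored.

```python
def fmul_indexed(zn, zm, index, esize=32):
    """Multiply each element of zn by the element at `index` in the same 128-bit segment of zm."""
    elts_per_segment = 128 // esize
    if not 0 <= index < elts_per_segment:
        raise ValueError("index out of range for this element size")
    if len(zn) != len(zm) or len(zn) % elts_per_segment:
        raise ValueError("vectors must have equal length and whole 128-bit segments")
    result = []
    for e, x in enumerate(zn):
        segment_base = e - (e % elts_per_segment)   # first element of this segment
        result.append(x * zm[segment_base + index])
    return result
```

With 32-bit elements and index 2, every element in the first segment is multiplied by zm[2] and every element in the second segment by zm[6].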
FMUL (vectors, predicated)
Multiply active floating-point elements of the first source vector by corresponding floating-point elements of the
second source vector and destructively place the results in the corresponding elements of the first source vector.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
FMUL (vectors, unpredicated)
Multiply all elements of the first source vector by corresponding floating-point elements of the second source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 1 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);
Z[d] = result;
FMULX
Multiply active floating-point elements of the first source vector by corresponding floating-point elements of the
second source vector except that ∞×0.0 gives 2.0 instead of NaN, and destructively place the results in the
corresponding elements of the first source vector. Inactive elements in the destination vector register remain
unmodified.
The instruction can be used with FRECPX to safely convert arbitrary elements in mathematical vector space to
"unit vectors" or "direction vectors" of length 1.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 1 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMulX(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
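The special case described above, where ∞×0.0 gives 2.0 instead of NaN, can be sketched for a single element in Python. The function name fmulx is hypothetical. The choice of sign for the 2.0 result (negative when exactly one operand is negative) is an assumption of this sketch, and FPCR effects are ignored.

```python
import math

def fmulx(a: float, b: float) -> float:
    """Multiply, except that infinity times zero gives +/-2.0 instead of NaN."""
    inf_times_zero = (math.isinf(a) and b == 0.0) or (a == 0.0 and math.isinf(b))
    if inf_times_zero:
        # Assumption: the sign of the 2.0 result is the XOR of the operand signs.
        negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
        return -2.0 if negative else 2.0
    return a * b
```

All other operand combinations fall through to an ordinary multiplication in this model.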
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
FNEG
Negate each active floating-point element of the source vector, and place the results in the corresponding elements of
the destination vector. This inverts the sign bit and cannot signal a floating-point exception. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 0 1 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPNeg(element);
Z[d] = result;
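Because the negation only inverts the sign bit, it can be modelled as a pure bit operation. The following Python sketch does this for a 64-bit element. The function names are hypothetical, and the struct round trip is used here only to reach the bit pattern of a Python float.

```python
import struct

def fneg_bits64(bits: int) -> int:
    """Flip only the sign bit of a 64-bit floating-point bit pattern."""
    return (bits ^ (1 << 63)) & 0xFFFFFFFFFFFFFFFF

def fneg(x: float) -> float:
    """Negate a double by round-tripping through its bit pattern."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", fneg_bits64(bits)))
    return y
```

In this model a NaN bit pattern keeps its payload and only changes sign, which is consistent with an operation that cannot signal a floating-point exception.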
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FNMAD
Floating-point negated fused multiply-add vectors (predicated), writing multiplicand [Zdn = -Za + -Zdn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination
and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 1 0 Pg Zm Zdn
N op
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FNMLA
Floating-point negated fused multiply-add vectors (predicated), writing addend [Zda = -Zda + -Zn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third source (addend) vector without intermediate rounding. Destructively place the negated results in the
destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 1 0 Pg Zn Zda
N op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FNMLS
Floating-point negated fused multiply-subtract vectors (predicated), writing addend [Zda = -Zda + Zn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in
the destination and third source (addend) vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 1 1 Pg Zn Zda
N op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FNMSB
Floating-point negated fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = -Za + Zdn * Zm]
Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the
destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 1 1 Pg Zm Zdn
N op
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FRECPE
Find the approximate reciprocal of each floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 1 0 0 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand = Z[n];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRecipEstimate(element, FPCR[]);
Z[d] = result;
FRECPS
Multiply corresponding floating-point elements of the first and second source vectors, subtract the products from 2.0
without intermediate rounding and place the results in the corresponding elements of the destination vector. This
instruction is unpredicated.
This instruction can be used to perform a single Newton-Raphson iteration for calculating the reciprocal of a vector of
floating-point values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 1 1 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPRecipStepFused(element1, element2);
Z[d] = result;
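The Newton-Raphson use of the step value 2.0 - a×b can be sketched in Python for a single element. Given an initial estimate x of 1/d, each iteration computes x × (2.0 - d×x). The function names and the starting estimates are hypothetical, and unlike the instruction this model rounds the product before the subtraction.

```python
def frecps_step(d: float, x: float) -> float:
    """Model of the step value 2.0 - d * x (rounded twice here, unlike the fused instruction)."""
    return 2.0 - d * x

def reciprocal(d: float, estimate: float, iterations: int = 4) -> float:
    """Refine `estimate` of 1/d using x <- x * (2 - d*x)."""
    x = estimate
    for _ in range(iterations):
        x = x * frecps_step(d, x)
    return x
```

Each iteration roughly squares the relative error, so an estimate that is accurate to about 1% reaches double-precision accuracy within a few steps in this model.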
FRECPX
Invert the exponent and zero the fractional part of each active floating-point element of the source vector, and place
the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
The result of this instruction can be used with FMULX to convert arbitrary elements in mathematical vector space to
"unit vectors" or "direction vectors" of length 1.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPRecpX(element, FPCR[]);
Z[d] = result;
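The effect of inverting the exponent and zeroing the fraction can be sketched at the bit level for a normal 64-bit element. This Python model assumes that "invert" means a bitwise NOT of the biased exponent field. The function name frecpx is hypothetical. Zeros, denormals, infinities and NaNs are deliberately rejected rather than modelled, because FPRecpX (not shown in this section) defines their handling.

```python
import struct

def frecpx(x: float) -> float:
    """Invert the exponent field and clear the fraction of a normal double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    if exp == 0 or exp == 0x7FF:
        # Zeros, denormals, infinities and NaNs are not modelled in this sketch.
        raise ValueError("only normal numbers are modelled")
    new_exp = ~exp & 0x7FF                  # assumed meaning of "invert": bitwise NOT
    out = (sign << 63) | (new_exp << 52)    # fraction bits are all zero
    (y,) = struct.unpack("<d", struct.pack("<Q", out))
    return y
```

Under this model the product x × frecpx(x) has a magnitude in the range [2.0, 4.0) for any normal x, which is why the result is useful as a safe scaling factor.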
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
FRINT<r>
Round to an integral floating-point value with the specified rounding option from each active floating-point element of
the source vector, and place the results in the corresponding elements of the destination vector. Inactive elements in
the destination vector register remain unmodified.
The <r> symbol specifies one of the following rounding options: N (to nearest, with ties to even), A (to nearest, with
ties away from zero), M (toward minus Infinity), P (toward plus Infinity), Z (toward zero), I (current FPCR rounding
mode), or X (current FPCR rounding mode, signalling inexact).
It has encodings from 7 classes: Current mode, Current mode signalling inexact, Nearest with ties to away, Nearest
with ties to even, Toward zero, Toward minus infinity and Toward plus infinity
Current mode
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 1 0 1 Pg Zn Zd
Current mode signalling inexact
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 1 0 1 Pg Zn Zd
Nearest with ties to away
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 1 0 1 Pg Zn Zd
Nearest with ties to even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 1 0 1 Pg Zn Zd
Toward zero
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 1 1 0 1 Pg Zn Zd
Toward minus infinity
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 0 1 0 1 Pg Zn Zd
Toward plus infinity
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 1 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
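As an illustration of the rounding options listed for <r>, the following Python sketch models the per-element result on ordinary Python floats. The names round_integral and frint_merging are hypothetical and not part of the architecture. The I and X options are left out because they depend on FPCR state, which the sketch does not model, and NaN and exception handling are simplified.

```python
import math

def round_integral(x: float, mode: str) -> float:
    """Round x to an integral float. mode is one of N, A, M, P, Z."""
    if not math.isfinite(x):
        return x  # infinities and NaNs pass through in this sketch
    if mode == "N":      # to nearest, ties to even
        r = round(x)
    elif mode == "A":    # to nearest, ties away from zero
        r = math.trunc(x)
        if abs(x - r) >= 0.5:
            r += 1 if x > 0 else -1
    elif mode == "M":    # toward minus infinity
        r = math.floor(x)
    elif mode == "P":    # toward plus infinity
        r = math.ceil(x)
    elif mode == "Z":    # toward zero
        r = math.trunc(x)
    else:
        raise ValueError(f"unsupported rounding option: {mode}")
    # Keep the sign of the input so that, for example, -0.3 rounds to -0.0.
    return math.copysign(float(r), x)

def frint_merging(dest, src, mask, mode):
    """Active elements are rounded; inactive elements keep their old dest value."""
    return [round_integral(s, mode) if m else d
            for d, s, m in zip(dest, src, mask)]
```

The two nearest modes differ only on exact ties: 2.5 rounds to 2.0 under N and to 3.0 under A.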
Find the approximate reciprocal square root of each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 1 1 0 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand = Z[n];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRSqrtEstimate(element, FPCR[]);
Z[d] = result;
Multiply corresponding floating-point elements of the first and second source vectors, subtract the products from 3.0
and divide the results by 2.0 without any intermediate rounding and place the results in the corresponding elements of
the destination vector. This instruction is unpredicated.
This instruction can be used to perform a single Newton-Raphson iteration for calculating the reciprocal square root of
a vector of floating-point values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 1 1 1 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPRSqrtStepFused(element1, element2);
Z[d] = result;
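The Newton-Raphson use of this step can be sketched in Python as follows. For an estimate y of 1/sqrt(x), one iteration computes y * (3 - x*y*y) / 2. The function names are hypothetical, the way the two operands are formed (x*y and y) is one possible arrangement assumed for illustration, and the arithmetic is ordinary double precision rather than the fused operation described above.

```python
def rsqrt_step(a: float, b: float) -> float:
    """Model of the step operation: (3.0 - a * b) / 2.0.

    Python rounds the product before the subtraction, so this is not fused
    in the way the instruction is.
    """
    return (3.0 - a * b) / 2.0

def refine_rsqrt(x: float, estimate: float, iterations: int = 5) -> float:
    """Refine an approximation of 1/sqrt(x) by repeated Newton-Raphson steps."""
    y = estimate
    for _ in range(iterations):
        y = y * rsqrt_step(x * y, y)   # y * (3 - x*y*y) / 2
    return y
```

Each iteration roughly doubles the number of correct bits, so a coarse starting estimate needs only a few steps.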
Multiply the active floating-point elements of the first source vector by 2.0 to the power of the signed integer values in
the corresponding elements of the second source vector and destructively place the results in the corresponding
elements of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 0 1 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        integer element2 = SInt(Elem[operand2, e, esize]);
        Elem[result, e, esize] = FPScale(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
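A minimal Python sketch of the scaling operation above, including the merging behaviour for inactive elements. The function name fscale_merging is hypothetical, and math.ldexp stands in for the FPScale pseudocode function: it gives the same value for in-range results but does not model FPCR rounding or overflow handling.

```python
import math

def fscale_merging(zdn, zm, mask):
    """Scale active elements of zdn by 2**zm[e]; inactive elements are unchanged.

    math.ldexp raises OverflowError on overflow instead of producing an
    overflow result, so this sketch only covers in-range values.
    """
    return [math.ldexp(a, n) if active else a
            for a, n, active in zip(zdn, zm, mask)]
```

For example, with the middle element inactive, inputs [1.0, 3.0, 0.5] and exponents [3, -1, 2] give [8.0, 3.0, 2.0].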
Calculate the square root of each active floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 1 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPSqrt(element, FPCR[]);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Subtract an immediate from each active floating-point element of the source vector, and destructively place the results
in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 1 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
i1 <const>
0 #0.5
1 #1.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPSub(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Subtract active floating-point elements of the second source vector from corresponding floating-point elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 1 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPSub(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Subtract all floating-point elements of the second source vector from corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 0 1 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPSub(element1, element2, FPCR[]);
Z[d] = result;
Reversed subtract from an immediate each active floating-point element of the source vector, and destructively place
the results in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 1 1 1 0 0 Pg 0 0 0 0 i1 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
i1 <const>
0 #0.5
1 #1.0
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPSub(imm, element1, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Reversed subtract active floating-point elements of the first source vector from corresponding floating-point elements
of the second source vector and destructively place the results in the corresponding elements of the first source
vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 1 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPSub(element2, element1, FPCR[]);
    else
        Elem[result, e, esize] = element1;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
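The only difference between the predicated subtract and reversed subtract forms is the operand order passed to FPSub. The Python sketch below, with hypothetical function names and plain Python floats in place of FPCR-controlled arithmetic, shows both side by side.

```python
def sub_merging(zdn, zm, mask):
    """Subtract: active elements become zdn[e] - zm[e]; others are unchanged."""
    return [a - b if active else a for a, b, active in zip(zdn, zm, mask)]

def subr_merging(zdn, zm, mask):
    """Reversed subtract: active elements become zm[e] - zdn[e]; others are unchanged."""
    return [b - a if active else a for a, b, active in zip(zdn, zm, mask)]
```

In both cases the result overwrites the first source vector, so the reversed form is useful when the value to keep live is the subtrahend rather than the minuend.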
The FTMAD instruction calculates the series terms for either SIN(X) or COS(X), where the argument X has been adjusted
to be in the range -π/4 < X ≤ π/4.
To calculate the series terms of SIN(X) and COS(X) the initial source operands of FTMAD should be zero in the first source
vector and X² in the second source vector. The FTMAD instruction is then executed eight times to calculate the sum of
eight series terms, which gives a result of sufficient precision.
The FTMAD instruction multiplies each element of the first source vector by the absolute value of the corresponding
element of the second source vector and performs a fused addition of each product with a value obtained from a table
of hard-wired coefficients, and places the results destructively in the first source vector.
The coefficients are different for SIN(X) and COS(X), and are selected by a combination of the sign bit in the second
source element and an immediate index in the range 0 to 7.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 imm3 1 0 0 0 0 0 Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<imm> Is the unsigned immediate operand, in the range 0 to 7, encoded in the "imm3" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPTrigMAdd(imm, element1, element2, FPCR[]);
Z[dn] = result;
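The eight-step accumulation described for FTMAD amounts to evaluating a polynomial in X² by Horner's rule. The Python sketch below illustrates the idea under several assumptions: the coefficient tables are plain Taylor series stand-ins rather than the hard-wired values, the immediate index is assumed to run from 7 down to 0, the table selection that the instruction derives from the sign bit is modelled by a hypothetical use_cos flag, and the multiply-add is not fused.

```python
import math

# Stand-in coefficients: Taylor series of sin(X)/X and cos(X) in powers of X**2.
# The architecture's hard-wired table is not reproduced here.
SIN_COEFF = [(-1.0) ** k / math.factorial(2 * k + 1) for k in range(8)]
COS_COEFF = [(-1.0) ** k / math.factorial(2 * k) for k in range(8)]

def tmad_step(imm: int, acc: float, x2: float, use_cos: bool) -> float:
    """One multiply-add step: acc * |x2| + coefficient[imm] (not fused here)."""
    table = COS_COEFF if use_cos else SIN_COEFF
    return acc * abs(x2) + table[imm]

def series(x: float, use_cos: bool) -> float:
    """Sum eight series terms by Horner's rule, highest-order coefficient first."""
    acc = 0.0  # the first source operand starts at zero
    for imm in range(7, -1, -1):
        acc = tmad_step(imm, acc, x * x, use_cos)
    return acc
```

With these stand-in tables, series(x, True) approximates cos(x) and x * series(x, False) approximates sin(x) for x in the reduced range.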
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
The FTSMUL instruction calculates the initial value for the FTMAD instruction. The instruction squares each element in
the first source vector and then sets the sign bit to a copy of bit 0 of the corresponding element in the second source
register, and places the results in the destination vector. This instruction is unpredicated.
To compute SIN(X) or COS(X) the instruction is executed with elements of the first source vector set to X, adjusted to be
in the range -π/4 < X ≤ π/4.
The elements of the second source vector hold the corresponding quadrant number Q as an integer, not a
floating-point value. The value Q satisfies the relationship (2Q-1) × π/4 < x ≤ (2Q+1) × π/4, where x is the argument before adjustment.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 1 1 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPTrigSMul(element1, element2, FPCR[]);
Z[d] = result;
The FTSSEL instruction selects the coefficient for the final multiplication in the polynomial series approximation. The
instruction places the value 1.0 or a copy of the first source vector element in the destination element, depending on
bit 0 of the quadrant number Q held in the corresponding element of the second source vector. The sign bit of the
destination element is copied from bit 1 of the corresponding value of Q. This instruction is unpredicated.
To compute SIN(X) or COS(X) the instruction is executed with elements of the first source vector set to X, adjusted to be
in the range -π/4 < X ≤ π/4.
The elements of the second source vector hold the corresponding quadrant number Q as an integer, not a
floating-point value. The value Q satisfies the relationship (2Q-1) × π/4 < x ≤ (2Q+1) × π/4, where x is the argument before adjustment.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPTrigSSel(element1, element2);
Z[d] = result;
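The relationship between the quadrant number Q, the adjusted argument, and the roles of bits 0 and 1 of Q can be sketched in Python. This is an illustration only: reduce_quadrant and sin_from_quadrant are hypothetical names, the range reduction is a naive double-precision one, and math.sin and math.cos stand in for the polynomial series that the trigonometric instructions evaluate.

```python
import math

def reduce_quadrant(x: float):
    """Return (q, r) with r = x - q*pi/2 and -pi/4 < r <= pi/4 (up to rounding)."""
    q = math.ceil(x / (math.pi / 2) - 0.5)
    return q, x - q * (math.pi / 2)

def sin_from_quadrant(x: float) -> float:
    """Rebuild sin(x) from the reduced argument, using the low two bits of q."""
    q, r = reduce_quadrant(x)
    value = math.cos(r) if q & 1 else math.sin(r)   # bit 0: cosine or sine series
    return -value if q & 2 else value               # bit 1: sign of the result
```

Bit 0 of Q decides which series applies, which is why the final multiplier is either 1.0 (cosine) or the adjusted argument itself (sine), and bit 1 supplies the sign.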
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 4 classes: Byte, Doubleword, Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(64) operand1 = X[dn];
X[dn] = operand1 + (count * imm);
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 3 classes: Doubleword, Halfword and Word
Doubleword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] + (count * imm);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
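To make the named predicate constraints concrete, the following Python sketch computes the element count for a given vector length and element size, then applies it as a scalar increment. The function names and the string form of the pattern are hypothetical, and the sketch reflects one reading of the constraint list above (a fixed count that does not fit yields zero); it is not the DecodePredCount pseudocode itself.

```python
def pred_count(pattern: str, vl_bits: int, esize: int) -> int:
    """Element count implied by a named predicate constraint."""
    elements = vl_bits // esize
    if pattern == "ALL":
        return elements
    if pattern == "POW2":
        return 1 << (elements.bit_length() - 1)   # largest power of two <= elements
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "MUL4":
        return elements - elements % 4
    if pattern.startswith("VL") and pattern[2:].isdigit():
        fixed = int(pattern[2:])
        return fixed if fixed <= elements else 0  # too few elements: count is zero
    return 0  # unspecified constraint: zero element count

def inc_scalar(x: int, pattern: str, imm: int, vl_bits: int, esize: int) -> int:
    """Increment a 64-bit scalar by count * imm, wrapping modulo 2**64."""
    return (x + pred_count(pattern, vl_bits, esize) * imm) % (1 << 64)
```

For a 256-bit vector of byte elements there are 32 elements, so ALL and POW2 both give 32, MUL3 gives 30, VL16 gives 16, and VL64 gives 0.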
Counts the number of true elements in the source predicate and then uses the result to increment the scalar
destination.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 0 1 0 0 Pm Rdn
D
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) operand1 = X[dn];
bits(PL) operand2 = P[m];
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
X[dn] = operand1 + count;
Counts the number of true elements in the source predicate and then uses the result to increment all destination
vector elements.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 0 0 0 0 Pm Zdn
D
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] + count;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
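The predicate-count increment can be sketched in Python for both the scalar and the vector destination. The function names are hypothetical, the predicate is modelled as a list of booleans with one entry per element, and vector elements are treated as unsigned esize-bit values that wrap on overflow.

```python
def incp_scalar(x: int, pred) -> int:
    """Add the number of true predicate elements to a 64-bit scalar."""
    return (x + sum(1 for p in pred if p)) % (1 << 64)

def incp_vector(zdn, pred, esize: int):
    """Add the same count to every element, wrapping modulo 2**esize."""
    count = sum(1 for p in pred if p)
    return [(v + count) % (1 << esize) for v in zdn]
```

Note that in the vector form the predicate only supplies the count; every destination element is incremented, whether or not the corresponding predicate element is true.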
Populates the destination vector by setting the first element to the first signed immediate integer operand and
monotonically incrementing the value by the second signed scalar integer operand for each subsequent element. The
scalar source operand is a general-purpose register in which only the least significant bits corresponding to the vector
element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Rm 0 1 0 0 1 0 imm5 Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is the signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
size <R>
01 W
x0 W
11 X
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(esize) operand2 = X[m];
integer element2 = SInt(operand2);
bits(VL) result;
for e = 0 to elements-1
    integer index = imm + e * element2;
    Elem[result, e, esize] = index<esize-1:0>;
Z[d] = result;
Populates the destination vector by setting the first element to the first signed immediate integer operand and
monotonically incrementing the value by the second signed immediate integer operand for each subsequent element.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 imm5b 0 1 0 0 0 0 imm5 Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<imm1> Is the first signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
<imm2> Is the second signed immediate operand, in the range -16 to 15, encoded in the "imm5b" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) result;
for e = 0 to elements-1
    integer index = imm1 + e * imm2;
    Elem[result, e, esize] = index<esize-1:0>;
Z[d] = result;
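As a rough illustration of the loop above, the following Python sketch generates the same sequence for a given vector length. The function name index_immediates and the parameter vl_bits are assumptions introduced for this example; the truncation index<esize-1:0> is modelled with a bit mask.

```python
def index_immediates(imm1, imm2, vl_bits, esize):
    """Return the element values written to Z[d], lowest-numbered element first.

    imm1, imm2 -- signed immediates, each in the range -16 to 15
    vl_bits    -- vector length in bits (hypothetical parameter standing in for VL)
    esize      -- element size in bits (8, 16, 32 or 64)
    """
    elements = vl_bits // esize
    wrap = (1 << esize) - 1
    # index<esize-1:0>: keep only the low esize bits of the unbounded integer.
    return [(imm1 + e * imm2) & wrap for e in range(elements)]


# 128-bit vector, 8-bit elements, start -2, step 3.
print(index_immediates(-2, 3, 128, 8)[:4])
```

With a start of -2 the first element is stored as 0xFE, since negative values are kept in two's complement form within each element.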
Populates the destination vector by setting the first element to the first signed scalar integer operand and
monotonically incrementing the value by the second signed immediate integer operand for each subsequent element.
The scalar source operand is a general-purpose register in which only the least significant bits corresponding to the
vector element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 imm5 0 1 0 0 0 1 Rn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<imm> Is the signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(esize) operand1 = X[n];
integer element1 = SInt(operand1);
bits(VL) result;
for e = 0 to elements-1
    integer index = element1 + e * imm;
    Elem[result, e, esize] = index<esize-1:0>;
Z[d] = result;
Populates the destination vector by setting the first element to the first signed scalar integer operand and
monotonically incrementing the value by the second signed scalar integer operand for each subsequent element. The
scalar source operands are general-purpose registers in which only the least significant bits corresponding to the
vector element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Rm 0 1 0 0 1 1 Rn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(esize) operand1 = X[n];
integer element1 = SInt(operand1);
bits(esize) operand2 = X[m];
integer element2 = SInt(operand2);
bits(VL) result;
for e = 0 to elements-1
    integer index = element1 + e * element2;
    Elem[result, e, esize] = index<esize-1:0>;
Z[d] = result;
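The handling of the scalar operands, where only the low esize bits of each register are used and interpreted as signed, can be sketched in Python as follows. The names sint and index_scalars, and the parameter vl_bits, are assumptions made for illustration.

```python
def sint(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def index_scalars(xn, xm, vl_bits, esize):
    """Model of the pseudocode: start and step come from the low esize bits of X[n] and X[m]."""
    start = sint(xn, esize)   # bits(esize) operand1 = X[n]; SInt(operand1)
    step = sint(xm, esize)    # bits(esize) operand2 = X[m]; SInt(operand2)
    elements = vl_bits // esize
    wrap = (1 << esize) - 1
    return [(start + e * step) & wrap for e in range(elements)]


# Upper register bits (0x1234 here) are ignored for 16-bit elements;
# an all-ones step register acts as a step of -1.
print(index_scalars(0x12340005, 0xFFFFFFFFFFFFFFFF, 128, 16))
```

In this example the sequence counts down from 5 and wraps to 0xFFFF and 0xFFFE in the last two elements.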
Shift the destination vector left by one element, and then place a copy of the least-significant bits of the general-
purpose register in element 0 of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 0 0 0 1 1 1 0 Rm Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
bits(VL) dest = Z[dn];
bits(esize) src = X[m];
Z[dn] = dest<(VL-esize)-1:0> : src;
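The concatenation dest<(VL-esize)-1:0> : src moves every element up by one position, discards the highest-numbered element, and writes the scalar into element 0. A minimal Python sketch of that behaviour, using a list with element 0 first (the function name insert_scalar is an assumption):

```python
def insert_scalar(zdn, x, esize):
    """Model of the operation on a list of element values.

    zdn   -- current elements of Z[dn], element 0 first (hypothetical model)
    x     -- value of the general-purpose register
    esize -- element size in bits
    """
    src = x & ((1 << esize) - 1)   # bits(esize) src = X[m]: low bits only
    # Old element i becomes element i+1; the old top element is dropped.
    return [src] + zdn[:-1]


print(insert_scalar([10, 20, 30, 40], 0x100000007, 32))
```

Applying the operation repeatedly, once per element, would fill a vector from scalar values with the first inserted value ending up in the highest-numbered element.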
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Shift the destination vector left by one element, and then place a copy of the SIMD&FP scalar register in element 0 of
the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 1 0 0 0 0 1 1 1 0 Vm Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
size <V>
00 B
01 H
10 S
11 D
<m> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vm" field.
Operation
CheckSVEEnabled();
bits(VL) dest = Z[dn];
bits(esize) src = V[m];
Z[dn] = dest<(VL-esize)-1:0> : src;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
If there is an active element then extract the element after the last active element modulo the number of elements
from the final source vector register. If there are no active elements, extract element zero. Then zero-extend and place
the extracted element in the destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 1 Pg Zn Rd
B
Assembler Symbols
size <R>
01 W
x0 W
11 X
<d> Is the number [0-30] of the destination general-purpose register or the name ZR (31), encoded in the
"Rd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
bits(rsize) result;
integer last = LastActiveElement(mask, esize);
if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
result = ZeroExtend(Elem[operand, last, esize]);
X[d] = result;
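The element selection in the pseudocode above can be sketched in Python. The sketch assumes that isBefore is FALSE for this "element after the last active element" form and TRUE for the form that extracts the last active element itself; the function name extract_last and the list-based register model are likewise assumptions.

```python
def extract_last(mask, operand, is_before):
    """Select an element of operand based on the last active predicate element.

    mask      -- one boolean per element (hypothetical model of P[g])
    operand   -- element values of Z[n], element 0 first
    is_before -- models the isBefore flag in the pseudocode
    """
    elements = len(operand)
    # LastActiveElement: index of the highest active element, or -1 if none.
    last = -1
    for e, active in enumerate(mask):
        if active:
            last = e
    if is_before:
        if last < 0:
            last = elements - 1
    else:
        last += 1
        if last >= elements:
            last = 0
    return operand[last]


z = [10, 20, 30, 40]
print(extract_last([True, True, False, False], z, is_before=False))
print(extract_last([False] * 4, z, is_before=False))
```

With no active elements, the is_before=False form returns element 0 and the is_before=True form returns the highest-numbered element, matching the two descriptions in the text.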
If there is an active element then extract the element after the last active element modulo the number of elements
from the final source vector register. If there are no active elements, extract element zero. Then place the extracted
element in the destination SIMD&FP scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 0 1 0 0 Pg Zn Vd
B
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
integer last = LastActiveElement(mask, esize);
if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
V[d] = Elem[operand, last, esize];
If there is an active element then extract the last active element from the final source vector register. If there are no
active elements, extract the highest-numbered element. Then zero-extend and place the extracted element in the
destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 1 1 0 1 Pg Zn Rd
B
Assembler Symbols
size <R>
01 W
x0 W
11 X
<d> Is the number [0-30] of the destination general-purpose register or the name ZR (31), encoded in the
"Rd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
bits(rsize) result;
integer last = LastActiveElement(mask, esize);
if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
result = ZeroExtend(Elem[operand, last, esize]);
X[d] = result;
If there is an active element then extract the last active element from the final source vector register. If there are no
active elements, extract the highest-numbered element. Then place the extracted element in the destination SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 1 1 0 0 Pg Zn Vd
B
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
integer last = LastActiveElement(mask, esize);
if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
V[d] = Elem[operand, last, esize];
Contiguous load of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
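The address calculation above, in which the immediate is scaled by the number of elements times the memory element size regardless of predication, can be sketched in Python. The function name load_scalar_plus_imm, the bytes object used as memory, and the little-endian byte order are assumptions made for this example.

```python
def load_scalar_plus_imm(memory, base, imm, mask, msize):
    """Model of the load loop on a bytes object.

    memory -- bytes object standing in for memory (hypothetical model)
    base   -- base address, as an index into memory
    imm    -- signed immediate vector offset, -8 to 7
    mask   -- one boolean per element (hypothetical model of P[g])
    msize  -- memory element size in bits
    """
    elements = len(mask)
    mbytes = msize // 8
    result = []
    for e in range(elements):
        if mask[e]:
            eoff = imm * elements + e          # offset * elements + e
            addr = base + eoff * mbytes
            data = memory[addr:addr + mbytes]  # little-endian assumed
            result.append(int.from_bytes(data, "little"))  # zero-extended value
        else:
            result.append(0)                   # inactive elements are zeroed
    return result


memory = bytes(range(64))
print(load_scalar_plus_imm(memory, 16, 1, [True, False, True, True], 8))
print(load_scalar_plus_imm(memory, 16, -1, [True, False, True, True], 8))
```

With four byte-sized elements, an immediate of 1 advances the access by four bytes and an immediate of -1 moves it back by four bytes, which is the behaviour the text describes as multiplying by the vector's in-memory size.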
Contiguous load of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements will not cause a read from Device memory or
signal a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Gather load of unsigned bytes to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to 64 bits. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 1 0 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
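The per-element address calculation for the gather form can be sketched in Python. The parameter names offs_size, offs_unsigned and scale are taken from the pseudocode; the function name gather_addresses, the use of None for inactive elements, and the assumption that the unscaled forms correspond to scale = 0 are introduced here for illustration.

```python
MASK64 = (1 << 64) - 1


def gather_addresses(base, offsets, mask, offs_size, offs_unsigned, scale):
    """Return one address per element, or None where the element is inactive.

    base          -- 64-bit scalar base address
    offsets       -- element values of the offset vector, element 0 first
    mask          -- one boolean per element (hypothetical model of P[g])
    offs_size     -- number of low bits of each offset element that are used
    offs_unsigned -- True for zero-extension (UXTW), False for sign-extension (SXTW)
    scale         -- left shift applied to the extended offset
    """
    addrs = []
    for off_elem, active in zip(offsets, mask):
        if not active:
            addrs.append(None)   # no access is performed for inactive elements
            continue
        off = off_elem & ((1 << offs_size) - 1)
        if not offs_unsigned and off >> (offs_size - 1):
            off -= 1 << offs_size   # sign-extend the low offs_size bits
        addrs.append((base + (off << scale)) & MASK64)
    return addrs


offsets = [0x1FFFFFFFF, 4]
print([hex(a) for a in gather_addresses(0x1000, offsets, [True, True], 32, True, 0)])
print([hex(a) for a in gather_addresses(0x1000, offsets, [True, True], 32, False, 0)])
```

The same low 32 bits (0xFFFFFFFF) produce an address far above the base when zero-extended and the address just below the base when sign-extended.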
Gather load of unsigned bytes to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load of doublewords to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load of doublewords to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 8 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Gather load of doublewords to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 8. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 1 Zm 1 1 0 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Gather load of doublewords to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements will not cause a read from
Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
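The assembler byte offset (a multiple of 8 from 0 to 248) and the 5-bit "imm5" field are related through the offset * mbytes term in the pseudocode. The Python sketch below assumes that offset is the unsigned value of the imm5 field, so that the field holds the byte offset divided by 8; the function names encode_imm5 and vector_plus_imm_addresses are hypothetical.

```python
MASK64 = (1 << 64) - 1


def encode_imm5(byte_offset, mbytes=8):
    """Convert an assembler byte offset to the assumed imm5 field value."""
    if byte_offset % mbytes != 0 or not 0 <= byte_offset <= 31 * mbytes:
        raise ValueError("offset must be a multiple of %d in the range 0 to %d"
                         % (mbytes, 31 * mbytes))
    return byte_offset // mbytes


def vector_plus_imm_addresses(bases, mask, imm5, mbytes=8):
    """Per-element addresses: base element plus offset * mbytes; None if inactive."""
    return [((b + imm5 * mbytes) & MASK64) if active else None
            for b, active in zip(bases, mask)]


imm5 = encode_imm5(248)
print(imm5)
print(vector_plus_imm_addresses([0x2000, 0x3000], [True, False], imm5))
```

Under this assumption the largest byte offset, 248, corresponds to a field value of 31, which is the largest value that fits in five bits.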
Contiguous load of unsigned halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load of unsigned halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
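The Operation pseudocode above can be modelled informally in Python. This is a minimal sketch, not the architectural definition: the function name `ld1h_contiguous`, the `memory` byte string, the list-of-booleans predicate and the little-endian byte order are all assumptions made for illustration.

```python
def ld1h_contiguous(memory, base, index, pred, vl_bits=128, esize=16):
    """Illustrative model of a predicated contiguous halfword load.

    memory : bytes-like object standing in for memory (assumption)
    base   : byte address, as would be held in Xn|SP
    index  : element index, as would be held in Xm (scaled by 2 below)
    pred   : one boolean per destination element (simplified predicate)
    """
    mbytes = 2                          # a halfword occupies two bytes
    elements = vl_bits // esize
    result = []
    for e in range(elements):
        if pred[e]:
            addr = base + (index + e) * mbytes
            # Reading as an unsigned integer models the zero-extension to esize.
            result.append(int.from_bytes(memory[addr:addr + mbytes], "little"))
        else:
            result.append(0)            # inactive element: zeroed, no access
    return result
```

For example, with `memory = bytes(range(32))`, a base of 0, an index of 1, `vl_bits=64` and the predicate `[True, False, True, False]`, the sketch returns `[770, 0, 1798, 0]`.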
Gather load of unsigned halfwords to active elements of a vector register from memory addresses generated by a
64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and
then optionally multiplied by 2. Inactive elements will not cause a read from Device memory or signal faults, and are
set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 1 0 Pg Rn Zt
U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 1 0 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
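The address calculation in the gather form depends on how each vector offset is extended and scaled. The following Python sketch illustrates that step under stated assumptions: `extend_offset` and `ld1h_gather` are hypothetical helper names, memory is a little-endian byte string, and the predicate is a plain list of booleans.

```python
def extend_offset(raw, offs_size, offs_unsigned):
    """Interpret the low offs_size bits of raw as an unsigned or signed integer."""
    value = raw & ((1 << offs_size) - 1)
    if not offs_unsigned and value >> (offs_size - 1):
        value -= 1 << offs_size         # two's complement sign extension
    return value


def ld1h_gather(memory, base, offsets, pred,
                offs_size=32, offs_unsigned=True, scale=1):
    """Illustrative model of a scalar-base plus vector-index halfword gather."""
    mbytes = 2
    result = []
    for e, active in enumerate(pred):
        if active:
            off = extend_offset(offsets[e], offs_size, offs_unsigned)
            # scale is 1 for the scaled forms (multiply by 2), 0 for unscaled.
            addr = (base + (off << scale)) & 0xFFFFFFFFFFFFFFFF
            result.append(int.from_bytes(memory[addr:addr + mbytes], "little"))
        else:
            result.append(0)            # inactive element: zeroed, no access
    return result
```

With `offs_unsigned=False` the 32-bit pattern `0xFFFFFFFF` is treated as -1, so a scaled access lands two bytes below the base; with `offs_unsigned=True` the same pattern is a large positive offset.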
Gather load of unsigned halfwords to active elements of a vector register from memory addresses generated by a
vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a
read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
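In the vector-plus-immediate form each element supplies its own base address and a single immediate byte offset is added to all of them. A minimal Python sketch of that behaviour follows; the name `ld1h_vector_imm`, the byte-string memory and the boolean-list predicate are assumptions for illustration only.

```python
def ld1h_vector_imm(memory, bases, pred, imm=0):
    """Illustrative model of a vector-base plus immediate halfword gather.

    bases : one base address per element (already zero-extended)
    imm   : byte offset, a multiple of 2 in the range 0 to 62
    """
    mbytes = 2
    if imm % mbytes != 0 or not 0 <= imm <= 62:
        raise ValueError("imm must be a multiple of 2 in the range 0 to 62")
    offset = imm // mbytes              # the scaled value carried by imm5
    result = []
    for e, active in enumerate(pred):
        if active:
            addr = bases[e] + offset * mbytes
            result.append(int.from_bytes(memory[addr:addr + mbytes], "little"))
        else:
            result.append(0)            # inactive element: zeroed, no access
    return result
```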
Load a single unsigned byte from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is in the range 0 to 63.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element
8-bit element
16-bit element
32-bit element
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 63, defaulting to 0, encoded in the
"imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
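The load-and-broadcast behaviour described above (one memory read, replicated into every active element, and no read at all when the predicate is entirely inactive) can be sketched in Python as follows. The name `ld1rb`, the byte-string memory and the boolean-list predicate are assumptions made for this example.

```python
def ld1rb(memory, base, pred, imm=0):
    """Illustrative model of a load-and-broadcast of one unsigned byte."""
    if not 0 <= imm <= 63:
        raise ValueError("imm must be in the range 0 to 63")
    if not any(pred):
        # All elements inactive: memory is never touched.
        return [0] * len(pred)
    data = memory[base + imm]           # single unsigned byte
    return [data if active else 0 for active in pred]
```

Note that `ld1rb(memory, 100, [False, False])` returns `[0, 0]` even when address 100 lies outside `memory`, mirroring the statement that an all-inactive predicate performs no read.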
Load a single doubleword from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is a multiple of 8 in the range 0 to 504.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 1 1 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 504, defaulting to 0,
encoded in the "imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Load a single unsigned halfword from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is a multiple of 2 in the range 0 to 126.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
32-bit element
64-bit element
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 126, defaulting to 0,
encoded in the "imm6" field.
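The immediate ranges quoted for the broadcast loads (0 to 63 for bytes, 0 to 126 in steps of 2 for halfwords, 0 to 504 in steps of 8 for doublewords) are all consistent with a 6-bit unsigned field holding the byte offset divided by the memory element size. The Python sketch below illustrates that reading; `encode_imm6` and `decode_imm6` are hypothetical helper names, not part of any Arm tool.

```python
def encode_imm6(byte_offset, mbytes):
    """Convert an assembler byte offset into the value of a 6-bit field.

    mbytes is the memory element size in bytes (1, 2, 4 or 8).
    """
    if byte_offset % mbytes != 0:
        raise ValueError("offset must be a multiple of the element size")
    imm6 = byte_offset // mbytes
    if not 0 <= imm6 <= 63:
        raise ValueError("scaled offset does not fit in 6 unsigned bits")
    return imm6


def decode_imm6(imm6, mbytes):
    """Recover the byte offset from the field value."""
    return imm6 * mbytes
```

Under this assumption the largest halfword offset, 126, and the largest doubleword offset, 504, both map to the field value 63.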
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first thirty-two predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.
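A signed byte offset that is a multiple of 32 in the range -256 to 224 corresponds to a 4-bit two's complement value from -8 to 7. The Python sketch below shows one way to convert between the two. The helper names are hypothetical, and the `granule` parameter is an assumption that lets the same sketch be reused for forms whose offset is a different multiple.

```python
def encode_simm4(byte_offset, granule=32):
    """Convert a signed byte offset into a 4-bit two's complement field value."""
    if byte_offset % granule != 0:
        raise ValueError("offset must be a multiple of the granule")
    scaled = byte_offset // granule
    if not -8 <= scaled <= 7:
        raise ValueError("scaled offset does not fit in 4 signed bits")
    return scaled & 0xF


def decode_simm4(imm4, granule=32):
    """Recover the signed byte offset from the 4-bit field value."""
    signed = imm4 - 16 if imm4 & 0x8 else imm4
    return signed * granule
```

With the default granule, -256 encodes as `0b1000` and 224 encodes as `0b0111`.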
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
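The replication step in the description (a 256-bit segment copied across the destination, with any trailing bits zeroed when the vector length is not a multiple of 256) can be sketched in Python for the byte form. This is an informal model: `ld1rob`, the byte-string memory and the boolean-list predicate are assumptions, and the destination is represented as a list of bytes with index 0 as the lowest-numbered element.

```python
def ld1rob(memory, base, pred, vl_bits, imm=0):
    """Illustrative model of a 256-bit load-and-replicate of bytes.

    pred : at least 32 booleans; only the first 32 are consulted
    imm  : signed byte offset, a multiple of 32 in the range -256 to 224
    """
    if vl_bits < 256:
        raise ValueError("the instruction is UNDEFINED when VL < 256")
    segment = []
    for e in range(32):
        if pred[e]:
            segment.append(memory[base + imm + e])
        else:
            segment.append(0)           # inactive element: zeroed, no access
    vl_bytes = vl_bits // 8
    result = segment * (vl_bytes // 32) # whole 256-bit copies
    result += [0] * (vl_bytes - len(result))  # zero any trailing bytes
    return result
```

For a 384-bit vector length the sketch yields one 32-byte copy followed by 16 zero bytes; for 512 bits it yields two identical copies.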
Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and scalar index which is added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first thirty-two predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(64) offset;
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first four predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and scalar index which is multiplied by 8 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first four predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(64) offset;
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first sixteen predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and scalar index which is multiplied by 2 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first sixteen predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(64) offset;
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first eight predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and scalar index which is multiplied by 4 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first eight predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
SVE
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(64) offset;
bits(256) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated
by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the
base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first sixteen predicate elements are used and
higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (offset * 16) + (e * mbytes);
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated
by a 64-bit scalar base address and scalar index which is added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first sixteen predicate elements are used and
higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112
added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first two predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (offset * 16) + (e * mbytes);
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and scalar index which is multiplied by 8 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first two predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
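The load-and-replicate behaviour described for this instruction can be sketched in Python as follows. This is a simplified model under stated assumptions, not the architectural definition: memory is a flat `bytes` object addressed from zero, the predicate is given as one boolean per element, and the function name `load_replicate_quadword` is hypothetical.

```python
from typing import Sequence


def load_replicate_quadword(mem: bytes, base: int, index: int,
                            pred: Sequence[bool], esize: int,
                            vl_bits: int) -> bytes:
    """Model a predicated 128-bit load whose result is replicated to fill VL."""
    mbytes = esize // 8
    elements = 128 // esize
    quad = bytearray(16)  # inactive elements remain zero
    for e in range(elements):
        if pred[e]:
            # The scalar index counts elements, so it is scaled by mbytes.
            addr = base + (index + e) * mbytes
            chunk = mem[addr:addr + mbytes]
            if len(chunk) != mbytes:
                raise IndexError("access outside modelled memory")
            quad[e * mbytes:(e + 1) * mbytes] = chunk
    # Replicate the short vector across the whole destination vector.
    return bytes(quad) * (vl_bits // 128)
```

In this model an inactive element never indexes `mem`, which mirrors the rule that inactive elements do not cause a memory read.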
Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112
added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first eight predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.
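The relationship between the assembler byte offset and the 4-bit field can be sketched as below. This assumes the field holds the byte offset divided by 16 as a signed two's complement value, which is inferred from the `offset * 16` term in the Operation pseudocode. The function names are illustrative.

```python
def encode_imm4(byte_offset: int) -> int:
    """Encode a byte offset (multiple of 16, -128 to 112) as a 4-bit field."""
    if byte_offset % 16 != 0 or not -128 <= byte_offset <= 112:
        raise ValueError("offset must be a multiple of 16 in -128..112")
    return (byte_offset // 16) & 0xF


def decode_imm4(field: int) -> int:
    """Decode a 4-bit field back to the signed byte offset."""
    if not 0 <= field <= 0xF:
        raise ValueError("field must fit in 4 bits")
    signed = field - 16 if field & 0x8 else field
    return signed * 16
```

Every legal offset survives an encode and decode round trip under this assumption.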
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (offset * 16) + (e * mbytes);
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and scalar index which is multiplied by 2 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first eight predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by
a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first four predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (offset * 16) + (e * mbytes);
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by
a 64-bit scalar base address and scalar index which is multiplied by 4 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first four predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();
Load a single signed byte from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is in the range 0 to 63.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
32-bit element
64-bit element
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 63, defaulting to 0, encoded in the
"imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
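The load-and-broadcast behaviour described for this instruction can be modelled with the following sketch. It assumes little-endian data, a flat `bytes` memory addressed from zero, one boolean per element for the predicate, and that `unsigned` is FALSE for the signed forms so that `Extend` sign-extends. The function name `load_and_broadcast_signed` is hypothetical.

```python
from typing import List, Sequence


def load_and_broadcast_signed(mem: bytes, base: int, byte_offset: int,
                              msize: int, esize: int,
                              pred: Sequence[bool]) -> List[int]:
    """Load one msize-bit signed value and broadcast it to active lanes."""
    if not any(pred):
        # All elements inactive: no memory access is performed at all.
        return [0] * len(pred)
    mbytes = msize // 8
    addr = base + byte_offset
    chunk = mem[addr:addr + mbytes]
    if len(chunk) != mbytes:
        raise IndexError("access outside modelled memory")
    value = int.from_bytes(chunk, "little", signed=True)
    lane = value & ((1 << esize) - 1)  # sign-extended, kept as esize bits
    return [lane if active else 0 for active in pred]
```

The early return models the rule that the instruction does not read memory when every element is inactive.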
Load a single signed halfword from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is a multiple of 2 in the range 0 to 126.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
64-bit element
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 126, defaulting to 0,
encoded in the "imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Load a single signed word from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is a multiple of 4 in the range 0 to 252.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 1 imm6 1 0 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 252, defaulting to 0,
encoded in the "imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Load a single unsigned word from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is a multiple of 4 in the range 0 to 252.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
64-bit element
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 252, defaulting to 0,
encoded in the "imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar
base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
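The address arithmetic of the scalar plus immediate form, where the immediate counts whole vectors in memory, can be sketched as follows. The vector length is passed in as a parameter because it is implementation defined, and the function name `contiguous_imm_addresses` is illustrative. Inactive elements are reported as `None` to show that no address is generated for them.

```python
from typing import List, Optional, Sequence


def contiguous_imm_addresses(base: int, imm: int, vl_bits: int,
                             esize: int, msize: int,
                             pred: Sequence[bool]) -> List[Optional[int]]:
    """Per-element addresses for a contiguous load with a vector-count immediate."""
    elements = vl_bits // esize
    mbytes = msize // 8
    addrs: List[Optional[int]] = []
    for e in range(elements):
        if pred[e]:
            # imm is scaled by the in-memory size of one vector
            # (elements * mbytes), irrespective of predication.
            eoff = imm * elements + e
            addrs.append(base + eoff * mbytes)
        else:
            addrs.append(None)
    return addrs
```

With a 128-bit vector, 32-bit elements and byte-sized memory accesses, one vector occupies 4 bytes in memory, so an immediate of 1 moves the first element address from `base` to `base + 4`.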
Contiguous load of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar
base and scalar index which is added to the base address. After each element access the index value is incremented,
but the index register is not updated. Inactive elements will not cause a read from Device memory or signal a fault,
and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
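The scalar plus scalar form can be modelled end to end with the sketch below, which loads each active element, sign-extends it to the element size, and zeroes inactive elements. It assumes little-endian data, a flat `bytes` memory, one boolean per element for the predicate, and that `unsigned` is FALSE for signed loads. The name `load_contiguous_signed` is hypothetical.

```python
from typing import List, Sequence

MASK64 = (1 << 64) - 1


def load_contiguous_signed(mem: bytes, base: int, index: int,
                           msize: int, esize: int,
                           pred: Sequence[bool]) -> List[int]:
    """Model a predicated contiguous load of signed msize-bit values."""
    mbytes = msize // 8
    lanes: List[int] = []
    for e, active in enumerate(pred):
        if not active:
            lanes.append(0)  # inactive: no memory access, element zeroed
            continue
        # The index register itself is never modified; e supplies the increment.
        addr = (base + (index + e) * mbytes) & MASK64
        chunk = mem[addr:addr + mbytes]
        if len(chunk) != mbytes:
            raise IndexError("access outside modelled memory")
        value = int.from_bytes(chunk, "little", signed=True)
        lanes.append(value & ((1 << esize) - 1))
    return lanes
```

Passing `msize=8` models the byte form, where the index is added to the base without scaling, and `msize=16` models a halfword form, where it is multiplied by 2.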
Gather load of signed bytes to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to 64 bits. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 0 0 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Gather load of signed bytes to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory
or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
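The vector plus immediate address generation can be sketched as below. Each base element is zero-extended to 64 bits before the immediate is added, so a 32-bit base near its maximum does not wrap at 32 bits. The immediate is taken here as the assembler byte offset, and the function name `vector_base_addresses` is illustrative.

```python
from typing import List, Optional, Sequence

MASK64 = (1 << 64) - 1


def vector_base_addresses(bases: Sequence[int], byte_offset: int,
                          esize: int,
                          pred: Sequence[bool]) -> List[Optional[int]]:
    """Per-element gather addresses for a vector base plus immediate."""
    elem_mask = (1 << esize) - 1
    addrs: List[Optional[int]] = []
    for base_elem, active in zip(bases, pred):
        if active:
            # ZeroExtend(Elem[base, e, esize], 64) + byte offset
            addrs.append(((base_elem & elem_mask) + byte_offset) & MASK64)
        else:
            addrs.append(None)  # inactive: no address is generated
    return addrs
```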
Contiguous load of signed halfwords to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load of signed halfwords to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 2. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 0 0 Pg Rn Zt
U ff
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 0 0 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 0 0 Pg Rn Zt
U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 0 0 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
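The index handling in the gather form, where the low `offs_size` bits of each vector element are sign-extended or zero-extended and then optionally shifted, can be sketched as follows. The parameter names mirror the pseudocode variables `offs_size`, `offs_unsigned` and `scale`; the function name `gather_addresses` and the use of `None` for inactive elements are illustrative assumptions.

```python
from typing import List, Optional, Sequence

MASK64 = (1 << 64) - 1


def gather_addresses(base: int, offsets: Sequence[int],
                     pred: Sequence[bool], offs_size: int = 32,
                     offs_unsigned: bool = True,
                     scale: int = 0) -> List[Optional[int]]:
    """Per-element gather addresses for a scalar base plus vector index."""
    addrs: List[Optional[int]] = []
    for raw, active in zip(offsets, pred):
        if not active:
            addrs.append(None)
            continue
        off = raw & ((1 << offs_size) - 1)      # keep the low offs_size bits
        if not offs_unsigned and off >> (offs_size - 1):
            off -= 1 << offs_size               # sign-extend (SXTW-style)
        addrs.append((base + (off << scale)) & MASK64)
    return addrs
```

With `offs_unsigned=False` an index of `0xFFFFFFFF` is treated as -1, so a scaled halfword access (`scale=1`) lands 2 bytes below the base, whereas the zero-extended interpretation lands almost 8 GiB above it.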
Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read
from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load of signed words to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load of signed words to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
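The scalar plus scalar loop above can be sketched in Python as follows. This is a simplified model, not the architectural definition: memory is a bytes object, data is assumed to be little-endian, the access size is fixed at 4 bytes, and the loaded word is sign-extended to a 64-bit element (that is, esize = 64 and a signed Extend are assumed). The name ld1sw_scalar_scalar is hypothetical.

```python
import struct


def ld1sw_scalar_scalar(memory, base, index, elements, mask):
    """Model the loop: read signed 32-bit words, widen them to 64-bit elements."""
    mbytes = 4                                   # assumed msize DIV 8
    result = []
    for e in range(elements):
        if mask[e]:
            addr = base + (index + e) * mbytes   # base + (UInt(offset) + e) * mbytes
            (word,) = struct.unpack_from("<i", memory, addr)  # little-endian assumed
            result.append(word & 0xFFFFFFFFFFFFFFFF)          # 64-bit bit pattern
        else:
            result.append(0)                     # inactive elements are zeroed
    return result


memory = struct.pack("<4i", -1, 2, -3, 4)
loaded = ld1sw_scalar_scalar(memory, 0, 1, 2, [True, True])
print([hex(v) for v in loaded])
# ['0x2', '0xfffffffffffffffd']
```

The index counts words rather than bytes, so an index of 1 starts the load at byte offset 4. The index argument itself is never modified, which corresponds to the statement that the index register is not updated.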
Gather load of signed words to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 4. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset.
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 0 0 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 0 0 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
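The offset handling in the gather loop above can be illustrated with a short Python sketch. It assumes that offs_unsigned is true for UXTW and false for SXTW, that offs_size is 32 for the unpacked forms, and that scale is 2 for the scaled forms (multiply by 4); these values are set outside the pseudocode shown here, so they are assumptions. The function name gather_address is hypothetical.

```python
def gather_address(base, offset_elem, offs_size, offs_unsigned, scale):
    """Compute one gather address from one element of the offset vector."""
    low = offset_elem & ((1 << offs_size) - 1)       # Elem[...]<offs_size-1:0>
    if not offs_unsigned and (low >> (offs_size - 1)) & 1:
        low -= 1 << offs_size                        # Int(..., FALSE): signed value
    return (base + (low << scale)) % 2**64           # base + (off << scale), 64-bit


# The same offset bits interpreted with SXTW and with UXTW (assumed mapping).
sxtw = gather_address(0x2000, 0x00000000FFFFFFFF, 32, False, 2)
uxtw = gather_address(0x2000, 0x00000000FFFFFFFF, 32, True, 2)
print(hex(sxtw), hex(uxtw))
# 0x1ffc 0x400001ffc
```

With sign extension the offset 0xFFFFFFFF is treated as -1, giving an address 4 bytes below the base; with zero extension it is a large positive index.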
Gather load of signed words to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read from
Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
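A minimal Python sketch of the vector plus immediate form follows. It assumes that the "imm5" field holds the byte offset divided by 4, which is inferred from the range 0 to 124 and from the multiplication by mbytes in the pseudocode rather than stated explicitly above. The names encode_imm5 and vector_plus_imm_addresses are hypothetical.

```python
def encode_imm5(byte_offset, mbytes=4):
    """Return the assumed imm5 field value for an assembler byte offset."""
    if byte_offset % mbytes != 0 or not 0 <= byte_offset <= 31 * mbytes:
        raise ValueError("offset must be a multiple of 4 in the range 0 to 124")
    return byte_offset // mbytes


def vector_plus_imm_addresses(bases, byte_offset, mask, mbytes=4):
    """One address per active element: base element plus the shared offset."""
    offset = encode_imm5(byte_offset, mbytes)
    addrs = []
    for base_elem, active in zip(bases, mask):
        if active:
            addrs.append((base_elem + offset * mbytes) % 2**64)
        else:
            addrs.append(None)               # inactive element: no access
    return addrs


addrs = vector_plus_imm_addresses([0x1000, 0x2000, 0x3000], 8, [True, False, True])
print([hex(a) if a is not None else None for a in addrs])
# ['0x1008', None, '0x3008']
```

Each active element supplies its own base address, and the same immediate is added to all of them.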
Contiguous load of unsigned words to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
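The phrase "the vector's in-memory size" depends on the element size, because fewer words are read when each word is widened to a 64-bit element. The Python sketch below works this out from the pseudocode quantities. The helper names are hypothetical, msize = 32 is assumed for a word load, and VL = 512 is only an example value.

```python
def vector_memory_bytes(vl_bits, esize, msize=32):
    """Bytes of memory covered by one vector: elements * mbytes."""
    elements = vl_bits // esize      # integer elements = VL DIV esize
    mbytes = msize // 8              # constant integer mbytes = msize DIV 8
    return elements * mbytes


def imm_byte_offset(imm, vl_bits, esize, msize=32):
    """Byte displacement contributed by the immediate (range -8 to 7)."""
    if not -8 <= imm <= 7:
        raise ValueError("immediate must be in the range -8 to 7")
    return imm * vector_memory_bytes(vl_bits, esize, msize)


print(vector_memory_bytes(512, 32))   # 32-bit element class: 64
print(vector_memory_bytes(512, 64))   # 64-bit element class: 32
print(imm_byte_offset(-8, 512, 64))   # -256
```

The displacement is independent of the predicate, which matches the wording "irrespective of predication" in the description.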
Contiguous load of unsigned words to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
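The loops for the signed-word and unsigned-word loads are written identically; the visible difference is the unsigned argument passed to Extend(). The Python sketch below models that one step on plain integers. The function name extend and its parameter order are assumptions made for the example, and the value of unsigned for each instruction is presumed from the instruction descriptions rather than shown in the pseudocode above.

```python
def extend(data, msize, esize, unsigned):
    """Widen an msize-bit value to esize bits by zero or sign extension."""
    data &= (1 << msize) - 1                     # keep only the loaded bits
    if not unsigned and (data >> (msize - 1)) & 1:
        data -= 1 << msize                       # top bit set: value is negative
    return data & ((1 << esize) - 1)             # bit pattern of the element


word = 0x80000000
print(hex(extend(word, 32, 64, unsigned=True)))   # 0x80000000
print(hex(extend(word, 32, 64, unsigned=False)))  # 0xffffffff80000000
print(hex(extend(word, 32, 32, unsigned=False)))  # 0x80000000
```

When the element size equals the memory size, as in the 32-bit element class, the two forms of extension give the same bit pattern.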
Gather load of unsigned words to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 4. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset.
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 1 0 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 0 Zm 0 1 0 Pg Rn Zt
U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 1 0 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Gather load of unsigned words to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read
from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load two-byte structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
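The effect of the nested loop above is to de-interleave two-element structures into two registers. The Python sketch below models this for byte elements. It is a simplified illustration: memory is a bytes object, little-endian data is assumed, and the offset parameter stands for the decoded value used by the pseudocode, which is assumed here to be the assembler <imm> divided by 2. The function name ld2_imm is hypothetical.

```python
def ld2_imm(memory, base, offset, elements, mbytes, mask, nreg=2):
    """Return nreg lists; list r holds member r of every loaded structure."""
    values = [[0] * elements for _ in range(nreg)]   # inactive elements stay zero
    for e in range(elements):
        for r in range(nreg):
            if mask[e]:
                # (offset * elements * nreg) + (e * nreg) + r
                eoff = offset * elements * nreg + e * nreg + r
                addr = base + eoff * mbytes
                values[r][e] = int.from_bytes(memory[addr:addr + mbytes], "little")
    return values


memory = bytes(range(16))
first, second = ld2_imm(memory, 0, 0, 4, 1, [True, True, False, True])
print(first, second)
# [0, 2, 0, 6] [1, 3, 0, 7]
```

One predicate element governs both members of a structure, so element 2 is zero in both results. Passing offset = 1 with the same arguments would start at byte 8, one full two-vector block past the base.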
Contiguous load two-byte structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register which is added to the base address. After each
structure access the index value is incremented by two. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load two-doubleword structures, each to the same element number in two vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to
14 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load two-doubleword structures, each to the same element number in two vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by two. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
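For the scalar plus scalar structure form, the index is counted in elements and is scaled by the element size before being added to the base. The Python sketch below lists the addresses the loop above would generate when every element is active. The function name and the example values are assumptions; esize = 64 is taken from the doubleword description.

```python
def ld2_scalar_scalar_addresses(base, index, elements, esize, nreg=2):
    """Return, per element, the addresses of the nreg members of its structure."""
    mbytes = esize // 8                          # constant integer mbytes = esize DIV 8
    table = []
    for e in range(elements):
        row = []
        for r in range(nreg):
            eoff = index + e * nreg + r          # UInt(offset) + (e * nreg) + r
            row.append((base + eoff * mbytes) % 2**64)
        table.append(row)
    return table


table = ld2_scalar_scalar_addresses(0x1000, 3, 2, 64)
print([[hex(a) for a in row] for row in table])
# [['0x1018', '0x1020'], ['0x1028', '0x1030']]
```

Consecutive structures are 16 bytes apart, which corresponds to the index advancing by two doublewords after each structure access.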
Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
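The immediate of the structure loads is written in the assembler as a multiple of the register count, while the "imm4" field has only four bits. The Python sketch below shows one way the two could relate, assuming that the field holds the immediate divided by the number of registers as a 4-bit two's complement value. That encoding is an inference from the stated ranges (-16 to 14 for two registers, -24 to 21 for three) and from the multiplication by nreg in the pseudocode; the function name is hypothetical.

```python
def encode_struct_imm4(imm, nreg):
    """Map an assembler immediate to the assumed 4-bit imm4 field value."""
    if imm % nreg != 0:
        raise ValueError("immediate must be a multiple of the register count")
    offset = imm // nreg                 # value used as 'offset' in the pseudocode
    if not -8 <= offset <= 7:
        raise ValueError("immediate out of range")
    return offset & 0xF                  # 4-bit two's complement


print(encode_struct_imm4(-16, 2))   # 8  (0b1000)
print(encode_struct_imm4(14, 2))    # 7  (0b0111)
print(encode_struct_imm4(-24, 3))   # 8
print(encode_struct_imm4(21, 3))    # 7
```

Under this assumption the two-register and three-register forms share the same field range, and only the multiplier differs.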
Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load two-word structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load two-word structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load three-byte structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
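A typical use of a three-byte structure load is splitting interleaved three-channel data, such as packed RGB pixels, into one register per channel. The Python sketch below models the loop above for byte elements with an immediate of 0, and also shows the register numbering from the final loop, where the destination wraps modulo 32. The names ld3b and dest_registers, and the pixel data, are assumptions made for the example.

```python
def dest_registers(t, nreg=3):
    """Register numbers written by the final loop: Z[(t+r) MOD 32]."""
    return [(t + r) % 32 for r in range(nreg)]


def ld3b(memory, base, elements, mask, nreg=3):
    """De-interleave three-byte structures; inactive elements become zero."""
    values = [[0] * elements for _ in range(nreg)]
    for e in range(elements):
        for r in range(nreg):
            if mask[e]:
                values[r][e] = memory[base + e * nreg + r]   # mbytes = 1, offset = 0
    return values


pixels = bytes([10, 20, 30, 11, 21, 31, 12, 22, 32])
red, green, blue = ld3b(pixels, 0, 3, [True, True, False])
print(red, green, blue)        # [10, 11, 0] [20, 21, 0] [30, 31, 0]
print(dest_registers(30))      # [30, 31, 0]
```

The third pixel is skipped entirely because its predicate element is inactive, so all three channel values for it are zero.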
Contiguous load three-byte structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register which is added to the base address. After each
structure access the index value is incremented by three. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1010010  00 (msz)  10     Rm     110    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
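Because the instruction leaves the index register unchanged, a loop over a long array has to advance the index itself. The Python sketch below models only that bookkeeping. It assumes the loop handles one vector's worth of structures per iteration and builds a predicate that is active only for the structures that remain; `strip_mine` and its return format are hypothetical.

```python
def strip_mine(n_structs, elements, nreg):
    """Yield (index, predicate) pairs for successive structure loads.

    index is the value the scalar index register would hold, counted in
    elements; predicate has one boolean per element position.
    """
    index = 0
    done = 0
    while done < n_structs:
        active = min(elements, n_structs - done)
        predicate = [i < active for i in range(elements)]
        yield index, predicate
        index += elements * nreg  # software advances the index, not the load
        done += elements
```

For 40 three-byte structures and 16 elements per vector, this produces three loads, the last with only 8 active elements.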
Contiguous load three-doubleword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to
21 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1010010  11 (msz)  10     0   imm4   111    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
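The "plus 1 modulo 32" and "plus 2 modulo 32" rules mean the register list wraps from Z31 back to Z0. A minimal Python sketch of that numbering follows; the function name is illustrative only.

```python
def transfer_registers(zt, nreg):
    """Return the Z register numbers used when the "Zt" field holds zt."""
    if not 0 <= zt <= 31:
        raise ValueError("Zt must be in the range 0 to 31")
    return [(zt + r) % 32 for r in range(nreg)]
```

For example, a three-register load with "Zt" equal to 30 uses Z30, Z31 and Z0.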
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
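The description says the immediate is multiplied by the vector's in-memory size, while the pseudocode multiplies "offset" by elements, nreg and mbytes. The Python sketch below checks that the two agree, assuming "offset" is the assembler immediate divided by the number of registers. Both function names are hypothetical.

```python
def displacement_from_description(imm, vl_bits):
    """Byte displacement as described: immediate times vector size in bytes."""
    return imm * (vl_bits // 8)


def displacement_from_pseudocode(imm, vl_bits, esize, nreg):
    """Byte displacement of element 0, register 0, as the loop computes it."""
    offset = imm // nreg          # assumed decode of the immediate
    elements = vl_bits // esize
    mbytes = esize // 8
    return offset * elements * nreg * mbytes
```

The element size cancels out, so the displacement depends only on the immediate and the vector length.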
Contiguous load three-doubleword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1010010  11 (msz)  10     Rm     110    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
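For the scalar-plus-scalar form, the index register counts elements and is scaled by the element size when the address is formed. The Python sketch below mirrors the two address lines of the pseudocode, assuming that the addition into a bits(64) value wraps modulo 2^64. Under that assumption an index register holding a two's-complement negative value addresses memory below the base. The names are illustrative.

```python
MASK64 = (1 << 64) - 1


def scalar_index_address(base, index, e, r, esize, nreg):
    """Byte address of member r of structure e for the scalar-plus-scalar form."""
    mbytes = esize // 8
    eoff = (index & MASK64) + e * nreg + r   # UInt(offset) + (e * nreg) + r
    return (base + eoff * mbytes) & MASK64   # truncated to 64 bits
```

With doubleword elements, an index of 2 places the first element 16 bytes above the base, which matches the LSL #3 scaling.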
Contiguous load three-halfword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to
21 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.
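The way one predicate element governs the same element number in every destination register can be modelled in Python. The shared function ElemP is not defined in this section, so this is a sketch under an assumption: the predicate register holds one bit per byte of the vector, and the lowest-numbered bit belonging to an element decides whether that element is active. The helper names are hypothetical.

```python
def elem_p(mask, e, esize):
    """Return the predicate bit that governs element e of size esize bits."""
    return (mask >> (e * (esize // 8))) & 1


def active_elements(mask, vl_bits, esize):
    """List the element numbers that are active for the given predicate value."""
    return [e for e in range(vl_bits // esize) if elem_p(mask, e, esize)]
```

For halfword elements only every second predicate bit is consulted, so a mask of 0b0101 activates elements 0 and 1 while 0b0010 activates none.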
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1010010  01 (msz)  10     0   imm4   111    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
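The fields named above can be packed into a 32-bit instruction word as follows. This Python sketch covers only the scalar-plus-immediate diagrams in this section (three and four registers) and assumes that "imm4" holds the immediate divided by the register count. The function and table names are hypothetical, and no constraint beyond the field ranges is checked.

```python
MSZ = {"B": 0b00, "H": 0b01, "W": 0b10, "D": 0b11}
NREG_BITS = {3: 0b10, 4: 0b11}  # bits 22-21 as shown in the diagrams


def encode_ld_struct_imm(size, nreg, zt, pg, rn, imm=0):
    """Pack an LD3/LD4 scalar-plus-immediate instruction word."""
    if not (0 <= zt <= 31 and 0 <= rn <= 31 and 0 <= pg <= 7):
        raise ValueError("register number out of range")
    if imm % nreg != 0 or not -8 <= imm // nreg <= 7:
        raise ValueError("immediate not encodable")
    imm4 = (imm // nreg) & 0xF
    return ((0b1010010 << 25) | (MSZ[size] << 23) | (NREG_BITS[nreg] << 21)
            | (imm4 << 16) | (0b111 << 13) | (pg << 10) | (rn << 5) | zt)
```

Passing 31 for rn selects the stack pointer as the base, in line with the "Rn" field description.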
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load three-halfword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1010010  01 (msz)  10     Rm     110    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load three-word structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1010010  10 (msz)  10     0   imm4   111    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load three-word structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by three. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1010010  10 (msz)  10     Rm     110    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
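Going the other way, the fields of a scalar-plus-scalar instruction word can be pulled apart with shifts and masks. The Python sketch below follows the bit positions of the scalar-plus-scalar diagrams in this section and recognises only the three-register and four-register patterns. The function name and the dictionary layout are hypothetical, and any further architectural constraints on the fields are not checked.

```python
def decode_ld_struct_scalar(word):
    """Split an LD3/LD4 scalar-plus-scalar word into fields, or return None."""
    if (word >> 25) & 0x7F != 0b1010010 or (word >> 13) & 0x7 != 0b110:
        return None
    opc = (word >> 21) & 0x3
    if opc not in (0b10, 0b11):
        return None  # other register counts are outside this sketch
    return {
        "msz": (word >> 23) & 0x3,
        "nreg": 3 if opc == 0b10 else 4,
        "Rm": (word >> 16) & 0x1F,
        "Pg": (word >> 10) & 0x7,
        "Rn": (word >> 5) & 0x1F,
        "Zt": word & 0x1F,
    }
```

An msz value of 0b10 in the result corresponds to the word-sized form described here.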
Contiguous load four-byte structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1010010  00 (msz)  11     0   imm4   111    Pg     Rn   Zt
LD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
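The overall effect of the operation above is to de-interleave four-byte structures into four vectors, zeroing inactive elements. A functional Python sketch is shown below. It models memory as a bytes object indexed from zero and the predicate as one boolean per element, both of which are simplifications; `ld4b` is a hypothetical helper, not an architectural function.

```python
def ld4b(memory, base, pred, elements, offset=0):
    """Return four lists of byte values, one per destination register."""
    nreg = 4
    values = [[0] * elements for _ in range(nreg)]
    for e in range(elements):
        for r in range(nreg):
            if pred[e]:
                eoff = offset * elements * nreg + e * nreg + r
                addr = base + eoff  # mbytes is 1 for byte elements
                if not 0 <= addr < len(memory):
                    raise IndexError("address outside modelled memory")
                values[r][e] = memory[addr]
            # inactive elements stay zero and memory is not read
    return values
```

Applied to interleaved RGBA pixel data, the four returned lists would hold the red, green, blue and alpha channels respectively.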
Contiguous load four-byte structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register that is added to the base address. After each
structure access the index value is incremented by four. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1010010  00 (msz)  11     Rm     110    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load four-doubleword structures, each to the same element number in four vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to
28 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1010010  11 (msz)  11     0   imm4   111    Pg     Rn   Zt
LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
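Since "elements" is VL DIV esize, the number of structures one instruction can load depends on the implemented vector length. The Python sketch below tabulates that, assuming VL is a multiple of 128 bits between 128 and 2048 as in the SVE architecture. The function names are hypothetical.

```python
def structures_per_load(vl_bits, esize):
    """Number of structures (element positions) covered by one load."""
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("unsupported vector length")
    return vl_bits // esize


def bytes_covered(vl_bits, nreg):
    """Bytes of memory spanned when every element is active."""
    return nreg * vl_bits // 8
```

For doubleword elements a 128-bit implementation loads two structures per instruction and a 512-bit implementation loads eight, so loop counts should be derived from the vector length at run time rather than fixed in the code.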
Contiguous load four-doubleword structures, each to the same element number in four vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by four. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1010010  11 (msz)  11     Rm     110    Pg     Rn   Zt
LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1010010  01 (msz)  11     0   imm4   111    Pg     Rn   Zt
LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1010010  01 (msz)  11     Rm     110    Pg     Rn   Zt
LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
Contiguous load four-word structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1010010  10 (msz)  11     0   imm4   111    Pg     Rn   Zt
LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
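The address arithmetic in the loop above can be sketched in Python as follows. The function name and the `active` list (one boolean per predicate element) are illustrative and not part of the architecture. The sketch assumes that the pseudocode variable `offset` equals the assembler immediate divided by 4, so that `#4, MUL VL` advances by four vectors' worth of memory.

```python
def ld4w_imm_element_addresses(base, offset, vl_bits, active):
    """Return addrs[r][e]: the byte address read for element e of register r.

    Mirrors eoff = offset*elements*nreg + e*nreg + r from the pseudocode.
    None marks an inactive element (no access is made; the element is zeroed).
    """
    esize, nreg = 32, 4
    elements = vl_bits // esize
    mbytes = esize // 8
    addrs = [[None] * elements for _ in range(nreg)]
    for e in range(elements):
        for r in range(nreg):
            if active[e]:
                eoff = offset * elements * nreg + e * nreg + r
                # Addresses are 64-bit values, so wrap modulo 2**64.
                addrs[r][e] = (base + eoff * mbytes) & (2**64 - 1)
    return addrs
```

With a 128-bit vector length, `offset = 1` moves every address forward by 64 bytes, which is four 16-byte vectors.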
Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33
Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
Contiguous load four-word structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();
for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
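To make the de-interleaving effect of the loop above concrete, here is a minimal functional model in Python. It is a sketch under stated assumptions: memory is a little-endian `bytes` object addressed from 0, `index` stands for the value of the scalar index register, and the function name is hypothetical.

```python
import struct

def ld4w_scalar_index(memory: bytes, base: int, index: int, vl_bits: int, active):
    """Model the scalar-plus-scalar structure load: returns four lists of words.

    Word r of each 4-word structure lands in values[r]; inactive elements
    are not read and stay zero, as in the pseudocode.
    """
    esize, nreg = 32, 4
    elements = vl_bits // esize
    mbytes = esize // 8
    values = [[0] * elements for _ in range(nreg)]
    for e in range(elements):
        for r in range(nreg):
            if active[e]:
                eoff = index + e * nreg + r
                addr = base + eoff * mbytes
                # Assumption: little-endian 32-bit words.
                values[r][e] = struct.unpack_from("<I", memory, addr)[0]
    return values
```

Loading from a buffer holding the words 0 to 15 with all elements active at a 128-bit vector length yields `[0, 4, 8, 12]` in the first register and `[1, 5, 9, 13]` in the second.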
Contiguous load with first-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not cause a
read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
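The access pattern visible in the loop above (a faulting access for the first active element, non-faulting accesses afterwards) can be sketched in Python. This models only the part of the operation shown here. Memory is represented as a `dict` from address to byte value, where a missing key stands for an address that cannot be read. The names `DataAbort` and `ldff1b_accesses` are hypothetical, and the 0 returned alongside `fault=True` is a placeholder rather than an architected value.

```python
class DataAbort(Exception):
    """Stands in for the fault that Mem[] would take (hypothetical name)."""

def ldff1b_accesses(memory: dict, base: int, index: int, active):
    """Return one (data, fault) pair per element for a contiguous byte load."""
    mbytes = 1
    first = True
    out = []
    for e, is_active in enumerate(active):
        if is_active:
            addr = base + (index + e) * mbytes
            if first:
                # First active element: a failed access is a real fault.
                if addr not in memory:
                    raise DataAbort(hex(addr))
                out.append((memory[addr], False))
                first = False
            elif addr in memory:
                out.append((memory[addr], False))
            else:
                # Later elements: report the failure instead of faulting.
                out.append((0, True))
        else:
            out.append((0, False))
    return out
```

A caller would typically inspect the `fault` flags to find how many leading elements were loaded successfully.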
Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended
from 32 to 64 bits. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in
the destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 1 1 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will
not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with first-faulting behavior of doublewords to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each
element access the index value is incremented, but the index register is not updated. Inactive elements will not
cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32
to 64 bits and then optionally multiplied by 8. Inactive elements will not cause a read from Device memory or signal
faults, and are set to zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 1 Zm 1 1 1 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
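The per-element address computation for the 32-bit unpacked offset forms can be sketched in Python as below. This is an illustration under stated assumptions: the low 32 bits of each 64-bit offset element are extended according to "xs" (0 for UXTW, 1 for SXTW, as in the table above), and the scaled forms shift the result left by 3 to multiply by 8. The function name and the `active` list are hypothetical.

```python
def ldff1d_unpacked_addresses(base, zm, xs, scaled, active):
    """Return the 64-bit address for each active element, None if inactive.

    zm: list of 64-bit offset elements; only bits 31:0 are used.
    xs: 0 selects UXTW (zero-extend), 1 selects SXTW (sign-extend).
    """
    scale = 3 if scaled else 0          # doubleword scaling: multiply by 8
    addrs = []
    for elem, is_active in zip(zm, active):
        if not is_active:
            addrs.append(None)
            continue
        off = elem & 0xFFFFFFFF         # Elem[...]<31:0>
        if xs == 1 and off & 0x80000000:
            off -= 1 << 32              # sign-extend from 32 bits
        addrs.append((base + (off << scale)) & 0xFFFFFFFFFFFFFFFF)
    return addrs
```

An offset element of 0xFFFFFFFF therefore addresses 8 bytes below the base with SXTW and scaling, but roughly 32 GiB above it with UXTW.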
Gather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements
will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
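The vector-plus-immediate addressing used above can be sketched in Python as follows. The sketch assumes that the pseudocode variable `offset` is the raw "imm5" value, so the assembler byte offset equals `offset * mbytes`. That matches the documented range of 0 to 248 in steps of 8 for doublewords. The function name and the `active` list are illustrative only.

```python
MASK64 = (1 << 64) - 1

def vector_plus_imm_addresses(base_elems, imm, msize, esize, active):
    """Return one address per element (None if inactive).

    base_elems: per-element values of the base vector register.
    imm: assembler byte offset; must be a multiple of msize/8 that fits imm5.
    """
    mbytes = msize // 8
    if imm % mbytes != 0 or not 0 <= imm <= 31 * mbytes:
        raise ValueError("imm must be a multiple of %d in 0..%d" % (mbytes, 31 * mbytes))
    offset = imm // mbytes               # assumption: the value held in imm5
    emask = (1 << esize) - 1             # ZeroExtend(Elem[base, e, esize], 64)
    return [
        (((b & emask) + offset * mbytes) & MASK64) if a else None
        for b, a in zip(base_elems, active)
    ]
```

For doublewords, `vector_plus_imm_addresses([0x1000, 0x2000], 248, 64, 64, [True, True])` returns `[0x10F8, 0x20F8]`.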
Contiguous load with first-faulting behavior of unsigned halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is multiplied by 2 and added to the base address.
After each element access the index value is incremented, but the index register is not updated. Inactive elements will
not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-
extended from 32 to 64 bits and then optionally multiplied by 2. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 1 1 Pg Rn Zt
U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 1 1 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with first-faulting behavior of signed bytes to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to
64 bits. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the
destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 0 1 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a
read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with first-faulting behavior of signed halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is multiplied by 2 and added to the base address.
After each element access the index value is incremented, but the index register is not updated. Inactive elements will
not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
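The per-element loop above can be sketched in Python to show how the first active element is treated differently from the rest. This is a minimal model under stated assumptions, not Arm's reference implementation: `ldff1_contiguous`, `MemoryFault`, and the `memory` dictionary are hypothetical names, a missing dictionary key stands in for any access that cannot be performed, and the handling of elements after a suppressed access is reduced to recording `None`.

```python
MASK64 = (1 << 64) - 1


class MemoryFault(Exception):
    """Hypothetical stand-in for the fault that Mem[] would take."""


def ldff1_contiguous(base, index, mbytes, mask, memory):
    """Model the element loop of a first-faulting contiguous load.

    base, index: scalar base address and scalar index (not updated).
    mbytes:      size of one memory element in bytes.
    mask:        one boolean per element, True where the element is active.
    memory:      dict mapping an element address to its value; a missing
                 key models an access that cannot be performed (assumption).
    Returns (values, faults), one entry per element.
    """
    values, faults = [], []
    first = True
    for e, active in enumerate(mask):
        if not active:
            # Inactive elements read nothing and are set to zero.
            values.append(0)
            faults.append(False)
            continue
        addr = (base + (index + e) * mbytes) & MASK64
        readable = addr in memory
        if first:
            # Like Mem[]: a fault on the first active element is taken.
            first = False
            if not readable:
                raise MemoryFault(hex(addr))
            values.append(memory[addr])
            faults.append(False)
        else:
            # Like MemNF[]: the fault is reported instead of taken.
            values.append(memory[addr] if readable else None)
            faults.append(not readable)
    return values, faults
```

With halfword elements (`mbytes = 2`) and readable halfwords only at 0x1000, 0x1002 and 0x1004, calling `ldff1_contiguous(0x1000, 0, 2, [True, False, True, True], memory)` reads elements 0 and 2 and flags element 3 as a suppressed access. If the first active element itself is unreadable, the model raises instead of returning.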
Gather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-
extended from 32 to 64 bits and then optionally multiplied by 2. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 0 1 Pg Rn Zt
U ff
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 0 1 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 0 1 Pg Rn Zt
U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 0 1 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
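The address calculation for the scalar-plus-vector form takes the low `offs_size` bits of each offset element, interprets them as signed or unsigned, shifts by `scale`, and adds the scalar base. The following Python sketch mirrors those two pseudocode lines. It is illustrative only: `gather_address` and `offs_unsigned_from_xs` are hypothetical helpers, and the mapping from `xs` to signedness is inferred from the <mod> table above (0 selects UXTW, 1 selects SXTW).

```python
MASK64 = (1 << 64) - 1


def offs_unsigned_from_xs(xs):
    """Per the <mod> table: xs = 0 selects UXTW (unsigned), xs = 1 SXTW."""
    return xs == 0


def gather_address(base, offset_elem, offs_size, offs_unsigned, scale):
    """Compute base + (Int(offset_elem<offs_size-1:0>, offs_unsigned) << scale)."""
    off = offset_elem & ((1 << offs_size) - 1)   # keep the low offs_size bits
    if not offs_unsigned and off >> (offs_size - 1):
        off -= 1 << offs_size                    # reinterpret as a signed value
    return (base + (off << scale)) & MASK64      # wrap to a 64-bit address
```

For example, the 32-bit offset pattern 0xFFFFFFFF with `scale = 1` addresses `base + 0x1FFFFFFFE` under UXTW but `base - 2` under SXTW, which is why the choice of extension matters for negative indices.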
Gather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
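The relationship between the assembler byte offset and the 5-bit field can be illustrated in Python. This sketch assumes that `offset` in the pseudocode is the unsigned value of imm5, which is consistent with the stated range (multiples of 2 from 0 to 62 for halfwords) but is not spelled out in the listing above. `encode_imm5` and `vector_plus_imm_addresses` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1


def encode_imm5(byte_offset, mbytes):
    """Return the imm5 value for a byte offset, or raise ValueError."""
    if byte_offset % mbytes != 0:
        raise ValueError("offset must be a multiple of the element size")
    imm5 = byte_offset // mbytes
    if not 0 <= imm5 <= 31:
        raise ValueError("offset out of range for a 5-bit field")
    return imm5


def vector_plus_imm_addresses(base_elems, imm5, mbytes, esize):
    """Compute ZeroExtend(base element, 64) + imm5 * mbytes per element."""
    elem_mask = (1 << esize) - 1
    return [((b & elem_mask) + imm5 * mbytes) & MASK64 for b in base_elems]
```

Because each 32-bit base element is zero-extended before the addition, a base of 0xFFFFFFFF with a byte offset of 62 yields 0x10000003D and does not wrap within 32 bits.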
Contiguous load with first-faulting behavior of signed words to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is multiplied by 4 and added to the base address. After each
element access the index value is incremented, but the index register is not updated. Inactive elements will not
cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
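This instruction loads signed words, so each 32-bit value read from memory has to be sign-extended to the width of its destination element. The extension step is not shown in the listing above, so the following Python sketch is an assumption about how it behaves: `extend_element` is a hypothetical helper, and register contents are modelled as plain non-negative integers.

```python
def extend_element(data, msize, esize, unsigned):
    """Widen an msize-bit memory value to an esize-bit element.

    unsigned=True zero-extends; unsigned=False sign-extends and returns the
    esize-bit two's-complement pattern as a non-negative integer.
    """
    if msize > esize:
        raise ValueError("memory element is wider than the vector element")
    data &= (1 << msize) - 1
    if not unsigned and data >> (msize - 1):
        data -= 1 << msize          # negative value as a Python int
    return data & ((1 << esize) - 1)
```

A word with the top bit set, such as 0x80000000, becomes 0xFFFFFFFF80000000 in a 64-bit element when loaded as signed, and stays 0x0000000080000000 when loaded as unsigned.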
Gather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32
to 64 bits and then optionally multiplied by 4. Inactive elements will not cause a read from Device memory or signal
faults, and are set to zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 0 1 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 0 1 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements
will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with first-faulting behavior of unsigned words to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is multiplied by 4 and added to the base address.
After each element access the index value is incremented, but the index register is not updated. Inactive elements will
not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of unsigned words to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-
extended from 32 to 64 bits and then optionally multiplied by 4. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 1 1 Pg Rn Zt
U ff
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 0 Zm 0 1 1 Pg Rn Zt
U ff
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 1 1 Pg Rn Zt
U ff
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Gather load with first-faulting behavior of unsigned words to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with non-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element
8-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        // MemNF[] will return fault=TRUE if access is not performed for any reason
        (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
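The phrase "multiplied by the vector's in-memory size" corresponds to `offset * elements` in the pseudocode: the immediate steps over whole vectors, and the size of one vector in memory is `elements * mbytes` bytes. The following Python sketch computes the element addresses for a given vector length. It is an illustration only; `ldnf1_addresses` and its parameter names are hypothetical, and the vector length is passed in explicitly because it is implementation defined.

```python
MASK64 = (1 << 64) - 1


def ldnf1_addresses(base, imm, vl_bits, esize, msize):
    """Element addresses for a contiguous load with an immediate vector offset.

    imm:     signed immediate in the range -8 to 7.
    vl_bits: vector length in bits (VL); esize: element size in bits.
    msize:   size in bits of each value read from memory.
    """
    if not -8 <= imm <= 7:
        raise ValueError("immediate must be in the range -8 to 7")
    elements = vl_bits // esize
    mbytes = msize // 8
    return [(base + (imm * elements + e) * mbytes) & MASK64
            for e in range(elements)]
```

For a 128-bit vector of 32-bit elements loading bytes, one vector occupies 4 bytes in memory, so an immediate of 1 selects the addresses `base + 4` to `base + 7`, regardless of which elements are active.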
Contiguous load with non-faulting behavior of doublewords to elements of a vector register from the memory address
generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory
size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        // MemNF[] will return fault=TRUE if access is not performed for any reason
        (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
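The encoding diagram for this form can be read as a set of bit fields. The Python sketch below extracts them from a 32-bit instruction word. The field positions are read off the bit-number row of the diagram (dtype in bits 24:21, imm4 in bits 19:16, Pg in bits 12:10, Rn in bits 9:5, Zt in bits 4:0), and the signed interpretation of imm4 is assumed from the stated range of -8 to 7. `decode_ldnf1_immediate_form` is a hypothetical name, not part of any Arm tool.

```python
def decode_ldnf1_immediate_form(word):
    """Split a 32-bit instruction word into the fields of the diagram above."""
    fixed_ok = (
        (word >> 25) == 0b1010010                 # bits 31:25
        and ((word >> 20) & 0b1) == 0b1           # bit 20
        and ((word >> 13) & 0b111) == 0b101       # bits 15:13
    )
    if not fixed_ok:
        raise ValueError("fixed bits do not match this encoding")
    imm4 = (word >> 16) & 0xF
    return {
        "dtype": (word >> 21) & 0xF,
        "imm": imm4 - 16 if imm4 & 0x8 else imm4,  # two's-complement imm4
        "Pg": (word >> 10) & 0x7,
        "Rn": (word >> 5) & 0x1F,
        "Zt": word & 0x1F,
    }
```

A word built with dtype 1111, imm4 1111, Pg 3, Rn 31 and Zt 5 decodes to an immediate of -1, with Rn = 31 selecting the stack pointer as described under `<Xn|SP>`.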
Contiguous load with non-faulting behavior of unsigned halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        // MemNF[] will return fault=TRUE if access is not performed for any reason
        (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with non-faulting behavior of signed bytes to elements of a vector register from the memory address
generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory
size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
16-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        // MemNF[] will return fault=TRUE if access is not performed for any reason
        (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with non-faulting behavior of signed halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        // MemNF[] will return fault=TRUE if access is not performed for any reason
        (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with non-faulting behavior of signed words to elements of a vector register from the memory address
generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory
size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        // MemNF[] will return fault=TRUE if access is not performed for any reason
        (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load with non-faulting behavior of unsigned words to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer eoff = (offset * elements) + e;
bits(64) addr = base + eoff * mbytes;
// MemNF[] will return fault=TRUE if access is not performed for any reason
(data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
else
(data, fault) = (Zeros(msize), FALSE);
Z[t] = result;
Contiguous load non-temporal of bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer eoff = (offset * elements) + e;
bits(64) addr = base + eoff * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
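The address arithmetic in the Operation above can be sketched in Python. This is an illustrative model only, not Arm reference code: the function name, the list-of-booleans predicate, and the sample vector length are assumptions made for the example, and `imm` stands for the sign-extended value the pseudocode calls `offset`.

```python
MASK64 = (1 << 64) - 1


def ldnt1_imm_addresses(base, imm, vl_bits, esize, pred):
    """Return the byte address read for each element, or None if inactive.

    Hypothetical helper mirroring the pseudocode:
    eoff = imm * elements + e, addr = base + eoff * mbytes.
    """
    if not -8 <= imm <= 7:
        raise ValueError("imm must be in the range -8 to 7")
    elements = vl_bits // esize   # number of elements in one vector
    mbytes = esize // 8           # bytes occupied by one element in memory
    if len(pred) != elements:
        raise ValueError("pred needs one entry per element")
    addrs = []
    for e in range(elements):
        if pred[e]:
            eoff = imm * elements + e
            # bits(64) arithmetic wraps modulo 2**64
            addrs.append((base + eoff * mbytes) & MASK64)
        else:
            addrs.append(None)    # inactive element: no memory access
    return addrs
```

Because the immediate is scaled by the number of elements, `imm = 1` addresses the vector-sized block that immediately follows the one at `base`, whatever the predicate holds.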
Contiguous load non-temporal of bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements will not cause a read from Device memory or
signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(64) addr = base + (UInt(offset) + e) * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load non-temporal of doublewords to elements of a vector register from the memory address generated by
a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer eoff = (offset * elements) + e;
bits(64) addr = base + eoff * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load non-temporal of doublewords to elements of a vector register from the memory address generated by
a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not cause a
read from Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(64) addr = base + (UInt(offset) + e) * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load non-temporal of halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer eoff = (offset * elements) + e;
bits(64) addr = base + eoff * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load non-temporal of halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(64) addr = base + (UInt(offset) + e) * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
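As a rough illustration of the scalar-plus-scalar form above, the following Python sketch models the halfword case, where the index is scaled by 2. It is not Arm reference code: the function name, the `bytes` object standing in for memory, the boolean predicate list, and the little-endian data layout are all assumptions of the example.

```python
def ldnt1h_scalar_index(memory, base, index, vl_bits, pred):
    """Model addr = base + (index + e) * mbytes for 16-bit elements.

    memory is a bytes object (hypothetical stand-in for Mem[]),
    base is an offset into it, index is the unsigned value of the index register.
    """
    esize = 16
    elements = vl_bits // esize
    mbytes = esize // 8               # 2 bytes per halfword
    result = []
    for e in range(elements):
        if pred[e]:
            addr = base + (index + e) * mbytes
            # assumption: data is stored little-endian
            result.append(int.from_bytes(memory[addr:addr + mbytes], "little"))
        else:
            result.append(0)          # inactive elements are set to zero
    return result
```

The index register itself is never written; the per-element increment exists only inside the loop as `index + e`.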
Contiguous load non-temporal of words to elements of a vector register from the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer eoff = (offset * elements) + e;
bits(64) addr = base + eoff * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
Contiguous load non-temporal of words to elements of a vector register from the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(64) addr = base + (UInt(offset) + e) * mbytes;
Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
else
Elem[result, e, esize] = Zeros();
Z[t] = result;
Load a predicate register from a memory address generated by a 64-bit scalar base, plus an immediate offset in the
range -256 to 255 which is multiplied by the current predicate register size in bytes. This instruction is unpredicated.
The load is performed as contiguous byte accesses, each containing 8 consecutive predicate bits in ascending element
order, with no endian conversion and no guarantee of single-copy atomicity larger than a byte. However, if alignment
is checked, then a general-purpose base register must be aligned to 2 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 imm9h 0 0 0 imm9l Rn 0 Pt
Assembler Symbols
<Pt> Is the name of the destination scalable predicate register, encoded in the "Pt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.
Operation
CheckSVEEnabled();
integer elements = PL DIV 8;
bits(64) base;
integer offset = imm * elements;
bits(PL) result;
if n == 31 then
CheckSPAlignment();
if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
base = SP[];
else
if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
base = X[n];
P[t] = result;
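The description states that each byte loaded carries 8 consecutive predicate bits in ascending element order. The Python sketch below illustrates one reading of that sentence; it is not taken from the Operation listing. The function name is hypothetical, and the sketch assumes that the least significant bit of each byte holds the lowest-numbered predicate bit.

```python
def predicate_bits_from_bytes(data):
    """Expand loaded bytes into a flat list of predicate bits.

    Byte k supplies predicate bits 8*k .. 8*k+7, lowest bit first
    (an assumption of this sketch, see the lead-in).
    """
    bits = []
    for byte in data:
        for i in range(8):
            bits.append((byte >> i) & 1)
    return bits
```

Since the bytes are consumed one at a time in address order, no endian conversion is involved.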
Load a vector register from a memory address generated by a 64-bit scalar base, plus an immediate offset in the range
-256 to 255 which is multiplied by the current vector register size in bytes. This instruction is unpredicated.
The load is performed as contiguous byte accesses, with no endian conversion and no guarantee of single-copy
atomicity larger than a byte. However, if alignment is checked, then the base register must be aligned to 16 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 imm9h 0 1 0 imm9l Rn Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.
Operation
CheckSVEEnabled();
integer elements = VL DIV 8;
bits(64) base;
integer offset = imm * elements;
bits(VL) result;
if n == 31 then
CheckSPAlignment();
if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
base = SP[];
else
if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
base = X[n];
Z[t] = result;
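The scaling of the immediate by the vector size in bytes can be sketched as follows. This is a minimal Python illustration of `offset = imm * elements` with `elements = VL DIV 8`; the function name and the sample vector lengths are assumptions of the example, not part of the specification.

```python
MASK64 = (1 << 64) - 1


def ldr_vector_address(base, imm, vl_bits):
    """Return the address of the first byte loaded by the vector form."""
    if not -256 <= imm <= 255:
        raise ValueError("imm must be in the range -256 to 255")
    elements = vl_bits // 8      # vector register size in bytes
    offset = imm * elements      # immediate counts whole vectors
    return (base + offset) & MASK64
```

The same immediate therefore reaches further on implementations with longer vectors: `imm = 2` is 64 bytes at a 256-bit vector length and 128 bytes at 512 bits.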
Shift left by immediate each active element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 0 to
number of bits per element minus 1. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 0 1 1 1 0 0 Pg tszl imm3 Zdn
L U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the immediate shift amount, in the range 0 to number of bits per element minus 1, encoded in
"tsz:imm3".
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) mask = P[g];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
if ElemP[mask, e, esize] == '1' then
Elem[result, e, esize] = LSL(element1, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
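The merging behavior of the predicated immediate shift left, as given in its Operation pseudocode, can be modelled in a few lines of Python. This is an illustrative sketch: the function name and the use of plain lists for the vector and predicate are assumptions, and element values are assumed to already fit in `esize` bits.

```python
def lsl_imm_predicated(zdn, pred, shift, esize):
    """Shift active elements left by a constant; keep inactive ones."""
    if not 0 <= shift <= esize - 1:
        raise ValueError("shift must be in the range 0 to esize - 1")
    mask = (1 << esize) - 1
    result = []
    for value, active in zip(zdn, pred):
        if active:
            result.append((value << shift) & mask)  # bits shifted out are lost
        else:
            result.append(value)                    # inactive: unmodified
    return result
```

For example, with 8-bit elements `[0x81, 0x01, 0xFF]`, predicate `[True, False, True]` and a shift of 1, only the first and third elements change.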
Shift left by immediate each element of the source vector, and place the results in the corresponding elements of the
destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element
minus 1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 1 1 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<const> Is the immediate shift amount, in the range 0 to number of bits per element minus 1, encoded in
"tsz:imm3".
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
Elem[result, e, esize] = LSL(element1, shift);
Z[d] = result;
LSL (vectors)
Shift left active elements of the first source vector by corresponding elements of the second source vector and
destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a
vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element1 = Elem[operand1, e, esize];
bits(esize) element2 = Elem[operand2, e, esize];
integer shift = Min(UInt(element2), esize);
Elem[result, e, esize] = LSL(element1, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
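The phrase "not used modulo the element size" corresponds to the `Min(UInt(element2), esize)` step above: a shift amount of `esize` or more clears the element instead of wrapping around. A small Python sketch of that behavior follows; the function name and list-based operands are assumptions of the example.

```python
def lsl_vectors(zdn, zm, pred, esize):
    """Per-element shift left with a saturating shift amount."""
    mask = (1 << esize) - 1
    result = []
    for value, amount, active in zip(zdn, zm, pred):
        if active:
            shift = min(amount, esize)           # saturate, do not wrap
            result.append((value << shift) & mask)
        else:
            result.append(value)                 # inactive: unmodified
    return result
```

With 8-bit elements, shift amounts of 8 and 200 both give zero, whereas a modulo-8 interpretation would have left the element unchanged for a shift of 8.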
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
Shift left active elements of the first source vector by corresponding overlapping 64-bit elements of the second source
vector and destructively place the results in the corresponding elements of the first source vector. The shift amount is
a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination
element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 RESERVED
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element1 = Elem[operand1, e, esize];
bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
integer shift = Min(UInt(element2), esize);
Elem[result, e, esize] = LSL(element1, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Shift left all elements of the first source vector by corresponding overlapping 64-bit elements of the second source
vector and place the results in the corresponding elements of the destination vector. The shift amount is a vector of
unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination element
size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 1 1 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 RESERVED
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
integer shift = Min(UInt(element2), esize);
Elem[result, e, esize] = LSL(element1, shift);
Z[d] = result;
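The "overlapping 64-bit elements" wording maps to the index expression `(e * esize) DIV 64`: every narrow element takes its shift amount from the doubleword of the second source that covers the same bit positions. The Python sketch below models this; the function name and the separate list of 64-bit values are assumptions of the example.

```python
def lsl_wide(zn, zm64, esize):
    """Shift each esize-bit element by the 64-bit element that overlaps it.

    zn   : list of esize-bit element values
    zm64 : list of 64-bit shift amounts, one per 64 bits of vector
    """
    mask = (1 << esize) - 1
    result = []
    for e, value in enumerate(zn):
        amount = zm64[(e * esize) // 64]   # doubleword covering element e
        shift = min(amount, esize)         # saturate at the element size
        result.append((value << shift) & mask)
    return result
```

At a 128-bit vector length with 16-bit elements, elements 0 to 3 share the first doubleword's shift amount and elements 4 to 7 share the second.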
Reversed shift left active elements of the second source vector by corresponding elements of the first source vector
and destructively place the results in the corresponding elements of the first source vector. The shift amount operand
is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element1 = Elem[operand1, e, esize];
bits(esize) element2 = Elem[operand2, e, esize];
integer shift = Min(UInt(element1), esize);
Elem[result, e, esize] = LSL(element2, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
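In the reversed form the operand roles swap: the destructive register supplies the shift amount and the second source supplies the value being shifted. A minimal Python sketch, with a hypothetical function name and list-based operands, makes the difference visible.

```python
def lslr(zdn, zm, pred, esize):
    """Reversed shift left: result = zm << min(zdn, esize) for active elements."""
    mask = (1 << esize) - 1
    result = []
    for first, second, active in zip(zdn, zm, pred):
        if active:
            shift = min(first, esize)              # amount comes from Zdn
            result.append((second << shift) & mask)
        else:
            result.append(first)                   # inactive: Zdn unmodified
    return result
```

Inactive elements still keep the original Zdn value, even though for active elements Zdn only contributes the shift amount.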
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
Shift right by immediate, inserting zeroes, each active element of the source vector, and destructively place the results
in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 0 0 1 1 0 0 Pg tszl imm3 Zdn
L U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) mask = P[g];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
if ElemP[mask, e, esize] == '1' then
Elem[result, e, esize] = LSR(element1, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
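The three requirements listed above can be read as a simple static check on an instruction pair. The sketch below is only a model of that list: the Movprfx and Destructive record types, their field names and the movprfx_ok helper are hypothetical, and registers are represented by their numbers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Movprfx:
    zd: int                      # destination Z register number
    pg: Optional[int] = None     # None models the unpredicated form
    esize: Optional[int] = None  # element size of the predicated form


@dataclass
class Destructive:
    zd: int                      # destination (and first source) register
    pg: Optional[int]            # governing predicate, if any
    esize: int                   # source element size in bits
    other_sources: Tuple[int, ...] = ()


def movprfx_ok(prefix, insn):
    """Return True if the pair satisfies the three listed requirements."""
    # 1. Unpredicated, or same governing predicate and element size.
    if prefix.pg is not None and (prefix.pg != insn.pg or prefix.esize != insn.esize):
        return False
    # 2. Same destination register.
    if prefix.zd != insn.zd:
        return False
    # 3. Destination must not also be another source operand.
    return insn.zd not in insn.other_sources


insn = Destructive(zd=0, pg=1, esize=32, other_sources=(2,))
print(movprfx_ok(Movprfx(zd=0), insn))
```

A pair that fails this check falls under the UNPREDICTABLE case described above; the check does not model any requirement stated elsewhere in the architecture.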
Shift right by immediate, inserting zeroes, each element of the source vector, and place the results in the
corresponding elements of the destination vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 0 1 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".
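The decode pseudocode that maps "tsz:imm3" to esize and shift is not reproduced in this description. The sketch below assumes a convention in which the highest set bit of the 4-bit tsz value selects the element size and the 7-bit "tsz:imm3" field holds 2*esize - shift. That mapping and both helper names are assumptions and should be checked against the decode pseudocode.

```python
def encode_right_shift(shift, esize):
    """Return the assumed 7-bit tsz:imm3 value for a right shift."""
    if esize not in (8, 16, 32, 64):
        raise ValueError("unsupported element size")
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    return 2 * esize - shift


def decode_right_shift(field):
    """Return (esize, shift) for an assumed 7-bit tsz:imm3 value."""
    tsz = (field >> 3) & 0xF
    if tsz == 0:
        raise ValueError("tsz of zero does not select an element size")
    esize = 8 << (tsz.bit_length() - 1)  # highest set bit picks B, H, S or D
    return esize, 2 * esize - field


print(decode_right_shift(encode_right_shift(3, 16)))
```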
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
Elem[result, e, esize] = LSR(element1, shift);
Z[d] = result;
LSR (vectors)
Shift right, inserting zeroes, active elements of the first source vector by corresponding elements of the second source
vector and destructively place the results in the corresponding elements of the first source vector. The shift amount
operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 1 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element1 = Elem[operand1, e, esize];
bits(esize) element2 = Elem[operand2, e, esize];
integer shift = Min(UInt(element2), esize);
Elem[result, e, esize] = LSR(element1, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
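The statement that the shift amount is "not used modulo the element size" corresponds to the Min(UInt(element2), esize) step above: any amount of esize or more clears the lane. A minimal Python sketch of that behavior, using a hypothetical lsr_vectors function over lists of lane values, is:

```python
def lsr_vectors(zdn, zm, pg, esize):
    """Per-lane logical shift right with the amount clamped to esize."""
    mask = (1 << esize) - 1
    out = []
    for x, amount, active in zip(zdn, zm, pg):
        if active:
            shift = min(amount & mask, esize)  # clamp, do not wrap
            out.append((x & mask) >> shift)
        else:
            out.append(x)                      # inactive lane is unmodified
    return out


# An amount of 200 on 8-bit lanes behaves like a shift by 8, not by 200 % 8.
print(lsr_vectors([0xF0, 0xF0, 0xF0], [4, 8, 200], [True, True, True], 8))
```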
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
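• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.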
Shift right, inserting zeroes, active elements of the first source vector by corresponding overlapping 64-bit elements of
the second source vector and destructively place the results in the corresponding elements of the first source vector.
The shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used
modulo the destination element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 RESERVED
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element1 = Elem[operand1, e, esize];
bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
integer shift = Min(UInt(element2), esize);
Elem[result, e, esize] = LSR(element1, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
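The index expression (e * esize) DIV 64 means that every narrow element lying within the same 64-bit span of the vector uses the same doubleword shift amount. The Python sketch below illustrates this with a hypothetical lsr_wide function; the second source is modeled as a list of 64-bit values, one per doubleword of the vector.

```python
def lsr_wide(zdn, zm_dwords, pg, esize):
    """Shift narrow lanes by the overlapping 64-bit lane of the second source."""
    mask = (1 << esize) - 1
    out = []
    for e, (x, active) in enumerate(zip(zdn, pg)):
        if active:
            amount = zm_dwords[(e * esize) // 64]  # overlapping doubleword
            out.append((x & mask) >> min(amount, esize))
        else:
            out.append(x)
    return out


# Eight 16-bit lanes (a 128-bit vector): lanes 0-3 use the first doubleword,
# lanes 4-7 use the second.
print(lsr_wide([0x1230] * 8, [4, 100], [True] * 8, 16))
```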
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
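UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.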
Shift right, inserting zeroes, all elements of the first source vector by corresponding overlapping 64-bit elements of the
second source vector and place the results in the corresponding elements of the destination vector. The shift amount is a
vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination
element size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 0 1 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 RESERVED
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
integer shift = Min(UInt(element2), esize);
Elem[result, e, esize] = LSR(element1, shift);
Z[d] = result;
Reversed shift right, inserting zeroes, active elements of the second source vector by corresponding elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. The
shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the
element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 1 0 0 Pg Zm Zdn
R L U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element1 = Elem[operand1, e, esize];
bits(esize) element2 = Elem[operand2, e, esize];
integer shift = Min(UInt(element1), esize);
Elem[result, e, esize] = LSR(element2, shift);
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
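In the reversed form the roles of the two sources are swapped: the destination register supplies the shift amount and the second source supplies the value being shifted. The following Python sketch, with a hypothetical lsrr function over lists of lane values, shows that inactive lanes still keep the original contents of the destination register.

```python
def lsrr(zdn, zm, pg, esize):
    """Reversed shift: zm is shifted right by the amounts held in zdn."""
    mask = (1 << esize) - 1
    return [((m & mask) >> min(d & mask, esize)) if active else d
            for d, m, active in zip(zdn, zm, pg)]


# Lane 1 is inactive and keeps its shift-amount value of 1.
print(lsrr([4, 1, 9], [0xF0, 0xF0, 0xF0], [True, False, True], 8))
```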
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
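• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.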
Multiply the corresponding active elements of the first and second source vectors and add to elements of the third
(addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 0 Pg Za Zdn
op
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer element1 = UInt(Elem[operand1, e, esize]);
integer element2 = UInt(Elem[operand2, e, esize]);
integer product = element1 * element2;
if sub_op then
Elem[result, e, esize] = Elem[operand3, e, esize] - product;
else
Elem[result, e, esize] = Elem[operand3, e, esize] + product;
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
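For the add case of the pseudocode above (sub_op false), each active lane receives the addend plus the product, truncated to the element size, and the result overwrites the multiplicand register. A small Python sketch under those assumptions, with a hypothetical mad function over lists of lane values:

```python
def mad(zdn, zm, za, pg, esize):
    """Multiply-add writing to the multiplicand: zdn = za + zdn * zm."""
    mask = (1 << esize) - 1
    return [((a + (d & mask) * (m & mask)) & mask) if active else d
            for d, m, a, active in zip(zdn, zm, za, pg)]


# Lane 2 overflows 8 bits: 100 + 200 * 2 = 500, which wraps to 244.
print(mad([2, 3, 200], [5, 7, 2], [1, 1, 100], [True, False, True], 8))
```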
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Multiply the corresponding active elements of the first and second source vectors and add to elements of the third
source (addend) vector. Destructively place the results in the destination and third source (addend) vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn Zda
op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer element1 = UInt(Elem[operand1, e, esize]);
integer element2 = UInt(Elem[operand2, e, esize]);
integer product = element1 * element2;
if sub_op then
Elem[result, e, esize] = Elem[operand3, e, esize] - product;
else
Elem[result, e, esize] = Elem[operand3, e, esize] + product;
else
Elem[result, e, esize] = Elem[operand3, e, esize];
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the
third source (addend) vector. Destructively place the results in the destination and third source (addend) vector.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn Zda
op
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer element1 = UInt(Elem[operand1, e, esize]);
integer element2 = UInt(Elem[operand2, e, esize]);
integer product = element1 * element2;
if sub_op then
Elem[result, e, esize] = Elem[operand3, e, esize] - product;
else
Elem[result, e, esize] = Elem[operand3, e, esize] + product;
else
Elem[result, e, esize] = Elem[operand3, e, esize];
Z[da] = result;
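The multiply-add and multiply-subtract forms that write to the addend register share the same pseudocode and differ only in sub_op. The Python sketch below models both with a single hypothetical mla_mls function; results are reduced modulo 2^esize, so a subtraction that goes below zero wraps around.

```python
def mla_mls(zda, zn, zm, pg, esize, sub_op=False):
    """Accumulate zn * zm into (or subtract it from) the addend zda."""
    mask = (1 << esize) - 1
    out = []
    for a, n, m, active in zip(zda, zn, zm, pg):
        if not active:
            out.append(a)  # inactive lane keeps the addend
            continue
        product = (n & mask) * (m & mask)
        out.append((a - product if sub_op else a + product) & mask)
    return out


print(mla_mls([1, 1], [2, 3], [5, 7], [True, True], 8, sub_op=True))
```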
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction
is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16,
32 or 64 bits.
• The encodings in this description are named to match the encodings of DUPM.
• The description of DUPM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 1 0 0 0 0 imm13 Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
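To make the description of the bitmask immediate concrete, the following Python sketch tests whether a 64-bit value has the stated shape: a field of 2, 4, 8, 16, 32 or 64 bits containing one rotated run of ones, replicated across all 64 bits. It does not produce the "imm13" encoding, and the function name is hypothetical.

```python
def is_bitmask_immediate(value):
    """True if value is one rotated run of ones replicated every 2..64 bits."""
    value &= (1 << 64) - 1
    for width in (2, 4, 8, 16, 32, 64):
        mask = (1 << width) - 1
        field = value & mask
        replicated = 0
        for pos in range(0, 64, width):
            replicated |= field << pos
        if replicated != value:
            continue  # value does not repeat with this period
        # A single circular run of ones has exactly two bit transitions.
        rotated = ((field << 1) | (field >> (width - 1))) & mask
        return bin(field ^ rotated).count("1") == 2
    return False  # not reached: a 64-bit field always replicates to itself


print(is_bitmask_immediate(0x00FF00FF00FF00FF))
```

All-zeros and all-ones values have no bit transitions, so the check rejects them.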
Operation
The description of DUPM gives the operational pseudocode for this instruction.
Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated.
Does not set the condition flags.
• The encodings in this description are named to match the encodings of ORR (predicates).
• The description of ORR (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
The description of ORR (predicates) gives the operational pseudocode for this instruction.
Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register remain unmodified.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
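The immediate ranges described above can be expressed as a small selection rule between the unshifted and shifted forms. The Python sketch below returns an (imm8, sh) pair for a requested value; the function name is hypothetical, and a value of zero is always given the unshifted form, which matches the "excluding 0" wording for the shifted range.

```python
def encode_mov_imm(value, esize):
    """Return (imm8, sh) for a signed immediate, or raise if not encodable."""
    if -128 <= value <= 127:
        return value & 0xFF, 0             # plain 8-bit signed immediate
    if esize >= 16 and value % 256 == 0 and -32768 <= value <= 32512:
        return (value >> 8) & 0xFF, 1      # signed multiple of 256, LSL #8
    raise ValueError("immediate cannot be encoded for this element size")


print(encode_mov_imm(256, 16))
```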
• The encodings in this description are named to match the encodings of CPY (immediate, merging).
• The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 sh imm8 Zd
M
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register are set to zero.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
• The encodings in this description are named to match the encodings of CPY (immediate, zeroing).
• The description of CPY (immediate, zeroing) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 0 sh imm8 Zd
M
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
The description of CPY (immediate, zeroing) gives the operational pseudocode for this instruction.
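The zeroing form differs from the merging form only in the value given to inactive elements. The Python sketch below models both with one hypothetical function and a zeroing flag; vectors are lists of unsigned lane values and the predicate is a list of booleans.

```python
def mov_imm_predicated(zd, pg, imm, esize, zeroing):
    """Write imm to active lanes; inactive lanes are zeroed or kept."""
    mask = (1 << esize) - 1
    return [(imm & mask) if active else (0 if zeroing else old)
            for old, active in zip(zd, pg)]


old = [7, 7, 7]
pg = [True, False, True]
print(mov_imm_predicated(old, pg, -1, 8, zeroing=False))
print(mov_imm_predicated(old, pg, -1, 8, zeroing=True))
```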
Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is
unpredicated.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
• The encodings in this description are named to match the encodings of DUP (immediate).
• The description of DUP (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 sh imm8 Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
The description of DUP (immediate) gives the operational pseudocode for this instruction.
MOV (predicate, predicated, merging)
Read active elements from the source predicate and place in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register remain unmodified. Does not set the condition flags.
• The encodings in this description are named to match the encodings of SEL (predicates).
• The description of SEL (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
The description of SEL (predicates) gives the operational pseudocode for this instruction.
Read active elements from the source predicate and place in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
• The encodings in this description are named to match the encodings of AND (predicates).
• The description of AND (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
The description of AND (predicates) gives the operational pseudocode for this instruction.
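The merging and zeroing predicate moves can be contrasted with a short Python sketch in which each predicate is a list of booleans. The function names are hypothetical, and the operand mapping onto SEL and AND noted in the comments is an assumption that should be checked against those instruction descriptions.

```python
def mov_pred_merging(pd, pg, pn):
    """Active elements come from pn; inactive elements keep pd."""
    # Assumed to correspond to a select between pn and the old pd under pg.
    return [n if g else d for d, g, n in zip(pd, pg, pn)]


def mov_pred_zeroing(pg, pn):
    """Active elements come from pn; inactive elements become False."""
    # Assumed to correspond to a bitwise AND of pn with pg.
    return [n and g for g, n in zip(pg, pn)]


pd = [True, True, False, False]
pg = [True, False, True, False]
pn = [False, False, True, True]
print(mov_pred_merging(pd, pg, pn))
print(mov_pred_zeroing(pg, pn))
```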
Move the general-purpose scalar source register to each active element in the destination vector. Inactive elements in
the destination vector register remain unmodified.
• The encodings in this description are named to match the encodings of CPY (scalar).
• The description of CPY (scalar) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 1 Pg Rn Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
size <R>
01 W
x0 W
11 X
<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.
Operation
The description of CPY (scalar) gives the operational pseudocode for this instruction.
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This
instruction is unpredicated.
• The encodings in this description are named to match the encodings of DUP (scalar).
• The description of DUP (scalar) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
size <R>
01 W
x0 W
11 X
<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.
Operation
The description of DUP (scalar) gives the operational pseudocode for this instruction.
Move the SIMD & floating-point scalar source register to each active element in the destination vector. Inactive
elements in the destination vector register remain unmodified.
• The encodings in this description are named to match the encodings of CPY (SIMD&FP scalar).
• The description of CPY (SIMD&FP scalar) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 0 Pg Vn Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
size <V>
00 B
01 H
10 S
11 D
<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vn" field.
Operation
The description of CPY (SIMD&FP scalar) gives the operational pseudocode for this instruction.
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unconditionally broadcast the SIMD&FP scalar into each element of the destination vector. This instruction is
unpredicated.
• The encodings in this description are named to match the encodings of DUP (indexed).
• The description of DUP (indexed) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 imm2 1 tsz 0 0 1 0 0 0 Zn Zd
is equivalent to
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
tsz <T>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<imm> Is the immediate index, in the range 0 to one less than the number of elements in 512 bits, encoded in
"imm2:tsz".
tsz <V>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q
<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Zn" field.
Operation
The description of DUP (indexed) gives the operational pseudocode for this instruction.
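The "tsz" tables above select the element size from the position of the lowest set bit, and the index shares the "imm2:tsz" field. The following Python sketch decodes both. It assumes that the index occupies the bits of "imm2:tsz" above the size marker bit, which is consistent with the stated index range (for example 0 to 63 for B), but the pseudocode of DUP (indexed) remains the authoritative definition. The function name `decode_dup_indexed` is hypothetical.

```python
def decode_dup_indexed(imm2, tsz):
    """Decode the size suffix and element index from imm2 (2 bits) and tsz (5 bits)."""
    tsz &= 0b11111
    if tsz == 0:
        raise ValueError("tsz == 00000 is RESERVED")
    imm = ((imm2 & 0b11) << 5) | tsz  # 7-bit concatenation imm2:tsz
    # The lowest set bit of tsz selects the size: xxxx1=B, xxx10=H, xx100=S, ...
    shift = (tsz & -tsz).bit_length() - 1
    suffix = "BHSDQ"[shift]
    # Assumption: the index is held in the bits above the size marker.
    index = imm >> (shift + 1)
    return suffix, index


print(decode_dup_indexed(0b01, 0b00100))  # ('S', 4)
print(decode_dup_indexed(0b11, 0b11111))  # ('B', 63)
```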
Move elements from the source vector to the corresponding elements of the destination vector. Inactive elements in
the destination vector register remain unmodified.
• The encodings in this description are named to match the encodings of SEL (vectors).
• The description of SEL (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 1 1 Pg Zn Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
Operation
The description of SEL (vectors) gives the operational pseudocode for this instruction.
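A predicated vector move can be viewed as a per-element select. The Python sketch below models this on plain lists. It assumes that the alias supplies the destination register as the second SEL source, so that inactive elements keep their previous contents; the helper names `sel` and `mov_vector_predicated` are hypothetical.

```python
def sel(pg, zn, zm):
    """Per-element select: take zn where the predicate is true, otherwise zm."""
    return [n if active else m for active, n, m in zip(pg, zn, zm)]


def mov_vector_predicated(zd, pg, zn):
    # Assumption: the destination doubles as the second select source,
    # which leaves inactive destination elements unmodified.
    return sel(pg, zn, zd)


zd = [0, 0, 0, 0]
zn = [1, 2, 3, 4]
pg = [True, False, False, True]
print(mov_vector_predicated(zd, pg, zn))  # [1, 0, 0, 4]
```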
• The encodings in this description are named to match the encodings of ORR (vectors, unpredicated).
• The description of ORR (vectors, unpredicated) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 0 0 1 1 0 0 Zn Zd
is equivalent to
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
Operation
The description of ORR (vectors, unpredicated) gives the operational pseudocode for this instruction.
The predicated MOVPRFX instruction is a hint to hardware that the instruction may be combined with the destructive
instruction which follows it in program order to create a single constructive operation. Since it is a hint it is also
permitted to be implemented as a discrete vector copy, and the result of executing the pair of instructions with or
without combining is identical. The choice of combined versus discrete operation may vary dynamically.
Unless the combination of a constructive operation with merging predication is specifically required, it is strongly
recommended that for performance reasons software should prefer to use the zeroing form of predicated MOVPRFX or
the unpredicated MOVPRFX instruction.
Although the operation of the instruction is defined as a simple predicated vector copy, it is required that the prefixed
instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation with
merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same
predicate register, and have the same maximum element size (ignoring a fixed 64-bit "wide vector" operand), and the
same destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in
any other operand position, even if they have different names but refer to the same architectural register state. Any
other use is UNPREDICTABLE.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 M 0 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
M <ZM>
0 Z
1 M
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) dest = Z[d];
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand1, e, esize];
        Elem[result, e, esize] = element;
    elsif merging then
        Elem[result, e, esize] = Elem[dest, e, esize];
    else
        Elem[result, e, esize] = Zeros();
Z[d] = result;
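The difference between the merging (M) and zeroing (Z) forms of the predicated copy can be modelled in a few lines of Python. This is a sketch of the discrete-copy behavior only, using lists for vectors and one boolean per element for the predicate; the function name `movprfx_predicated` is an assumption of the example.

```python
def movprfx_predicated(zd, pg, zn, merging):
    """Illustrative model of the predicated copy performed by MOVPRFX."""
    result = []
    for old, active, src in zip(zd, pg, zn):
        if active:
            result.append(src)   # active element: copy from the source
        elif merging:
            result.append(old)   # /M: inactive element keeps its old value
        else:
            result.append(0)     # /Z: inactive element is set to zero
    return result


zd = [9, 9, 9, 9]
zn = [1, 2, 3, 4]
pg = [True, False, True, False]
print(movprfx_predicated(zd, pg, zn, merging=True))   # [1, 9, 3, 9]
print(movprfx_predicated(zd, pg, zn, merging=False))  # [1, 0, 3, 0]
```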
The unpredicated MOVPRFX instruction is a hint to hardware that the instruction may be combined with the destructive
instruction which follows it in program order to create a single constructive operation. Since it is a hint it is also
permitted to be implemented as a discrete vector copy, and the result of executing the pair of instructions with or
without combining is identical. The choice of combined versus discrete operation may vary dynamically.
Although the operation of the instruction is defined as a simple unpredicated vector copy, it is required that the
prefixed instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation
with merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same
destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in any
other operand position, even if they have different names but refer to the same architectural register state. Any other
use is UNPREDICTABLE.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 1 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
bits(VL) result = Z[n];
Z[d] = result;
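The pairing rules stated for MOVPRFX lend themselves to a simple check in an assembler or lint tool. The Python sketch below is one possible shape for such a check. The `Insn` record and its fields are hypothetical, `other_sources` is assumed to exclude the tied destination operand, and the sketch does not verify that the prefixed instruction is a destructive binary or ternary encoding or a merging unary operation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Insn:
    """Hypothetical instruction record used only for this illustration."""
    mnemonic: str
    dest: str
    other_sources: Tuple[str, ...] = ()  # sources other than the tied destination
    pred: Optional[str] = None           # governing predicate, if any
    esize: Optional[int] = None          # maximum element size in bits


def prefix_pair_ok(prefix: Insn, follower: Insn) -> bool:
    """Check a subset of the MOVPRFX pairing rules."""
    if prefix.mnemonic != "MOVPRFX" or follower.mnemonic == "MOVPRFX":
        return False
    if follower.dest != prefix.dest:
        return False  # must target the same destination vector
    if follower.dest in follower.other_sources:
        return False  # destination must not appear in another operand position
    if prefix.pred is not None:
        # Predicated prefix: same predicate register and element size required.
        if follower.pred != prefix.pred or follower.esize != prefix.esize:
            return False
    return True


prefix = Insn("MOVPRFX", "z0", ("z1",))
good = Insn("MUL", "z0", ("z2",), pred="p0", esize=32)
bad = Insn("MUL", "z0", ("z0",), pred="p0", esize=32)
print(prefix_pair_ok(prefix, good), prefix_pair_ok(prefix, bad))  # True False
```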
Read active elements from the source predicate and place in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
• The encodings in this description are named to match the encodings of ANDS.
• The description of ANDS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
The description of ANDS gives the operational pseudocode for this instruction.
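The flag-setting behavior described above can be sketched in Python as follows. The model treats a predicate as one boolean per element and interprets FIRST, NONE and !LAST as "the first active element is true", "no active element is true" and "the last active element is not true". That interpretation, and the helper names `pred_test` and `movs_predicated`, are assumptions of the example; the pseudocode of ANDS is authoritative.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) computed over the elements that are active in mask."""
    active = [r for m, r in zip(mask, result) if m]
    n = 1 if active and active[0] else 0    # FIRST: first active element is true
    z = 0 if any(active) else 1             # NONE: no active element is true
    c = 0 if active and active[-1] else 1   # !LAST: last active element is not true
    return n, z, c, 0                       # V is always zero


def movs_predicated(pg, pn):
    # Active elements are copied from pn; inactive elements are set to zero.
    result = [g and n for g, n in zip(pg, pn)]
    return result, pred_test(pg, result)


pg = [True, True, False, True]
pn = [True, False, True, False]
print(movs_predicated(pg, pn))  # ([True, False, False, False], (1, 0, 1, 0))
```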
Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated.
Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
• The encodings in this description are named to match the encodings of ORRS.
• The description of ORRS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
The description of ORRS gives the operational pseudocode for this instruction.
Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the
third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 1 Pg Za Zdn
op
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element1 = UInt(Elem[operand1, e, esize]);
        integer element2 = UInt(Elem[operand2, e, esize]);
        integer product = element1 * element2;
        if sub_op then
            Elem[result, e, esize] = Elem[operand3, e, esize] - product;
        else
            Elem[result, e, esize] = Elem[operand3, e, esize] + product;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
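The multiply-subtract operation of this instruction (the `sub_op` path of the pseudocode) can be modelled on plain Python lists. This is a minimal sketch, assuming one boolean per element for the predicate and unsigned element values; the function name `msb` is used only for the example.

```python
def msb(zdn, pg, zm, za, esize):
    """Illustrative model: zdn = za - zdn * zm on active elements, modulo 2**esize."""
    mask = (1 << esize) - 1
    out = []
    for dn, active, m, a in zip(zdn, pg, zm, za):
        if active:
            out.append((a - dn * m) & mask)  # result wraps to esize bits
        else:
            out.append(dn)                   # inactive element is unmodified
    return out


zdn = [3, 5, 200]
zm = [4, 7, 2]
za = [10, 1, 100]
pg = [True, False, True]
print(msb(zdn, pg, zm, za, esize=8))  # [254, 5, 212]
```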
Multiply by an immediate each element of the source vector, and destructively place the results in the corresponding
elements of the source vector. The immediate is a signed 8-bit value in the range -128 to +127, inclusive. This
instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 0 0 0 0 1 1 0 imm8 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is the signed immediate operand, in the range -128 to 127, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = SInt(Elem[operand1, e, esize]);
    Elem[result, e, esize] = (element1 * imm)<esize-1:0>;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
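The immediate multiply can be sketched in Python as below. The model follows the pseudocode in reading each element as a signed integer and keeping the low `esize` bits of the product. The helper names `sint` and `mul_imm` are assumptions of the example.

```python
def sint(value, esize):
    """Interpret the low esize bits of value as a two's complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def mul_imm(zdn, imm, esize):
    """Illustrative model of the unpredicated multiply by a signed 8-bit immediate."""
    if not -128 <= imm <= 127:
        raise ValueError("immediate must be in the range -128 to 127")
    mask = (1 << esize) - 1
    return [(sint(x, esize) * imm) & mask for x in zdn]


print(mul_imm([1, 2, 0xFF], -3, esize=8))  # [253, 250, 3]
```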
Multiply active elements of the first source vector by corresponding elements of the second source vector and
destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 0 0 0 Pg Zm Zdn
H U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = UInt(Elem[operand1, e, esize]);
    integer element2 = UInt(Elem[operand2, e, esize]);
    if ElemP[mask, e, esize] == '1' then
        integer product = element1 * element2;
        Elem[result, e, esize] = product<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
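The pseudocode for this instruction reads the elements with UInt, while only the low `esize` bits of the product are written back. Because of that truncation, interpreting the inputs as signed or unsigned gives the same result bits. The Python sketch below checks this exhaustively for 8-bit elements; the function name `low_bits_product` is hypothetical.

```python
def low_bits_product(a, b, esize, signed):
    """Multiply two esize-bit values and keep the low esize bits of the product."""
    mask = (1 << esize) - 1

    def interp(v):
        v &= mask
        if signed and v >> (esize - 1):
            return v - (1 << esize)  # two's complement interpretation
        return v

    return (interp(a) * interp(b)) & mask


# Count 8-bit input pairs where the signed and unsigned views disagree.
mismatches = sum(
    1
    for a in range(256)
    for b in range(256)
    if low_bits_product(a, b, 8, False) != low_bits_product(a, b, 8, True)
)
print(mismatches)  # 0
```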
Bitwise NAND active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 AND element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
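The zeroing predicate NAND can be modelled in Python with one boolean per predicate element. This is a sketch under that simplification; the function name `nand_pred` is used only for the example.

```python
def nand_pred(pg, pn, pm):
    """Illustrative model: NAND of pn and pm on active elements, zero elsewhere."""
    return [(not (n and m)) if g else False for g, n, m in zip(pg, pn, pm)]


pg = [True, True, True, False]
pn = [True, True, False, True]
pm = [True, False, False, False]
print(nand_pred(pg, pn, pm))  # [False, True, True, False]
```

Note that the last element is false even though NAND of its inputs would be true, because the governing predicate marks it inactive.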
Bitwise NAND active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 AND element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
Negate (predicated)
Negate the signed integer value in each active element of the source vector, and place the results in the corresponding
elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = SInt(Elem[operand, e, esize]);
        element = -element;
        Elem[result, e, esize] = element<esize-1:0>;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
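Because the negated value is truncated to `esize` bits, negating the most negative element value yields the same bit pattern. The Python sketch below models the predicated negate on lists and shows this case for 8-bit elements; the function name `neg` and the list-based representation are assumptions of the example.

```python
def neg(zd, pg, zn, esize):
    """Illustrative model: negate active elements of zn into zd, modulo 2**esize."""
    mask = (1 << esize) - 1
    out = list(zd)  # inactive elements keep the destination's old value
    for e, active in enumerate(pg):
        if active:
            out[e] = (-zn[e]) & mask
    return out


zd = [9, 9, 9, 9]
zn = [1, 0x80, 0, 5]
pg = [True, True, True, False]
print(neg(zd, pg, zn, esize=8))  # [255, 128, 0, 9]
```

Here 0x80 (decimal 128, or -128 when read as signed) maps to itself.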
Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 OR element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 OR element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.
• The encodings in this description are named to match the encodings of EOR (predicates).
• The description of EOR (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
The description of EOR (predicates) gives the operational pseudocode for this instruction.
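An exclusive OR with a value that is true in every active element inverts those elements, which is how a predicate inversion can be expressed through EOR. The Python sketch below illustrates this. It assumes that the alias supplies the governing predicate as the second EOR source; the helper names `eor_pred` and `not_pred` are hypothetical.

```python
def eor_pred(pg, pn, pm):
    """Zeroing predicate EOR: pn XOR pm on active elements, false elsewhere."""
    return [(n != m) if g else False for g, n, m in zip(pg, pn, pm)]


def not_pred(pg, pn):
    # Assumption: the governing predicate is reused as the second source,
    # so every active element is XORed with true and is therefore inverted.
    return eor_pred(pg, pn, pg)


pg = [True, True, False, False]
pn = [True, False, True, False]
print(not_pred(pg, pn))  # [False, True, False, False]
```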
Bitwise invert each active element of the source vector, and place the results in the corresponding elements of the
destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 1 0 1 0 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = NOT element;
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
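The predicated bitwise inversion of vector elements can be sketched in Python as follows. The model uses lists of unsigned element values and one boolean per element for the predicate; the function name `not_vector` is an assumption of the example.

```python
def not_vector(zd, pg, zn, esize):
    """Illustrative model: invert active elements of zn, keep zd elsewhere."""
    mask = (1 << esize) - 1
    return [(~n & mask) if g else d for d, g, n in zip(zd, pg, zn)]


zd = [1, 2, 3]
zn = [0x0F, 0xFF, 0x00]
pg = [True, False, True]
print(not_vector(zd, pg, zn, esize=8))  # [240, 2, 255]
```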
Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE
(Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
• The encodings in this description are named to match the encodings of EORS.
• The description of EORS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S
is equivalent to
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
Operation
The description of EORS gives the operational pseudocode for this instruction.
Bitwise inclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of
ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
• The encodings in this description are named to match the encodings of ORR (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of ORR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 imm13 Zdn
is equivalent to
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Operation
The description of ORR (immediate) gives the operational pseudocode for this instruction.
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
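Since this form is assembled through ORR (immediate), the effect is an inclusive OR with the bitwise inverse of the written constant. The Python sketch below models that on 64-bit elements. It does not check whether the inverted constant is encodable in "imm13", and the function name `orn_imm` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def orn_imm(zdn, const):
    """Illustrative model: OR each 64-bit element with the inverted constant."""
    inverted = ~const & MASK64  # the value that ORR (immediate) would apply
    return [(x | inverted) & MASK64 for x in zdn]


result = orn_imm([0x1200, 0], 0xFFFFFFFFFFFFFF00)
print([hex(x) for x in result])  # ['0x12ff', '0xff']
```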
Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first
source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in
the destination predicate register are set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR (NOT element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
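The following Python sketch models the zeroing predicate OR with an inverted second operand, using one boolean per predicate element. The function name `orn_pred` is used only for the example.

```python
def orn_pred(pg, pn, pm):
    """Illustrative model: pn OR (NOT pm) on active elements, false elsewhere."""
    return [(n or not m) if g else False for g, n, m in zip(pg, pn, pm)]


pg = [True, True, True, True, False]
pn = [False, False, True, True, True]
pm = [False, True, False, True, False]
print(orn_pred(pg, pn, pm))  # [True, False, True, True, False]
```

The first four elements enumerate the truth table of the operation; the fifth shows that an inactive element is cleared regardless of its inputs.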
Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first
source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in
the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR (NOT element2);
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
Bitwise inclusive OR an immediate with each 64-bit element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This instruction is used by the pseudo-instruction ORN (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 imm13 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV 64;
bits(VL) operand = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    bits(64) element1 = Elem[operand, e, 64];
    Elem[result, e, 64] = element1 OR imm;
Z[dn] = result;
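The form of the immediate described above can be illustrated with a small Python check. The sketch assumes the constant has already been expanded to its 64-bit value, and tests whether that value is a single rotated run of ones replicated at one of the listed field sizes. The all-zeros and all-ones values are rejected, on the understanding that the encoding cannot represent them. The function name is hypothetical and the sketch does not derive the imm13 encoding.

```python
def is_bitmask_immediate(value):
    """Return True if the 64-bit value is a replicated, rotated run of ones."""
    value &= (1 << 64) - 1
    for size in (2, 4, 8, 16, 32, 64):
        mask = (1 << size) - 1
        field = value & mask
        # The value must equal the field replicated across all 64 bits.
        replicated = 0
        for shift in range(0, 64, size):
            replicated |= field << shift
        if replicated != value:
            continue
        if field == 0 or field == mask:
            continue  # no run boundary at this field size
        # Rotate right by one; a single cyclic run has exactly one
        # position where a 1 is followed by a 0.
        rotated = ((field >> 1) | ((field & 1) << (size - 1))) & mask
        if bin(field & ~rotated & mask).count("1") == 1:
            return True
    return False
```

A value such as 0x00FF00FF00FF00FF is accepted (an 8-bit run repeating every 16 bits), whereas 0x5 is rejected because it contains two separate runs within a 64-bit field.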
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
This instruction is used by the alias MOV.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR element2;
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
Bitwise inclusive OR active elements of the second source vector with corresponding elements of the first source
vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in
the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 0 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 OR element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
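The merging behavior of this form, in which inactive elements keep the value of the first source, can be sketched in Python as follows. Vectors and the governing predicate are modeled as equal-length lists with one entry per element, which is a simplifying assumption; orr_merging is a hypothetical name.

```python
def orr_merging(pg, zdn, zm):
    """Merging-predicated ORR: inactive elements keep the value from zdn."""
    return [(a | b) if g else a for g, a, b in zip(pg, zdn, zm)]
```

With pg = [1, 0, 1], the middle element of the result is copied from zdn unchanged, while the other two elements receive the OR of both sources.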
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Bitwise inclusive OR all elements of the second source vector with corresponding elements of the first source vector
and place the result in the corresponding elements of the destination vector. This instruction is unpredicated.
This instruction is used by the alias MOV (vector, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 0 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Alias Conditions
Operation
CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
Z[d] = operand1 OR operand2;
Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
This instruction is used by the alias MOVS (unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR element2;
    else
        ElemP[result, e, esize] = '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
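The flag names FIRST (N), NONE (Z) and !LAST (C) suggest the following informal Python reading of the flag-setting step. This is a sketch under stated assumptions, not the definition of PredTest: predicates are per-element 0/1 lists, pred_test is a hypothetical name, and "first" and "last" are taken to mean the first and last elements that are active in the governing predicate.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) for a predicate result under a governing mask."""
    # Keep only the result elements whose governing element is active.
    active = [r for g, r in zip(mask, result) if g]
    n = 1 if active and active[0] else 0    # FIRST: first active element is true
    z = 0 if any(active) else 1             # NONE: no active element is true
    c = 0 if active and active[-1] else 1   # !LAST: last active element is not true
    return (n, z, c, 0)                     # V is always zero
```

When no element is active, the sketch yields N = 0, Z = 1 and C = 1.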
Bitwise inclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 0 0 1 Pg Zn Vd
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) result = Zeros(esize);
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        result = result OR Elem[operand, e, esize];
V[d] = result;
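The horizontal reduction above can be sketched in Python as a fold over the active elements. The list-based model of the vector and predicate, and the name orv, are assumptions made for illustration.

```python
from functools import reduce
from operator import or_


def orv(pg, zn):
    """OR together the active elements of zn; inactive elements act as zero."""
    return reduce(or_, (x for g, x in zip(pg, zn) if g), 0)
```

Starting the fold from zero gives the same result as treating inactive elements as zero, and yields zero when no element is active.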
PFALSE <Pd>.B
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
Operation
CheckSVEEnabled();
P[d] = Zeros(PL);
Sets the first active element in the destination predicate to true, otherwise elements from the source predicate are
passed through unchanged. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and
the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 1 1 0 0 0 0 0 Pg 0 Pdn
S
Assembler Symbols
<Pdn> Is the name of the source and destination scalable predicate register, encoded in the "Pdn" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) result = P[dn];
integer first = -1;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' && first == -1 then
        first = e;
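The behavior stated in the description of this instruction can be sketched in Python as follows. The sketch follows the prose description rather than the pseudocode listing, models predicates as per-element 0/1 lists, omits the condition flags, and uses the hypothetical name pfirst.

```python
def pfirst(pg, pdn):
    """Set the first element that is active in pg; pass the others through."""
    result = list(pdn)
    for e, g in enumerate(pg):
        if g:
            result[e] = 1   # first active element becomes true
            break
    return result
```

If the governing predicate has no active element, the source predicate is returned unchanged.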
An instruction used to construct a loop which iterates over all active elements in a predicate. If all source predicate
elements are false it sets the first active predicate element in the destination predicate to true. Otherwise it
determines the next active predicate element following the last true source predicate element, and if one is found sets
the corresponding destination predicate element to true. All other destination predicate elements are set to false. Sets
the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 1 1 1 0 0 0 1 0 Pg 0 Pdn
Assembler Symbols
<Pdn> Is the name of the source and destination scalable predicate register, encoded in the "Pdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[dn];
bits(PL) result;
result = Zeros();
if next < elements then
    ElemP[result, next, esize] = '1';
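The loop construction mentioned in the description can be sketched in Python. The sketch follows the prose description of this instruction, models predicates as per-element 0/1 lists, and omits the condition flags; the names pnext and active_indices are hypothetical.

```python
def pnext(pg, pdn):
    """Select the next active element after the last true element of pdn."""
    # Index of the last true source element, or -1 if all are false.
    last = max((e for e, p in enumerate(pdn) if p), default=-1)
    result = [0] * len(pdn)
    for e in range(last + 1, len(pg)):
        if pg[e]:
            result[e] = 1
            break
    return result


def active_indices(pg):
    """Visit every active element of pg by repeatedly applying pnext."""
    current = [0] * len(pg)
    visited = []
    while True:
        current = pnext(pg, current)
        if not any(current):    # no further active element
            break
        visited.append(current.index(1))
    return visited
```

Starting from an all-false predicate, each application selects exactly one active element, and the loop ends when the result is all false.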
Contiguous prefetch of byte elements from the memory address generated by a 64-bit scalar base and immediate
index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication, and added
to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -32 to 31, defaulting to 0, encoded in the
"imm6" field.
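The <prfop> table follows a regular bit pattern, which the following Python sketch makes explicit. It assumes, from the table rows alone, that bit 3 selects PLD or PST, bits 2:1 select the cache level, and bit 0 selects KEEP or STRM. The name prfop_name is hypothetical, and values matching x11x are rendered numerically, as in the #uimm4 row.

```python
def prfop_name(prfop):
    """Map a 4-bit prfop value to the mnemonic listed in the table."""
    if not 0 <= prfop <= 15:
        raise ValueError("prfop is a 4-bit field")
    level = (prfop >> 1) & 0b11
    if level == 0b11:
        return f"#{prfop}"                        # x11x rows: numeric #uimm4
    kind = "PST" if prfop & 0b1000 else "PLD"     # access type
    policy = "STRM" if prfop & 1 else "KEEP"      # temporality
    return f"{kind}L{level + 1}{policy}"
```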
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
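The address arithmetic of the scalar plus immediate form can be sketched in Python. The sketch assumes that the number of elements equals the length of the mask list and that scale is the log2 of the element size in bytes (0 for bytes); the function name is hypothetical. Only the addresses are modeled, not the prefetch hint itself.

```python
def contiguous_prefetch_addresses(base, imm, mask, scale):
    """Addresses hinted for active elements: base + ((imm*elements + e) << scale)."""
    elements = len(mask)
    m64 = 2**64 - 1
    return [(base + (((imm * elements) + e) << scale)) & m64
            for e, g in enumerate(mask) if g]
```

With 16 byte elements and imm = 1, the first active element maps to base + 16, which is one whole vector beyond the base address regardless of how many elements are active.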
Contiguous prefetch of byte elements from the memory address generated by a 64-bit scalar base and scalar index
which is added to the base address. After each element prefetch the index value is incremented, but the index register
is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Gather prefetch of bytes from the active memory addresses generated by a 64-bit scalar base plus vector index. The
index values are optionally sign or zero-extended from 32 to 64 bits. Inactive addresses are not prefetched from
memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 0 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
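The effect of the optional sign or zero extension on the gather addresses can be sketched in Python. The sketch assumes per-element lists for the offset vector and predicate, takes the low offs_size bits of each offset element, and interprets them as unsigned (UXTW) or signed (SXTW). The function and parameter names are hypothetical.

```python
def gather_prefetch_addresses(base, offsets, mask, scale=0,
                              offs_size=32, unsigned=True):
    """Addresses hinted for active elements: base + (extended offset << scale)."""
    m64 = (1 << 64) - 1
    addrs = []
    for g, elem in zip(mask, offsets):
        if not g:
            continue                                   # inactive: no prefetch
        off = elem & ((1 << offs_size) - 1)            # low offs_size bits
        if not unsigned and off >> (offs_size - 1):
            off -= 1 << offs_size                      # sign-extend
        addrs.append((base + (off << scale)) & m64)
    return addrs
```

An offset element of 0xFFFFFFFF therefore addresses base + 0xFFFFFFFF when zero-extended, but base - 1 when sign-extended.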
Gather prefetch of bytes from the active memory addresses generated by a vector base plus immediate index. The
index is in the range 0 to 31. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
Assembler Symbols
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Contiguous prefetch of doubleword elements from the memory address generated by a 64-bit scalar base and
immediate index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication,
and added to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 1 1 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -32 to 31, defaulting to 0, encoded in the
"imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Contiguous prefetch of doubleword elements from the memory address generated by a 64-bit scalar base and scalar
index which is multiplied by 8 and added to the base address. After each element prefetch the index value is
incremented, but the index register is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
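The scaled scalar index can be illustrated with a short Python sketch. It assumes that scale is 3 for doubleword elements, so that the shift multiplies the running index by 8, and that the index register is read as an unsigned 64-bit value. The function name is hypothetical.

```python
def scalar_index_prefetch_addresses(base, index, mask, scale):
    """Addresses hinted for active elements: base + ((index + e) << scale)."""
    m64 = (1 << 64) - 1
    return [(base + (((index & m64) + e) << scale)) & m64
            for e, g in enumerate(mask) if g]
```

The index advances by one per element inside the loop only; the value passed in is never modified, which mirrors the statement that the index register is not updated.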
Gather prefetch of doublewords from the active memory addresses generated by a 64-bit scalar base plus vector
index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 8. Inactive
addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 1 1 Pg Rn 0 prfop
msz<1>msz<0>
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 1 1 Pg Rn 0 prfop
msz<1>msz<0>
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 1 1 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Gather prefetch of doublewords from the active memory addresses generated by a vector base plus immediate index.
The index is a multiple of 8 in the range 0 to 248. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
Assembler Symbols
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
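The relationship between the assembler byte offset and the 5-bit field can be sketched in Python. The sketch assumes that the field holds the byte offset divided by the element size in bytes (scale = 3 for doublewords), which is consistent with the stated range of 0 to 248 in multiples of 8 but is not spelled out in the text above. Both function names are hypothetical.

```python
def encode_imm5(imm, scale):
    """Convert a byte offset to the assumed imm5 field value."""
    if imm % (1 << scale) or not 0 <= imm >> scale <= 31:
        raise ValueError("offset not encodable")
    return imm >> scale


def vector_base_prefetch_addresses(bases, imm5, mask, scale, esize=64):
    """Addresses hinted for active elements: zero-extended base + (imm5 << scale)."""
    elem_mask = (1 << esize) - 1
    m64 = (1 << 64) - 1
    return [((b & elem_mask) + (imm5 << scale)) & m64
            for g, b in zip(mask, bases) if g]
```

Under this assumption a byte offset of 248 maps to the largest field value, 31, and an offset that is not a multiple of 8 is rejected.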
Contiguous prefetch of halfword elements from the memory address generated by a 64-bit scalar base and immediate
index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication, and added
to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 0 1 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -32 to 31, defaulting to 0, encoded in the
"imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Contiguous prefetch of halfword elements from the memory address generated by a 64-bit scalar base and scalar
index which is multiplied by 2 and added to the base address. After each element prefetch the index value is
incremented, but the index register is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Gather prefetch of halfwords from the active memory addresses generated by a 64-bit scalar base plus vector index.
The index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 2. Inactive
addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 0 1 Pg Rn 0 prfop
msz<1>msz<0>
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 0 1 Pg Rn 0 prfop
msz<1>msz<0>
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 0 1 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
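As an informal illustration of the address generation above (not part of the architecture definition), the following Python sketch lists the addresses that would be hinted for the active elements. The function name gather_prefetch_addresses and the modelling of the governing predicate as a list of booleans are assumptions made for this example; offs_size, offs_unsigned and scale stand for the decode-time values of the same names in the pseudocode.

```python
MASK64 = (1 << 64) - 1


def gather_prefetch_addresses(base, offsets, active, scale,
                              offs_size=64, offs_unsigned=True):
    """Return the 64-bit addresses hinted for each active element.

    base    -- scalar base address (value of Xn or SP)
    offsets -- per-element index values read from Zm
    active  -- per-element booleans modelling the governing predicate
    scale   -- log2 of the access size in bytes (1 for halfwords)
    """
    addrs = []
    for off, is_active in zip(offsets, active):
        if not is_active:
            continue  # inactive addresses are not prefetched
        # Keep the low offs_size bits, then read them as signed or unsigned.
        off &= (1 << offs_size) - 1
        if not offs_unsigned and off >> (offs_size - 1):
            off -= 1 << offs_size
        addrs.append((base + (off << scale)) & MASK64)
    return addrs


# Halfword gather, 32-bit indices treated as signed (the SXTW case).
print([hex(a) for a in gather_prefetch_addresses(
    0x1000, [0, 1, 0xFFFFFFFF, 3], [True, False, True, True],
    scale=1, offs_size=32, offs_unsigned=False)])
```

With offs_unsigned=True (the UXTW case) the same index 0xFFFFFFFF is treated as a large positive value instead of -1.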
Gather prefetch of halfwords from the active memory addresses generated by a vector base plus immediate index. The
index is a multiple of 2 in the range 0 to 62. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
Assembler Symbols
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
if AnyActiveElement(mask, esize) then
    base = Z[n];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
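The vector-base form can be sketched in the same informal way. In this Python example the name vector_base_prefetch_addresses, the access_bytes parameter and the boolean-list predicate are assumptions for illustration only. The byte immediate passed in corresponds to (offset << scale) in the pseudocode, so for halfwords it must be a multiple of 2 in the range 0 to 62.

```python
MASK64 = (1 << 64) - 1


def vector_base_prefetch_addresses(bases, active, imm=0, esize=64, access_bytes=2):
    """Return the addresses hinted by a vector-plus-immediate gather prefetch.

    bases        -- per-element base values read from Zn
    active       -- per-element booleans modelling the governing predicate
    imm          -- byte offset, a multiple of access_bytes up to 31 * access_bytes
    esize        -- element size in bits (32 or 64)
    access_bytes -- size of each prefetched item in bytes (2 for halfwords)
    """
    if imm % access_bytes != 0 or not 0 <= imm <= 31 * access_bytes:
        raise ValueError("immediate is not encodable in a 5-bit scaled field")
    elem_mask = (1 << esize) - 1
    addrs = []
    for b, is_active in zip(bases, active):
        if is_active:
            # Zero-extend the element to 64 bits, then add the byte offset.
            addrs.append(((b & elem_mask) + imm) & MASK64)
    return addrs


print([hex(a) for a in vector_base_prefetch_addresses(
    [0x1000, 0x2000, 0xFFFFFFFF], [True, False, True], imm=6, esize=32)])
```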
Contiguous prefetch of word elements from the memory address generated by a 64-bit scalar base and immediate
index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication, and added
to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 1 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -32 to 31, defaulting to 0, encoded in the
"imm6" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
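The effect of the immediate can be seen in a small Python sketch of the loop above. This is an illustration only: the function name contiguous_prefetch_addresses is hypothetical, the predicate is modelled as a list of booleans, and elements and scale are passed in directly because they are fixed at decode time.

```python
MASK64 = (1 << 64) - 1


def contiguous_prefetch_addresses(base, offset, elements, scale, active):
    """Return the addresses hinted by a contiguous prefetch.

    base     -- scalar base address (value of Xn or SP)
    offset   -- signed immediate in the range -32 to 31
    elements -- number of elements per vector at the current vector length
    scale    -- log2 of the access size in bytes (2 for words)
    active   -- per-element booleans modelling the governing predicate
    """
    addrs = []
    for e in range(elements):
        if active[e]:
            # The immediate steps in whole vectors, whatever the predicate holds.
            eoff = offset * elements + e
            addrs.append((base + (eoff << scale)) & MASK64)
    return addrs


print([hex(a) for a in contiguous_prefetch_addresses(
    0x2000, 1, 4, 2, [True, False, True, True])])
```

A negative immediate moves the window backwards by whole vectors in the same way.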
Contiguous prefetch of word elements from the memory address generated by a 64-bit scalar base and scalar index
which is multiplied by 4 and added to the base address. After each element prefetch the index value is incremented,
but the index register is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;
if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Gather prefetch of words from the active memory addresses generated by a 64-bit scalar base plus vector index. The
index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 4. Inactive addresses
are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 1 0 Pg Rn 0 prfop
msz<1>msz<0>
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 1 0 Pg Rn 0 prfop
msz<1>msz<0>
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>
Assembler Symbols
prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Gather prefetch of words from the active memory addresses generated by a vector base plus immediate index. The
index is a multiple of 4 in the range 0 to 124. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>
Assembler Symbols
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
if AnyActiveElement(mask, esize) then
    base = Z[n];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate source register, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 1 1 Pg 0 Pn 0 0 0 0 0
S
Assembler Symbols
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) result = P[n];
PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
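The FIRST, NONE and !LAST flag semantics described for this instruction can be illustrated with a short Python sketch. It is not the architectural definition: the function name pred_test and the use of boolean lists for predicates are assumptions for this example, and it also assumes that when no element is governed the flags read as N=0, Z=1, C=1.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) for a predicate result under a governing mask.

    mask   -- per-element booleans modelling the governing predicate
    result -- per-element booleans modelling the predicate being tested
    """
    # Only elements selected by the governing predicate take part.
    active = [bool(r) for m, r in zip(mask, result) if m]
    n = int(bool(active) and active[0])          # FIRST: first active element is true
    z = int(not any(active))                     # NONE: no active element is true
    c = int(not (bool(active) and active[-1]))   # !LAST: last active element is not true
    return (n, z, c, 0)                          # V is always zero


print(pred_test([True, True, True, True], [True, False, False, True]))
```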
Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to
false otherwise. If the constraint specifies more elements than are available at the current vector length then all
elements of the destination predicate are set to false.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 0 1 1 1 0 0 0 pattern 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(PL) result;
for e = 0 to elements-1
    ElemP[result, e, esize] = if e < count then '1' else '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(result, result, esize);
P[d] = result;
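How the named constraints translate into an element count can be sketched in Python. This is an illustration of the rules listed in the description, not a transcription of DecodePredCount: the names pred_count, ptrue and FIXED_COUNTS are hypothetical, patterns are passed by name rather than by encoding, and the particular set of fixed counts shown is an assumption of this example.

```python
# Fixed-count constraints, keyed by assembler name (assumed set for this sketch).
FIXED_COUNTS = {f"VL{n}": n for n in (1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256)}


def pred_count(pattern, elements):
    """Return how many leading elements a named constraint selects.

    pattern  -- constraint name such as "VL8", "POW2", "MUL3", "MUL4" or "ALL"
    elements -- number of elements available at the current vector length (>= 1)
    """
    if pattern == "ALL":
        return elements
    if pattern == "POW2":
        return 1 << (elements.bit_length() - 1)   # largest power of two <= elements
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "MUL4":
        return elements - elements % 4
    n = FIXED_COUNTS.get(pattern, 0)              # unknown names select nothing
    return n if n <= elements else 0              # too many elements requested: none


def ptrue(pattern, elements):
    """Return the destination predicate as a list of booleans."""
    count = pred_count(pattern, elements)
    return [e < count for e in range(elements)]


print(ptrue("VL2", 4), pred_count("VL32", 16), pred_count("MUL3", 16))
```

A fixed count larger than the number of available elements yields an all-false predicate, matching the rule that the constraint is not partially satisfied.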
Initialise predicate from named constraint and set the condition flags
Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to
false otherwise. If the constraint specifies more elements than are available at the current vector length then all
elements of the destination predicate are set to false.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 1 1 1 1 0 0 0 pattern 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(PL) result;
for e = 0 to elements-1
    ElemP[result, e, esize] = if e < count then '1' else '0';
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(result, result, esize);
P[d] = result;
Unpack elements from the lowest or highest half of the source predicate and place in elements of twice their size
within the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: High half and Low half
High half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 Pn 0 Pd
H
Low half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 Pn 0 Pd
H
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) operand = P[n];
bits(PL) result;
for e = 0 to elements-1
    ElemP[result, e, esize] = ElemP[operand, if hi then e + elements else e, esize DIV 2];
P[d] = result;
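The unpack can be pictured at the bit level with the following Python sketch. It is illustrative only and rests on two assumptions that are not stated on this page: a predicate holds one bit per vector byte, and an element's flag lives in the lowest bit of its group. The function name punpk is hypothetical, and the sketch fixes the source elements at byte size and the destination elements at halfword size.

```python
def punpk(pred_bits, pl, hi):
    """Unpack one half of a predicate into elements of twice the size.

    pred_bits -- source predicate as an integer, bit i governing vector byte i
    pl        -- predicate length in bits (vector length in bytes)
    hi        -- True to unpack the high half, False for the low half
    """
    elements = pl // 2            # number of halfword elements in the destination
    result = 0
    for e in range(elements):
        src = e + elements if hi else e
        bit = (pred_bits >> src) & 1
        # Each destination element spans two predicate bits; only the low one is set.
        result |= bit << (2 * e)
    return result


src = 0b1010_0000_0000_1101
print(hex(punpk(src, 16, hi=False)), hex(punpk(src, 16, hi=True)))
```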
Reverse bits in each active element of the source vector, and place the results in the corresponding elements of the
destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 1 1 1 0 0 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = BitReverse(element);
Z[d] = result;
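The merging behaviour of the loop above can be illustrated with a minimal Python sketch. The names bit_reverse and rbit are hypothetical, vectors are modelled as lists of unsigned integers, and the governing predicate as a list of booleans.

```python
def bit_reverse(value, esize):
    """Reverse the order of the low esize bits of value."""
    out = 0
    for i in range(esize):
        out = (out << 1) | ((value >> i) & 1)
    return out


def rbit(zd, zn, active, esize):
    """Return the new destination vector.

    zd     -- current destination elements (kept where the predicate is false)
    zn     -- source elements
    active -- per-element booleans modelling the governing predicate
    esize  -- element size in bits
    """
    return [bit_reverse(s, esize) if a else d
            for d, s, a in zip(zd, zn, active)]


print([hex(v) for v in rbit([1, 2, 3], [0x01, 0x02, 0x03], [True, False, True], 8)])
```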
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Read the first-fault register (FFR) and place active elements in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 0 0 0 Pg 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) ffr = FFR[];
bits(PL) result = ffr AND mask;
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, 8);
P[d] = result;
Read the first-fault register (FFR) and place in the destination predicate without predication.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 Pd
S
RDFFR <Pd>.B
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
Operation
CheckSVEEnabled();
bits(PL) ffr = FFR[];
P[d] = ffr;
Read the first-fault register (FFR) and place active elements in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C)
condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 Pg 0 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
Operation
CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) ffr = FFR[];
bits(PL) result = ffr AND mask;
if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, 8);
P[d] = result;
Multiply the current vector register size in bytes by an immediate in the range -32 to 31 and place the result in the
64-bit destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 0 imm6 Rd
Assembler Symbols
<Xd> Is the 64-bit name of the destination general-purpose register, encoded in the "Rd" field.
<imm> Is the signed immediate operand, in the range -32 to 31, encoded in the "imm6" field.
Operation
CheckSVEEnabled();
integer len = imm * (VL DIV 8);
X[d] = len<63:0>;
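As a worked illustration of the multiplication above, the following Python sketch computes the value written to the destination register for a given vector length. The function name rdvl and the vl_bits parameter are hypothetical names for this example. Negative results are shown as their 64-bit two's complement representation, matching the len<63:0> truncation.

```python
def rdvl(imm, vl_bits):
    """Return the 64-bit value produced for a signed immediate and vector length.

    imm     -- signed immediate in the range -32 to 31
    vl_bits -- current vector length in bits
    """
    if not -32 <= imm <= 31:
        raise ValueError("immediate must be in the range -32 to 31")
    length = imm * (vl_bits // 8)        # vector size in bytes times the immediate
    return length & ((1 << 64) - 1)      # keep the low 64 bits


print(rdvl(1, 128), rdvl(31, 2048), hex(rdvl(-1, 256)))
```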
Reverse the order of all elements in the source predicate and place in the destination predicate. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 1 0 0 0 1 0 0 0 0 0 Pn 0 Pd
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
bits(PL) operand = P[n];
bits(PL) result = Reverse(operand, esize DIV 8);
P[d] = result;
Reverse the order of all elements in the source vector and place in the destination vector. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 1 0 0 0 0 0 1 1 1 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
bits(VL) operand = Z[n];
bits(VL) result = Reverse(operand, esize);
Z[d] = result;
Reverse the order of 8-bit bytes, 16-bit halfwords or 32-bit words within each active element of the source vector, and
place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector
register remain unmodified.
It has encodings from 3 classes: Byte, Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 0 1 0 0 Pg Zn Zd
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 1 1 0 0 Pg Zn Zd
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 1 0 1 0 0 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
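<T> For the byte variant: is the size specifier, encoded in "size":
size <T>
00 RESERVED
01 H
10 S
11 D
For the halfword variant: is the size specifier, encoded in "size<0>":
size<0> <T>
0 S
1 D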
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = Reverse(element, swsize);
Z[d] = result;
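The per-element reversal can be illustrated with a short Python sketch in which swsize is the size in bits of the unit being swapped (8, 16 or 32). The function name reverse_chunks is hypothetical, and the sketch covers a single active element only.

```python
def reverse_chunks(value, esize, swsize):
    """Reverse the order of swsize-bit chunks within an esize-bit element.

    value  -- element value as an unsigned integer
    esize  -- element size in bits
    swsize -- chunk size in bits (8 for bytes, 16 for halfwords, 32 for words)
    """
    chunk_mask = (1 << swsize) - 1
    out = 0
    for i in range(esize // swsize):
        # Take chunks from the low end and push them in from the low end,
        # so the first chunk taken ends up in the most significant position.
        chunk = (value >> (i * swsize)) & chunk_mask
        out = (out << swsize) | chunk
    return out


print(hex(reverse_chunks(0x11223344, 32, 8)))
print(hex(reverse_chunks(0x1122334455667788, 64, 16)))
```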
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Compute the absolute difference between signed integer values in active elements of the second source vector and
corresponding elements of the first source vector and destructively place the difference in the corresponding elements
of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 1 0 0 0 0 0 Pg Zm Zdn
U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer absdiff = Abs(element1 - element2);
        Elem[result, e, esize] = absdiff<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
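A minimal Python sketch of the signed case of the loop above follows. The names to_signed and sabd are hypothetical, vectors are modelled as lists of unsigned element values, and the predicate as a list of booleans. The difference is taken on the signed interpretations and then truncated to the element size, so the stored result can exceed the signed range of the element.

```python
def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def sabd(zdn, zm, active, esize):
    """Return the new first-source/destination vector.

    zdn    -- first source elements (kept where the predicate is false)
    zm     -- second source elements
    active -- per-element booleans modelling the governing predicate
    esize  -- element size in bits
    """
    mask = (1 << esize) - 1
    result = []
    for a, b, is_active in zip(zdn, zm, active):
        if is_active:
            absdiff = abs(to_signed(a, esize) - to_signed(b, esize))
            result.append(absdiff & mask)   # keep the low esize bits
        else:
            result.append(a)                # inactive elements are unmodified
    return result


print([hex(v) for v in sabd([0x80, 5, 7], [0x7F, 10, 1], [True, True, False], 8)])
```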
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
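• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.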
Signed add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Narrow elements are first sign-extended to 64 bits. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 0 0 0 1 Pg Zn Vd
U
Assembler Symbols
<Dd> Is the 64-bit name of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 RESERVED
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
integer sum = 0;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = SInt(Elem[operand, e, esize]);
        sum = sum + element;
V[d] = sum<63:0>;
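The reduction can be illustrated with the following Python sketch, which sign-extends each active element, accumulates in an unbounded integer, and keeps the low 64 bits at the end. The names to_signed and saddv are hypothetical, and the vector and predicate are modelled as plain lists.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def saddv(zn, active, esize):
    """Return the 64-bit sum of the sign-extended active elements.

    zn     -- source elements as unsigned integers
    active -- per-element booleans modelling the governing predicate
    esize  -- element size in bits (8, 16 or 32)
    """
    total = 0
    for value, is_active in zip(zn, active):
        if is_active:                      # inactive elements contribute zero
            total += to_signed(value, esize)
    return total & MASK64


print(saddv([0xFF, 0x01, 0x7F], [True, True, False], 8))
print(hex(saddv([0xFF, 0xFF], [True, True], 8)))
```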
Convert to floating-point from the signed integer in each active element of the source vector, and place the results in
the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 7 classes: 16-bit to half-precision, 32-bit to half-precision, 32-bit to single-precision, 32-bit to
double-precision, 64-bit to half-precision, 64-bit to single-precision and 64-bit to double-precision
16-bit to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 Pg Zn Zd
int_U
32-bit to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U
32-bit to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U
32-bit to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 1 Pg Zn Zd
int_U
64-bit to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0 1 Pg Zn Zd
int_U
64-bit to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U
64-bit to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 0 1 0 1 Pg Zn Zd
int_U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) fpval = FixedToFP(element<s_esize-1:0>, 0, unsigned, FPCR[], rounding);
        Elem[result, e, esize] = ZeroExtend(fpval);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Signed divide active elements of the first source vector by corresponding elements of the second source vector and
destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 0 0 0 Pg Zm Zdn
R U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element2 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element1) / Real(element2));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
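As an illustration only, and not Arm's reference model, the per-element behavior of the signed divide can be sketched in Python. The names sdiv_model and to_signed, and the representation of vectors as lists of signed integers with a list of booleans for the predicate, are assumptions made for this sketch.

```python
def to_signed(value, esize):
    """Interpret the low esize bits of value as a two's-complement integer."""
    value &= (1 << esize) - 1
    return value - (1 << esize) if value >> (esize - 1) else value


def sdiv_model(zdn, zm, pg, esize=32):
    """Hypothetical model of the predicated signed divide.

    zdn, zm: lists of signed element values; pg: list of booleans (active lanes).
    """
    result = []
    for element1, element2, active in zip(zdn, zm, pg):
        if not active:
            result.append(element1)      # inactive lanes keep their old value
            continue
        if element2 == 0:
            quotient = 0                 # a zero divisor gives a zero quotient
        else:
            quotient = abs(element1) // abs(element2)
            if (element1 < 0) != (element2 < 0):
                quotient = -quotient     # net effect: rounding toward zero
        result.append(to_signed(quotient, esize))  # keep the low esize bits
    return result
```

With these assumptions, sdiv_model([7, -7, 5, 9], [2, 2, 0, 3], [True, True, True, False]) returns [3, -3, 0, 9]. Dividing the minimum signed value by -1 wraps back to the minimum signed value, because only the low esize bits of the quotient are kept.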
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Signed reversed divide active elements of the second source vector by corresponding elements of the first source
vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 0 0 0 0 Pg Zm Zdn
R U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element1 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element2) / Real(element1));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
The signed integer indexed dot product instruction computes the dot product of a group of four signed 8-bit or 16-bit
integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four signed 8-bit
or 16-bit integer values in an indexed 32-bit or 64-bit element of the second source vector, and then destructively adds
the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to one less than the number of groups per
128-bit segment, encoded in 1 to 2 bits depending on the size of the group. This instruction is unpredicated.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> U
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the 32-bit variant: is the name of the second source scalable vector register Z0-Z7, encoded in the
"Zm" field.
For the 64-bit variant: is the name of the second source scalable vector register Z0-Z15, encoded in the
"Zm" field.
<imm> For the 32-bit variant: is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit
vector segment, in the range 0 to 3, encoded in the "i2" field.
For the 64-bit variant: is the immediate index of a quadtuplet of four 16-bit elements within each 128-bit
vector segment, in the range 0 to 1, encoded in the "i1" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = SInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;
Z[da] = result;
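The segment-relative indexing can be easier to follow with a small model. The following Python sketch is illustrative only. The name sdot_indexed_model and the list-based vector representation are assumptions: zda holds the signed accumulators, and zn and zm hold the narrow signed values, four per accumulator.

```python
def sdot_indexed_model(zda, zn, zm, index, esize=32):
    """Hypothetical model of the indexed signed dot product.

    zda: signed esize-bit accumulators.
    zn, zm: signed (esize/4)-bit values, four per accumulator element.
    index: group position selected within every 128-bit segment of zm.
    """
    elts_per_segment = 128 // esize
    mask = (1 << esize) - 1
    result = []
    for e, acc in enumerate(zda):
        segment_base = e - (e % elts_per_segment)
        s = segment_base + index          # same group position in each segment
        for i in range(4):
            acc += zn[4 * e + i] * zm[4 * s + i]
        acc &= mask                       # accumulation wraps modulo 2^esize
        result.append(acc - (1 << esize) if acc >> (esize - 1) else acc)
    return result
```

For a 256-bit vector with 32-bit accumulators there are two segments of four accumulators each. With index 1, the first four accumulators all use bytes 4 to 7 of the second source, and the last four all use bytes 20 to 23.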
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
The signed integer dot product instruction computes the dot product of a group of four signed 8-bit or 16-bit integer
values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four signed 8-bit or
16-bit integer values in the corresponding 32-bit or 64-bit element of the second source vector, and then destructively
adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 0 Zn Zda
U
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
size<0> <T>
0 S
1 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
size<0> <Tb>
0 B
1 H
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = SInt(Elem[operand2, 4 * e + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Read active elements from the first source predicate and inactive elements from the second source predicate and
place in the corresponding elements of the destination predicate. Does not set the condition flags.
This instruction is used by the alias MOV (predicate, predicated, merging).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1;
    else
        ElemP[result, e, esize] = element2;
P[d] = result;
Read active elements from the first source vector and inactive elements from the second source vector and place in
the corresponding elements of the destination vector.
This instruction is used by the alias MOV (vector, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 1 1 Pg Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Alias Conditions
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(NOT(mask), esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Elem[operand1, e, esize];
    else
        Elem[result, e, esize] = Elem[operand2, e, esize];
Z[d] = result;
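The select operation amounts to a per-lane choice between the two sources. A minimal Python sketch, assuming vectors are modeled as lists and the governing predicate as a list of booleans (the name sel_model is hypothetical):

```python
def sel_model(pg, zn, zm):
    """Hypothetical model: take zn where the lane is active, zm elsewhere."""
    return [n if active else m for active, n, m in zip(pg, zn, zm)]
```

For example, sel_model([True, False, True, False], [1, 2, 3, 4], [10, 20, 30, 40]) returns [1, 20, 3, 40].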
Initialise the first-fault register (FFR) to all true prior to a sequence of first-fault or non-fault loads. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
SETFFR
Operation
CheckSVEEnabled();
FFR[] = Ones(PL);
Determine the signed maximum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a signed 8-bit value in the range -128 to
+127, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 1 0 imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is the signed immediate operand, in the range -128 to 127, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Max(element1, imm)<esize-1:0>;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Determine the signed maximum of active elements of the second source vector and corresponding elements of the first
source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 0 0 0 0 Pg Zm Zdn
U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer maximum = Max(element1, element2);
        Elem[result, e, esize] = maximum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
Signed maximum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the minimum signed integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 0 0 0 1 Pg Zn Vd
U
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer maximum = if unsigned then 0 else -(2^(esize-1));
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        maximum = Max(maximum, element);
V[d] = maximum<esize-1:0>;
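The treatment of inactive elements follows from the starting value of the reduction. The Python sketch below is illustrative only and assumes a list-based model of the vector and predicate; the name smaxv_model is hypothetical.

```python
def smaxv_model(pg, zn, esize=32):
    """Hypothetical model of the signed maximum reduction.

    pg: list of booleans (active lanes); zn: list of signed element values.
    """
    maximum = -(1 << (esize - 1))   # minimum signed integer for the element size
    for active, element in zip(pg, zn):
        if active:
            maximum = max(maximum, element)
    return maximum
```

Because the reduction starts from the minimum signed integer, skipping an inactive element gives the same result as treating it as that minimum. With no active elements, the result is the minimum signed integer itself, for example -128 for 8-bit elements.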
Determine the signed minimum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a signed 8-bit value in the range -128 to
+127, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 1 0 imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is the signed immediate operand, in the range -128 to 127, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Min(element1, imm)<esize-1:0>;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Determine the signed minimum of active elements of the second source vector and corresponding elements of the first
source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 0 0 0 0 Pg Zm Zdn
U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer minimum = Min(element1, element2);
        Elem[result, e, esize] = minimum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
Signed minimum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the maximum signed integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 0 0 0 1 Pg Zn Vd
U
Assembler Symbols
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer minimum = if unsigned then (2^esize - 1) else (2^(esize-1) - 1);
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        minimum = Min(minimum, element);
V[d] = minimum<esize-1:0>;
The signed integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of signed 8-bit integer values
held in each 128-bit segment of the first source vector by the 8×2 matrix of signed 8-bit integer values in the
corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then
destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend and
destination vector. This is equivalent to performing an 8-way dot product per destination element.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.
SVE
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 0 0 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(128) op1, op2;
bits(128) res, addend;
for s = 0 to segments-1
    op1 = Elem[operand1, s, 128];
    op2 = Elem[operand2, s, 128];
    addend = Elem[operand3, s, 128];
    res = MatMulAdd(addend, op1, op2, op1_unsigned, op2_unsigned);
    Elem[result, s, 128] = res;
Z[da] = result;
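The work done on one 128-bit segment can be sketched in Python as follows. This is illustrative only. MatMulAdd is not defined in this section, so the storage layout used here is an assumption: the first source holds the two rows of the 2×8 matrix, and the second source holds the two columns of the 8×2 matrix, eight values each. The name smmla_segment_model is hypothetical.

```python
def smmla_segment_model(addend, op1, op2):
    """Hypothetical model of one 128-bit segment of the matrix multiply-accumulate.

    addend: 4 signed 32-bit values, the 2x2 accumulator in row-major order.
    op1: 16 signed 8-bit values, assumed to be the two rows of the 2x8 matrix.
    op2: 16 signed 8-bit values, assumed to be the two columns of the 8x2 matrix.
    """
    result = []
    for i in range(2):                    # row of the 2x2 result
        for j in range(2):                # column of the 2x2 result
            total = addend[2 * i + j]
            for k in range(8):            # 8-way dot product per element
                total += op1[8 * i + k] * op2[8 * j + k]
            total &= 0xFFFFFFFF           # accumulator wraps at 32 bits
            result.append(total - (1 << 32) if total & 0x80000000 else total)
    return result
```

Under this layout, each of the four destination elements is the 8-way dot product of one group of eight values from the first source with one group of eight values from the second source, added to the existing accumulator.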
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Widening multiply signed integer values in active elements of the first source vector by corresponding elements of the
second source vector and destructively place the high half of the result in the corresponding elements of the first
source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 0 0 0 0 Pg Zm Zdn
H U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer product = (element1 * element2) >> esize;
        Elem[result, e, esize] = product<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
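The "high half" of the widened product can be illustrated with a short Python sketch. It assumes vectors are modeled as lists of signed integers and the predicate as a list of booleans; the name smulh_model is hypothetical.

```python
def smulh_model(zdn, zm, pg, esize=32):
    """Hypothetical model of the predicated signed multiply returning high half."""
    result = []
    for element1, element2, active in zip(zdn, zm, pg):
        if active:
            # The full product needs up to 2*esize bits. Python's >> on int
            # is an arithmetic shift, so the sign of the product is preserved.
            result.append((element1 * element2) >> esize)
        else:
            result.append(element1)       # inactive lanes are left unchanged
    return result
```

For 32-bit elements, multiplying 2^30 by 4 gives 2^32, whose high half is 1, while multiplying -1 by 1 gives a high half of -1 (all ones).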
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
Copy the first active to last active elements (inclusive) from the first source vector to the lowest-numbered elements of
the result. Then set any remaining elements of the result to a copy of the lowest-numbered elements from the second
source vector. The result is placed destructively in the first source vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
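integer x = 0;
boolean active = FALSE;
integer lastnum = LastActiveElement(mask, esize);
if lastnum >= 0 then
    for e = 0 to lastnum
        active = active || ElemP[mask, e, esize] == '1';
        if active then
            Elem[result, x, esize] = Elem[operand1, e, esize];
            x = x + 1;
elements = (elements - x) - 1;
for e = 0 to elements
    Elem[result, x, esize] = Elem[operand2, e, esize];
    x = x + 1;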
Z[dn] = result;
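The splice described above can be sketched in Python. This is a minimal illustration, assuming vectors are modeled as equal-length lists and the predicate as a list of booleans; the name splice_model is hypothetical.

```python
def splice_model(zdn, zm, pg):
    """Hypothetical model of the destructive splice of two vectors."""
    active_lanes = [e for e, active in enumerate(pg) if active]
    if active_lanes:
        first, last = active_lanes[0], active_lanes[-1]
        # Every lane from the first to the last active one is copied,
        # including any inactive lanes that lie between them.
        head = zdn[first:last + 1]
    else:
        head = []
    tail = zm[:len(zdn) - len(head)]      # fill the rest from the low end of zm
    return head + tail
```

With eight lanes where only lanes 2 and 4 are active, lanes 2 to 4 of the first source land in result lanes 0 to 2, and the remaining five result lanes are filled from lanes 0 to 4 of the second source. With no active lanes, the result is a copy of the second source.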
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Signed saturating add of an unsigned immediate to each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
signed integer range -2^(N-1) to 2^(N-1)-1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 0 0 1 1 sh imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + imm, esize, unsigned);
Z[dn] = result;
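The interaction between the 8-bit immediate, the optional shift, and saturation can be sketched in Python. This is illustrative only. The name sqadd_imm_model, its parameter names, and the use of ValueError for combinations that the text above rules out are assumptions of the sketch.

```python
def sqadd_imm_model(zdn, imm8, shift=0, esize=16):
    """Hypothetical model of the signed saturating add of an unsigned immediate."""
    if not 0 <= imm8 <= 255 or shift not in (0, 8):
        raise ValueError("imm8 must be 0..255 and shift must be 0 or 8")
    if esize == 8 and shift == 8:
        raise ValueError("the shifted form needs elements of 16 bits or more")
    imm = imm8 << shift                   # 0..255, or a multiple of 256 up to 65280
    lo = -(1 << (esize - 1))              # -2^(N-1)
    hi = (1 << (esize - 1)) - 1           # 2^(N-1)-1
    return [max(lo, min(hi, element + imm)) for element in zdn]
```

For 16-bit elements, adding #3, LSL #8 (that is, 768) to 32000 saturates to 32767, whereas adding it to -5 gives 763 without saturation.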
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Signed saturating add all elements of the second source vector to corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. Each result element is saturated to the
N-bit element's signed integer range -2^(N-1) to 2^(N-1)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 0 0 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + element2, esize, unsigned);
Z[d] = result;
Signed saturating decrement scalar by multiple of 8-bit predicate constraint element count
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
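As an illustration of the constraint rules listed above, the following sketch computes the element count for a named constraint. The function name pred_count and its parameters are hypothetical; the shared pseudocode function DecodePredCount is not reproduced in this excerpt, so the handling of a fixed count that exceeds the available elements (returning zero) is an assumption based on the "out-of-range" sentence above.

```python
_FIXED_COUNTS = {1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256}


def pred_count(pattern, vl_bits, esize):
    """Number of active elements implied by a named predicate constraint."""
    elements = vl_bits // esize          # elements available in one vector
    if pattern == "ALL":
        return elements
    if pattern == "POW2":
        # Largest power of two that does not exceed the element count.
        return 1 << (elements.bit_length() - 1) if elements > 0 else 0
    if pattern == "MUL4":
        return elements - elements % 4
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern.startswith("VL") and pattern[2:].isdigit():
        n = int(pattern[2:])
        if n in _FIXED_COUNTS:
            # Assumption: a fixed count larger than the vector yields zero.
            return n if elements >= n else 0
    return 0                             # unspecified constraint: zero count


print(pred_count("POW2", 384, 64))  # 6 elements available, so 4
```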
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
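The two diagrams above differ only in bit 20 (sf). A hedged sketch of how an assembler might pack the fields follows. The function encode_sqdecb is hypothetical, the instruction is taken here to be SQDECB because its heading is missing from this excerpt, and the mapping imm4 = imm - 1 is an assumption, since the decode pseudocode is not shown.

```python
def encode_sqdecb(rdn, pattern, imm=1, is_64bit=True):
    """Pack the fields of the 8-bit-element scalar decrement into a 32-bit word."""
    if not 0 <= rdn <= 31:
        raise ValueError("Rdn must be in 0..31")
    if not 0 <= pattern <= 31:
        raise ValueError("pattern must be a 5-bit value")
    if not 1 <= imm <= 16:
        raise ValueError("imm must be in 1..16")
    word = 0b00000100 << 24               # bits 31..24, fixed
    word |= 0b00 << 22                    # size = 00 (8-bit elements)
    word |= 1 << 21                       # fixed 1
    word |= (1 if is_64bit else 0) << 20  # sf selects the 64-bit class
    word |= (imm - 1) << 16               # imm4 (assumed to hold imm - 1)
    word |= 0b1111 << 12                  # bits 15..12, fixed
    word |= 1 << 11                       # D = 1 (decrement)
    word |= 0 << 10                       # U = 0 (signed)
    word |= pattern << 5
    word |= rdn
    return word


print(hex(encode_sqdecb(0, 0b11111)))  # 0x430fbe0
```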
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
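The pseudocode above stops after declaring result in this excerpt. The behaviour described in the prose (saturate to the source register's signed range, then sign-extend a 32-bit result to 64 bits) can be sketched as follows. The function sqdec_scalar is hypothetical and models the register as an unsigned 64-bit Python integer; it is not a transcription of the missing pseudocode lines.

```python
MASK64 = (1 << 64) - 1


def sqdec_scalar(xdn, count, imm=1, is_64bit=True):
    """Return the new 64-bit register value after a signed saturating decrement."""
    ssize = 64 if is_64bit else 32
    raw = xdn & ((1 << ssize) - 1)        # only the low ssize bits are the source
    value = raw - (1 << ssize) if raw >> (ssize - 1) else raw  # view as signed
    lo = -(1 << (ssize - 1))
    hi = (1 << (ssize - 1)) - 1
    result = max(lo, min(hi, value - count * imm))
    # Masking a negative Python int to 64 bits yields its two's complement
    # form, which is the sign-extended value.
    return result & MASK64


print(hex(sqdec_scalar(0x80000005, 16, 1, is_64bit=False)))  # 0xffffffff80000000
```

In the 32-bit case the source 0x80000005 is -2147483643; subtracting 16 would pass the lower bound, so the result saturates at -2^31 and is then sign-extended.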
Signed saturating decrement scalar by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Signed saturating decrement vector by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 64-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 1 0 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
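The pattern table above can be read as a lookup from the 5-bit field to its assembler operand. In this sketch the name pattern_operand is hypothetical, and every encoding that the table lists as #uimm5 is rendered as the field value written as an immediate.

```python
_NAMED_PATTERNS = {
    0b00000: "POW2",
    0b00001: "VL1", 0b00010: "VL2", 0b00011: "VL3", 0b00100: "VL4",
    0b00101: "VL5", 0b00110: "VL6", 0b00111: "VL7", 0b01000: "VL8",
    0b01001: "VL16", 0b01010: "VL32", 0b01011: "VL64",
    0b01100: "VL128", 0b01101: "VL256",
    0b11101: "MUL4", 0b11110: "MUL3", 0b11111: "ALL",
}


def pattern_operand(bits):
    """Return the assembler spelling for a 5-bit pattern field."""
    if not 0 <= bits <= 31:
        raise ValueError("pattern is a 5-bit field")
    # Encodings without a name fall back to the '#uimm5' form.
    return _NAMED_PATTERNS.get(bits, "#%d" % bits)


print(pattern_operand(0b11111), pattern_operand(0b01110))  # ALL #14
```

The six wildcard rows of the table (0111x, 101x1, 10110, 1x0x1, 1x010, 1xx00) cover exactly the 15 encodings that have no name.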
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);
Z[dn] = result;
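To show how the Elem[] accesses map onto raw register contents, the sketch below applies the same loop to a vector held as bytes. The function sqdecd_vector is hypothetical, and a little-endian layout of 64-bit elements within the byte string is an assumption made for the illustration.

```python
def sqdecd_vector(z, count, imm=1):
    """Decrement every signed 64-bit element of z by count * imm, saturating."""
    if len(z) == 0 or len(z) % 16 != 0:
        raise ValueError("vector length must be a multiple of 128 bits")
    lo = -(1 << 63)
    hi = (1 << 63) - 1
    out = bytearray()
    for offset in range(0, len(z), 8):
        element = int.from_bytes(z[offset:offset + 8], "little", signed=True)
        result = max(lo, min(hi, element - count * imm))
        out += result.to_bytes(8, "little", signed=True)
    return bytes(out)


z = (5).to_bytes(8, "little", signed=True) + (lo_plus_1 := -(1 << 63) + 1).to_bytes(8, "little", signed=True)
out = sqdecd_vector(z, 2)
print([int.from_bytes(out[i:i + 8], "little", signed=True) for i in (0, 8)])
```

The first element becomes 3; the second would fall below -2^63 and therefore saturates at that bound.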
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
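The three MOVPRFX requirements above can be expressed as a simple check. This is a simplified sketch: the Insn record and movprfx_pair_is_valid are hypothetical, registers are compared by name only, and "sources" is assumed to list the source operands other than the destructive destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    mnemonic: str
    dest: str
    sources: tuple = ()      # source registers other than the destination
    predicated: bool = False


def movprfx_pair_is_valid(prefix, insn):
    """Check a MOVPRFX / instruction pair against the three listed requirements."""
    if prefix.mnemonic != "MOVPRFX":
        raise ValueError("first instruction must be MOVPRFX")
    return (
        not prefix.predicated               # MOVPRFX must be unpredicated
        and prefix.dest == insn.dest        # same destination register
        and insn.dest not in insn.sources   # destination not reused as another source
    )


ok = movprfx_pair_is_valid(Insn("MOVPRFX", "z0", ("z1",)), Insn("SQDECD", "z0"))
print(ok)  # True
```

A pair that fails the check is not rejected by hardware; per the text above, its behaviour is UNPREDICTABLE.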
Signed saturating decrement scalar by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Signed saturating decrement vector by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 16-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 1 0 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Counts the number of true elements in the source predicate and then uses the result to decrement the scalar
destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated
result is then sign-extended to 64 bits.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 0 1 0 0 Pm Rdn
D U sf
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 0 1 1 0 Pm Rdn
D U sf
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
size <T>
00 B
01 H
10 S
11 D
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
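The counting loop above can be sketched in Python as follows. The function count_active is hypothetical. It assumes the usual SVE predicate layout, in which the predicate holds one bit per vector byte and the lowest bit of each element's group decides whether that element is true; the definition of ElemP is not part of this excerpt.

```python
def count_active(pred, vl_bits, esize):
    """Count the true elements of a predicate for the given element size."""
    psize = esize // 8                 # predicate bits per element
    elements = vl_bits // esize
    # Only bit e * psize is examined for element e; other bits are ignored.
    return sum((pred >> (e * psize)) & 1 for e in range(elements))


# 128-bit vector, 32-bit elements: bits 0, 4, 8 and 12 are the significant ones.
print(count_active(0b0001_0000_0001_0001, 128, 32))  # 3
```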
Counts the number of true elements in the source predicate and then uses the result to decrement all destination
vector elements. The results are saturated to the element signed integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 0 0 0 0 Pm Zdn
D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element - count, esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Signed saturating decrement scalar by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Signed saturating decrement vector by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 32-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 1 0 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Signed saturating increment scalar by multiple of 8-bit predicate constraint element count
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
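As a worked example of the increment forms, the sketch below models the 64-bit case with the ALL constraint, where the element count is simply the vector length divided by the element size. The function sqinc_x is hypothetical, and it takes and returns a signed Python integer rather than a register bit pattern.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def sqinc_x(x, vl_bits, esize, imm=1):
    """Add (elements * imm) to a signed 64-bit value, saturating at the bounds."""
    count = vl_bits // esize           # ALL constraint: every element is active
    return max(INT64_MIN, min(INT64_MAX, x + count * imm))


# For a 512-bit vector, the per-size counts are 64, 32, 16 and 8.
print([sqinc_x(0, 512, esize) for esize in (8, 16, 32, 64)])
```

Near the upper bound the result stays at 2^63 - 1 instead of wrapping to a negative value, which is the point of the saturating forms.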
Signed saturating increment scalar by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Signed saturating increment vector by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 64-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Signed saturating increment scalar by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Signed saturating increment vector by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 16-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Counts the number of true elements in the source predicate and then uses the result to increment the scalar
destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated
result is then sign-extended to 64 bits.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 1 0 0 Pm Rdn
D U sf
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 1 1 0 Pm Rdn
D U sf
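Reading the two diagrams above in the other direction, a disassembler could match the fixed bits and then extract size, sf, Pm and Rdn. The sketch below is illustrative only: decode_sqincp_scalar is a hypothetical name (the instruction is taken here to be the scalar SQINCP, whose heading is missing from this excerpt), and the mask and match constants are derived from the bit positions in the diagrams.

```python
# Fixed bits: 31..24 = 00100101, 21..11 = 10100010001, bit 9 = 0.
_FIXED_MASK = 0xFF3FFA00
_FIXED_BITS = 0x25288800


def decode_sqincp_scalar(word):
    """Return the variable fields of the word, or None if the fixed bits differ."""
    if (word & _FIXED_MASK) != _FIXED_BITS:
        return None
    return {
        "T": "BHSD"[(word >> 22) & 0b11],   # size field selects the element size
        "is_64bit": bool((word >> 10) & 1), # sf distinguishes the two classes
        "pm": (word >> 5) & 0b1111,
        "rdn": word & 0b11111,
    }


print(decode_sqincp_scalar(0x25288C00))
```

A word whose fixed bits do not match, such as one from a different instruction class, yields None rather than a partial decode.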
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
size <T>
00 B
01 H
10 S
11 D
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
Counts the number of true elements in the source predicate and then uses the result to increment all destination
vector elements. The results are saturated to the element signed integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 0 0 0 Pm Zdn
D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element + count, esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
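The Operation pseudocode above can be mirrored in a few lines of Python. This is an illustrative sketch only: the list-based model of the vector and predicate registers and the names `sat_signed` and `sqincp_vector` are assumptions of this example, not part of the architecture, and it models only the signed form described here.

```python
def sat_signed(value, esize):
    """Clamp an unbounded integer to the signed range of an esize-bit element."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    return max(lo, min(hi, value))


def sqincp_vector(zdn, pm, esize):
    """Add the number of true predicate elements to every vector element.

    zdn: list of signed element values (hypothetical register model).
    pm:  list of booleans, one per element, True where the predicate is true.
    """
    count = sum(1 for active in pm if active)
    return [sat_signed(element + count, esize) for element in zdn]


# Four 16-bit elements; three predicate elements are true, so count is 3.
result = sqincp_vector([0, -5, 32766, 32767], [True, True, False, True], 16)
print(result)  # [3, -2, 32767, 32767]
```

The last two elements show the saturation: both would exceed 32767 and are clamped to the largest signed 16-bit value.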
Signed saturating increment scalar by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then
sign-extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
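The saturate-then-sign-extend behavior described above can be illustrated with a small Python sketch. The function name `sqincw_scalar` and its parameters are hypothetical; `count` stands for the value that `DecodePredCount(pat, esize)` would return and is supplied by the caller here.

```python
def sqincw_scalar(reg_value, count, imm=1, ssize=64):
    """Model the signed scalar increment and return the new 64-bit register bits.

    reg_value: current 64-bit register contents as a non-negative integer.
    count:     element count implied by the pattern (supplied by the caller).
    imm:       multiplier in the range 1 to 16.
    ssize:     32 for the 32-bit encoding, 64 for the 64-bit encoding.
    """
    if not 1 <= imm <= 16:
        raise ValueError("imm must be in the range 1 to 16")
    # Interpret the low ssize bits of the register as a signed integer.
    operand = reg_value & ((1 << ssize) - 1)
    if operand >= 1 << (ssize - 1):
        operand -= 1 << ssize
    # Saturate to the signed ssize-bit range.
    lo, hi = -(1 << (ssize - 1)), (1 << (ssize - 1)) - 1
    result = max(lo, min(hi, operand + count * imm))
    # Sign-extend to 64 bits and return the raw register bits.
    return result & ((1 << 64) - 1)


# 32-bit form: 0x7FFFFFFE + 4 * 2 overflows and saturates to 0x7FFFFFFF.
print(hex(sqincw_scalar(0x7FFFFFFE, 4, imm=2, ssize=32)))  # 0x7fffffff
# 32-bit form: -16 + 4 = -12, which is sign-extended to 64 bits.
print(hex(sqincw_scalar(0xFFFFFFF0, 4, ssize=32)))  # 0xfffffffffffffff4
```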
Signed saturating increment vector by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 32-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
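As a rough illustration of how the pattern table and the Operation pseudocode fit together, the following Python sketch computes an element count from a 5-bit pattern and applies the saturating increment. The names `pred_count` and `sqincw_vector` and the list-based register model are assumptions of this example. The sketch follows the description above: fixed counts that exceed the number of elements, and the unnamed `#uimm5` encodings, yield zero.

```python
# Pattern encodings VL1 to VL256 from the table above, mapped to their fixed counts.
FIXED = {0b00001: 1, 0b00010: 2, 0b00011: 3, 0b00100: 4, 0b00101: 5,
         0b00110: 6, 0b00111: 7, 0b01000: 8, 0b01001: 16, 0b01010: 32,
         0b01011: 64, 0b01100: 128, 0b01101: 256}


def pred_count(pattern, vl, esize):
    """Return the element count implied by a 5-bit pattern for a vl-bit vector."""
    elements = vl // esize
    if pattern == 0b00000:                    # POW2: largest power of two
        return 1 << (elements.bit_length() - 1)
    if pattern in FIXED:                      # VL1 to VL256
        wanted = FIXED[pattern]
        return wanted if elements >= wanted else 0
    if pattern == 0b11101:                    # MUL4
        return elements - elements % 4
    if pattern == 0b11110:                    # MUL3
        return elements - elements % 3
    if pattern == 0b11111:                    # ALL
        return elements
    return 0                                  # unspecified (#uimm5) encodings


def sqincw_vector(zdn, pattern, imm, vl):
    """Add count * imm to each signed 32-bit element, saturating the results."""
    count = pred_count(pattern, vl, 32)
    lo, hi = -(1 << 31), (1 << 31) - 1
    return [max(lo, min(hi, x + count * imm)) for x in zdn]


print(pred_count(0b00000, 384, 32))  # 8  (POW2 with 12 elements)
print(pred_count(0b11110, 256, 32))  # 6  (MUL3 with 8 elements)
print(sqincw_vector([0, 2147483637, -100, 5], 0b11111, 16, 128))
# [64, 2147483647, -36, 69]
```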
Signed saturating subtract of an unsigned immediate from each element of the source vector, and destructively place
the results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
signed integer range -2^(N-1) to (2^(N-1))-1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 1 0 1 1 sh imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - imm, esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
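The immediate encoding and the saturating subtract can be sketched in Python as follows. The names `decode_imm` and `sqsub_imm` are hypothetical. Rejecting the shifted form for byte elements is an assumption drawn from the statement above that shifted immediates apply to element widths of 16 bits or higher.

```python
def decode_imm(size, sh, imm8):
    """Return the immediate value selected by the size, sh and imm8 fields."""
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must be in the range 0 to 255")
    if size == 0b00 and sh == 1:
        # Assumption: the LSL #8 form is not available for byte elements.
        raise ValueError("LSL #8 is not available for byte elements")
    return imm8 << 8 if sh else imm8


def sqsub_imm(zdn, size, sh, imm8):
    """Subtract the immediate from each signed element, saturating the results."""
    esize = 8 << size  # size 00, 01, 10, 11 selects 8, 16, 32, 64 bits
    imm = decode_imm(size, sh, imm8)
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    return [max(lo, min(hi, x - imm)) for x in zdn]


# Halfword elements with "#1, LSL #8", an immediate of 256.
print(sqsub_imm([-32768, 0, 100], size=0b01, sh=1, imm8=1))  # [-32768, -256, -156]
print(decode_imm(0b01, 1, 255))  # 65280, the largest shifted immediate
```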
Signed saturating subtract all elements of the second source vector from corresponding elements of the first source
vector and place the results in the corresponding elements of the destination vector. Each result element is saturated
to the N-bit element's signed integer range -2^(N-1) to (2^(N-1))-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 1 0 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - element2, esize, unsigned);
Z[d] = result;
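The per-element result of the signed form can be written compactly as a clamp. The notation below is an informal restatement of the pseudocode above, with sat_N introduced here only for illustration; Zn[e] and Zm[e] denote the signed values of element e of the two source vectors.

```latex
\[
\operatorname{sat}_N(x) = \min\bigl(\max(x,\,-2^{N-1}),\; 2^{N-1}-1\bigr),
\qquad
Zd[e] = \operatorname{sat}_N\bigl(Zn[e] - Zm[e]\bigr)
\]
```

For example, with N = 8, subtracting 100 from -100 gives sat_8(-200) = -128, and subtracting -100 from 100 gives sat_8(200) = 127.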
Contiguous store of bytes from elements of a vector register to the memory address generated by a 64-bit scalar base
and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
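The address arithmetic in the loop above can be traced with a short Python sketch. The function name `st1b_imm_addresses` and its list-based predicate model are assumptions of this example. It shows that the immediate is scaled by the number of elements in the vector, whether or not those elements are active.

```python
def st1b_imm_addresses(base, imm, mask, vl, esize):
    """List (element index, byte address) pairs for the active elements.

    base:  64-bit base address as an integer.
    imm:   signed vector offset in the range -8 to 7.
    mask:  one boolean per element, True for an active element.
    vl:    vector length in bits; esize: element size in bits.
    """
    if not -8 <= imm <= 7:
        raise ValueError("imm must be in the range -8 to 7")
    elements = vl // esize
    mbytes = 1  # a byte store writes msize = 8 bits per element
    pairs = []
    for e in range(elements):
        if mask[e]:
            eoff = imm * elements + e
            pairs.append((e, (base + eoff * mbytes) % (1 << 64)))
    return pairs


# 128-bit vector of 32-bit elements: the in-memory size is 4 bytes,
# so an immediate of 1 moves the first element to base + 4.
pairs = st1b_imm_addresses(0x1000, 1, [True, False, True, True], 128, 32)
print([(e, hex(a)) for e, a in pairs])
# [(0, '0x1004'), (2, '0x1006'), (3, '0x1007')]
```

Element 1 is inactive, so nothing is written at 0x1005, but the addresses of elements 2 and 3 are unaffected by that.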
Contiguous store of bytes from elements of a vector register to the memory address generated by a 64-bit scalar base
and scalar index which is added to the base address. After each element access the index value is incremented, but the
index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 size Rm 0 1 0 Pg Rn Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Scatter store of bytes from the active elements of a vector register to the memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to 64 bits. Inactive
elements are not written to memory.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Scatter store of bytes from the active elements of a vector register to the memory addresses generated by a vector
base plus immediate index. The index is in the range 0 to 31. Inactive elements are not written to memory.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Contiguous store of doublewords from elements of a vector register to the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Contiguous store of doublewords from elements of a vector register to the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 8 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 1 Rm 0 1 0 Pg Rn Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
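A minimal Python sketch of the scalar-plus-scalar form follows, assuming a dictionary that maps each address to the doubleword stored there. The name `st1d_scalar_index` and this memory model are hypothetical. The index is counted in doublewords, so it is multiplied by 8 together with the element number.

```python
def st1d_scalar_index(memory, base, index, src, mask):
    """Store the active 64-bit elements of src into memory.

    memory: dict mapping address -> doubleword (hypothetical memory model).
    base:   64-bit base address; index: unsigned value of the index register.
    src:    list of element values; mask: one boolean per element.
    """
    mbytes = 8  # msize is 64 bits for a doubleword store
    for e, (value, active) in enumerate(zip(src, mask)):
        if active:
            addr = (base + (index + e) * mbytes) % (1 << 64)
            memory[addr] = value & ((1 << 64) - 1)
    return memory


memory = st1d_scalar_index({}, 0x2000, 3, [10, 20, 30, 40], [True, True, False, True])
print({hex(a): v for a, v in memory.items()})
# {'0x2018': 10, '0x2020': 20, '0x2030': 40}
```

The inactive element 2 leaves address 0x2028 untouched, and the index register itself is only read.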
Scatter store of doublewords from the active elements of a vector register to the memory addresses generated by a
64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and
then optionally multiplied by 8. Inactive elements are not written to memory.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Scatter store of doublewords from the active elements of a vector register to the memory addresses generated by a
vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements are not written
to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
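The relationship between the assembler byte offset and the 5-bit field can be sketched in Python. The names `encode_imm5` and `st1d_vector_addresses` are hypothetical. Dividing the byte offset by 8 to form the field value is inferred from the `offset * mbytes` term in the pseudocode above, together with the stated range of 0 to 248 in multiples of 8.

```python
def encode_imm5(byte_offset, mbytes=8):
    """Convert an assembler byte offset into the 5-bit field value."""
    if byte_offset % mbytes != 0 or not 0 <= byte_offset <= 31 * mbytes:
        raise ValueError("offset must be a multiple of 8 in the range 0 to 248")
    return byte_offset // mbytes


def st1d_vector_addresses(bases, imm5, mask, mbytes=8):
    """Return the store address of each active element.

    bases: per-element base addresses taken from the base vector register.
    mask:  one boolean per element, True for an active element.
    """
    return [(b + imm5 * mbytes) % (1 << 64)
            for b, active in zip(bases, mask) if active]


print(encode_imm5(248))  # 31
addrs = st1d_vector_addresses([0x1000, 0x2000, 0x3000], encode_imm5(16),
                              [True, False, True])
print([hex(a) for a in addrs])  # ['0x1010', '0x3010']
```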
Contiguous store of halfwords from elements of a vector register to the memory address generated by a 64-bit scalar
base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Contiguous store of halfwords from elements of a vector register to the memory address generated by a 64-bit scalar
base and scalar index which is multiplied by 2 and added to the base address. After each element access the index
value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 size Rm 0 1 0 Pg Rn Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Scatter store of halfwords from the active elements of a vector register to the memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 2. Inactive elements are not written to memory.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
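The effect of the extension and scaling options on each scatter address can be illustrated in Python. The function names are hypothetical, and the parameters `offs_size`, `offs_unsigned` and `scale` are taken from the pseudocode above. How each encoding class sets them is an assumption here: UXTW is modeled as an unsigned 32-bit offset, SXTW as a signed 32-bit offset, and a scaled halfword offset as a shift of 1.

```python
def scatter_offset(raw, offs_size, offs_unsigned, scale):
    """Extend the low offs_size bits of an offset element and apply the scale."""
    value = raw & ((1 << offs_size) - 1)
    if not offs_unsigned and value >= 1 << (offs_size - 1):
        value -= 1 << offs_size  # reinterpret as a signed integer
    return value << scale


def st1h_scatter_addresses(base, offsets, mask, offs_size=32,
                           offs_unsigned=True, scale=1):
    """Return the halfword store address of each active element."""
    return [(base + scatter_offset(o, offs_size, offs_unsigned, scale)) % (1 << 64)
            for o, active in zip(offsets, mask) if active]


offsets = [0x00000001FFFFFFFF, 4]  # low 32 bits of the first element are all ones
# Signed (SXTW-style) extension: the first offset is -1, scaled to -2.
print([hex(a) for a in st1h_scatter_addresses(0x10000, offsets, [True, True],
                                              offs_unsigned=False)])
# ['0xfffe', '0x10008']
# Unsigned (UXTW-style) extension: the first offset is 0xFFFFFFFF, scaled by 2.
print([hex(a) for a in st1h_scatter_addresses(0x10000, offsets, [True, True])])
# ['0x20000fffe', '0x10008']
```

In both cases the upper 32 bits of the first offset element are ignored, because only the low `offs_size` bits take part in the extension.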
Scatter store of halfwords from the active elements of a vector register to the memory addresses generated by a
vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements are not written
to memory.
It has encodings from 2 classes: 32-bit element and 64-bit element
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Contiguous store of words from elements of a vector register to the memory address generated by a 64-bit scalar base
and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
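When the element size is larger than the memory access size, only the low `msize` bits of each element are written. The Python sketch below illustrates this for 64-bit elements stored as words. The name `st1w_contiguous`, the `bytearray` memory model and the little-endian byte order are assumptions of this example.

```python
def st1w_contiguous(memory, base, imm, src, mask):
    """Store the low 32 bits of each active element as a contiguous word.

    memory: bytearray standing in for memory (hypothetical model).
    base:   index into memory used as the base address.
    imm:    signed vector offset in the range -8 to 7.
    src:    raw element bits, one integer per element.
    mask:   one boolean per element, True for an active element.
    """
    msize = 32
    mbytes = msize // 8
    elements = len(src)
    for e in range(elements):
        if mask[e]:
            eoff = imm * elements + e
            addr = base + eoff * mbytes
            if not 0 <= addr <= len(memory) - mbytes:
                raise IndexError("address outside the modeled memory")
            low = src[e] & ((1 << msize) - 1)  # Elem[src, e, esize]<msize-1:0>
            # Assumption: little-endian data accesses.
            memory[addr:addr + mbytes] = low.to_bytes(mbytes, "little")
    return memory


# Two 64-bit elements with a vector offset of -1: the in-memory size is
# 2 * 4 = 8 bytes, so the words land at base - 8 and base - 4.
memory = st1w_contiguous(bytearray(32), 16, -1,
                         [0x1111111122222222, 0xAAAAAAAABBBBBBBB], [True, True])
print(memory[8:16].hex())  # 22222222bbbbbbbb
```

The upper halves of the two elements (0x11111111 and 0xAAAAAAAA) never reach memory.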
Contiguous store of words from elements of a vector register to the memory address generated by a 64-bit scalar base
and scalar index which is multiplied by 4 and added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 size Rm 0 1 0 Pg Rn Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
Scatter store of words from the active elements of a vector register to the memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 4. Inactive elements are not written to memory.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset.
32-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unpacked scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unpacked unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
32-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 0 Zm 1 xs 0 Pg Rn Zt
msz<1>msz<0>
64-bit scaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
64-bit unscaled offset
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.
<mod> Is the index extend and shift specifier, encoded in "xs":
xs <mod>
0 UXTW
1 SXTW
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
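The address generation for the scatter form can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: `scatter_addresses` is a hypothetical helper name, the parameters `offs_size`, `offs_unsigned` and `scale` mirror the pseudocode variables of the same names, and the result is a dict from element number to store address rather than an actual memory write.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def scatter_addresses(base, offsets, pred, offs_size=32, offs_unsigned=True, scale=2):
    """Return {element number: address} for the active elements of a scatter store.

    offsets       : raw per-element values of the offset vector Zm
    offs_size     : number of low bits of each element used as the offset (32 or 64)
    offs_unsigned : True for zero-extension (UXTW), False for sign-extension (SXTW)
    scale         : left shift applied to the offset (2 when scaled for words, else 0)
    """
    addrs = {}
    for e, (raw, active) in enumerate(zip(offsets, pred)):
        if not active:
            continue                              # inactive elements are skipped
        off = raw & ((1 << offs_size) - 1)        # take the low offs_size bits
        if not offs_unsigned and (off >> (offs_size - 1)) & 1:
            off -= 1 << offs_size                 # interpret as a signed value
        addrs[e] = (base + (off << scale)) & MASK64
    return addrs
```

With the same raw offset of 0xFFFFFFFF, the unsigned form addresses memory far above the base, while the signed form addresses the word immediately below it.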
Scatter store of words from the active elements of a vector register to the memory addresses generated by a vector
base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements are not written to
memory.
It has encodings from 2 classes: 32-bit element and 64-bit element.
32-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
64-bit element
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 0 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.
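As an illustration of how the byte offset relates to the 5-bit field, the following Python sketch converts between the two. It assumes, based on the `offset * mbytes` term in the operation pseudocode with 4-byte words, that the field holds the byte offset divided by 4; the helper names `encode_imm5` and `decode_imm5` are hypothetical.

```python
def encode_imm5(byte_offset=0):
    """Map an assembler byte offset (multiple of 4, 0 to 124) to the imm5 field value."""
    if byte_offset % 4 != 0 or not 0 <= byte_offset <= 124:
        raise ValueError("offset must be a multiple of 4 in the range 0 to 124")
    return byte_offset // 4

def decode_imm5(imm5):
    """Map a 5-bit field value back to the byte offset it represents."""
    return (imm5 & 0x1F) * 4
```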
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
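A minimal Python sketch of the vector-base address calculation follows. The helper name `vector_base_addresses` and its argument layout are assumptions for illustration; `imm` is the assembler byte offset, so it equals `offset * mbytes` in the pseudocode.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def vector_base_addresses(bases, pred, imm=0, esize=32):
    """Return {element number: address} for a vector-plus-immediate word scatter.

    bases : per-element values of the base vector Zn
    pred  : list of booleans, one per element (True means active)
    imm   : byte offset, a multiple of 4 in the range 0 to 124
    esize : element size in bits (32 or 64)
    """
    addrs = {}
    for e, (raw, active) in enumerate(zip(bases, pred)):
        if not active:
            continue
        zext = raw & ((1 << esize) - 1)       # zero-extend the element to 64 bits
        addrs[e] = (zext + imm) & MASK64
    return addrs
```

Because 32-bit bases are zero-extended before the addition, a base near the top of the 32-bit range carries into bit 32 instead of wrapping.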
Contiguous store of two-byte structures, each from the same element number in two vector registers, to the memory
address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 2 in the range -16 to
14 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
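The relationship between the assembler immediate and the 4-bit field can be sketched in Python as below. This assumes the field is a signed 4-bit value that is multiplied by the number of registers in the list, which is consistent with the stated range (-16 to 14 in steps of 2 for two registers); the helper names are hypothetical.

```python
def decode_imm4(imm4, nreg):
    """Return the assembler immediate for a 4-bit field value and a register count."""
    value = imm4 & 0xF
    if value >= 8:
        value -= 16            # interpret the field as a signed 4-bit number
    return value * nreg

def encode_imm4(imm, nreg):
    """Return the 4-bit field value for an assembler immediate."""
    if imm % nreg != 0 or not -8 * nreg <= imm <= 7 * nreg:
        raise ValueError("immediate out of range or not a multiple of nreg")
    return (imm // nreg) & 0xF
```

The same pair of helpers also reproduces the three-register range of -24 to 21 when `nreg` is 3.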
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
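The interleaving performed by this loop can be sketched in Python as follows. This is a toy model, not the architectural definition: `store_structures_imm` is a hypothetical name, memory is a dict of bytes, byte order is assumed little-endian, and the test vectors are far shorter than any real vector length. Here `offset` is the pseudocode variable, that is, the assembler immediate divided by the number of registers.

```python
def store_structures_imm(mem, base, regs, pred, offset=0, esize=8):
    """Interleave the active elements of several registers into memory.

    regs   : list of nreg lists, each holding the elements of one register
    pred   : list of booleans, one per element number (True means active)
    offset : signed multiple of the in-memory size of one register
    """
    nreg = len(regs)
    elements = len(regs[0])
    mbytes = esize // 8
    for e in range(elements):
        for r in range(nreg):
            if pred[e]:
                eoff = offset * elements * nreg + e * nreg + r
                addr = base + eoff * mbytes
                value = regs[r][e] & ((1 << esize) - 1)
                for b in range(mbytes):          # little-endian (assumption)
                    mem[addr + b] = (value >> (8 * b)) & 0xFF
    return mem
```

Each unit of `offset` moves the whole access by the combined size of all registers in the list, regardless of how many elements are active.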
Contiguous store of two-byte structures, each from the same element number in two vector registers, to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register, which is added to the base address. After
each structure access the index value is incremented by two. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store of two-doubleword structures, each from the same element number in two vector registers, to the
memory address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 2 in the range
-16 to 14 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store two-doubleword structures, each from the same element number in two vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by two. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
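To make the scaled-index arithmetic concrete, the following Python sketch lists the address used for each register and element. It is an illustration under assumptions: `struct_addresses_scalar_index` is a hypothetical helper, and `index` stands for the unsigned value of the index register, counted in elements rather than bytes.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def struct_addresses_scalar_index(base, index, nreg, elements, esize, pred):
    """Return (register slot, element number, address) for every active store.

    index : unsigned value of the index register, in units of one element
    esize : element size in bits; the index is scaled by esize // 8 bytes
    """
    mbytes = esize // 8
    out = []
    for e in range(elements):
        for r in range(nreg):
            if pred[e]:
                eoff = index + e * nreg + r
                out.append((r, e, (base + eoff * mbytes) & MASK64))
    return out
```

For doubleword elements the scaling by 8 bytes corresponds to a left shift of 3, so an index of 3 places the first structure 24 bytes above the base.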
Contiguous store of two-halfword structures, each from the same element number in two vector registers, to the memory
address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 2 in the range -16 to
14 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store of two-word structures, each from the same element number in two vector registers, to the memory
address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 2 in the range -16 to
14 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
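The field layout in the encoding diagram above can be turned into a 32-bit instruction word as in the Python sketch below. It is illustrative only: `encode_st2_word_imm` is a hypothetical helper, the fixed bits are copied from the diagram, and the imm4 field is assumed to hold the assembler immediate divided by 2 as a signed 4-bit value.

```python
def encode_st2_word_imm(zt, pg, rn, imm=0):
    """Assemble the instruction word for the two-word structure store, immediate form.

    zt  : number of the first vector register (0 to 31)
    pg  : number of the governing predicate register (0 to 7)
    rn  : number of the base register (0 to 31, where 31 selects SP)
    imm : vector offset, a multiple of 2 in the range -16 to 14
    """
    if not 0 <= zt <= 31 or not 0 <= rn <= 31:
        raise ValueError("register number out of range")
    if not 0 <= pg <= 7:
        raise ValueError("governing predicate must be P0-P7")
    if imm % 2 != 0 or not -16 <= imm <= 14:
        raise ValueError("immediate must be a multiple of 2 in -16..14")
    imm4 = (imm // 2) & 0xF
    word = 0b111001010011 << 20    # fixed bits 31..20 from the diagram
    word |= imm4 << 16             # imm4 in bits 19..16
    word |= 0b111 << 13            # fixed bits 15..13
    word |= pg << 10               # Pg in bits 12..10
    word |= rn << 5                # Rn in bits 9..5
    word |= zt                     # Zt in bits 4..0
    return word
```

The second register of the list is not encoded separately; it is implied by Zt plus 1 modulo 32.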
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store two-word structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 1 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store of three-byte structures, each from the same element number in three vector registers, to the memory
address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 3 in the range -24 to
21 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
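Two details of this loop are easy to overlook: the register list wraps from Z31 back to Z0, and memory is filled structure by structure rather than register by register. The Python sketch below shows both. The helper names `transfer_registers` and `interleaved_order` are hypothetical and exist only for this illustration.

```python
def transfer_registers(t, nreg):
    """Return the vector register numbers read by the store, wrapping modulo 32."""
    return [(t + r) % 32 for r in range(nreg)]

def interleaved_order(nreg, elements, pred):
    """Return (register slot, element number) pairs in increasing address order.

    Only active structures are listed; inactive ones leave a gap in memory.
    """
    return [(r, e) for e in range(elements) if pred[e] for r in range(nreg)]
```

For a three-register store this ordering places element 0 of all three registers before element 1 of any of them, which is the usual layout for packed three-component data such as RGB pixels.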
Contiguous store of three-byte structures, each from the same element number in three vector registers, to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register, which is added to the base address. After
each structure access the index value is incremented by three. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 1 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store of three-doubleword structures, each from the same element number in three vector registers, to the
memory address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 3 in the range
-24 to 21 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store three-doubleword structures, each from the same element number in three vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 1 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store of three-halfword structures, each from the same element number in three vector registers, to the
memory address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 3 in the range
-24 to 21 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store three-halfword structures, each from the same element number in three vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive structures are not written to
memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 1 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store of three-word structures, each from the same element number in three vector registers, to the memory
address generated by a 64-bit scalar base and an immediate index. The immediate is a multiple of 3 in the range -24 to
21 and is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 1 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
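The <imm> constraint above can be illustrated with a short Python sketch. It assumes that the "imm4" field holds the immediate divided by the number of registers as a 4-bit two's complement value. That is consistent with the stated range (-24 to 21 in steps of 3 corresponds to -8 to 7) but is not spelled out in this section, and the function names are hypothetical.

```python
def encode_imm4(imm, nreg=3):
    """Return the 4-bit field for an assembler immediate (assumed to be imm / nreg)."""
    if imm % nreg != 0:
        raise ValueError(f"immediate must be a multiple of {nreg}")
    scaled = imm // nreg
    if not -8 <= scaled <= 7:
        raise ValueError(f"immediate out of range {-8 * nreg} to {7 * nreg}")
    return scaled & 0xF  # two's complement in 4 bits


def decode_imm4(field, nreg=3):
    """Recover the assembler immediate from the 4-bit field."""
    signed = field - 16 if field & 0x8 else field
    return signed * nreg
```

With nreg set to 4, the same mapping would cover the -32 to 28 range of the four-register forms.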
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
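The interleaving performed by the loop above can be sketched in Python. This is a model under stated assumptions, not the architectural definition: memory is a bytearray, the active list stands in for ElemP[mask, e, esize], little-endian byte order is assumed, and the function name is hypothetical.

```python
def store_structs_imm(mem, base, offset, regs, active, esize=32):
    """Interleave elements of the registers in `regs` into `mem` (a bytearray).

    offset is the signed value multiplied by the vector's in-memory size;
    active[e] stands in for ElemP[mask, e, esize] == '1'.
    """
    nreg = len(regs)
    elements = len(regs[0])  # stands in for VL DIV esize
    mbytes = esize // 8
    for e in range(elements):
        for r in range(nreg):
            if active[e]:
                eoff = (offset * elements * nreg) + (e * nreg) + r
                addr = base + eoff * mbytes
                # Little-endian byte order is an assumption of this sketch.
                mem[addr:addr + mbytes] = regs[r][e].to_bytes(mbytes, "little")
    return mem


# Three registers of four 32-bit elements (a 128-bit vector length).
z = [[0x10, 0x11, 0x12, 0x13],
     [0x20, 0x21, 0x22, 0x23],
     [0x30, 0x31, 0x32, 0x33]]
mem = store_structs_imm(bytearray(48), 0, 0, z, [True, False, True, True])
words = [int.from_bytes(mem[i:i + 4], "little") for i in range(0, 48, 4)]
```

In this example the second structure is inactive, so words 3 to 5 of mem keep their previous contents, while the other structures are written as three consecutive words each.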
Contiguous store three-word structures, each from the same element number in three vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by three. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  10 (msz)  10     Rm     011    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
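For the scalar plus scalar form, the address arithmetic in the loop above can be tabulated with a small Python helper. The helper name is hypothetical. The sketch treats the index as an unsigned value, matching UInt(offset), and truncates each address to 64 bits, which is an assumption drawn from the bits(64) declaration of addr.

```python
MASK64 = (1 << 64) - 1


def struct_addresses(base, index, nreg, elements, esize):
    """Return addresses[e][r] for element e of register r (scalar plus scalar form)."""
    mbytes = esize // 8
    return [[(base + (index + e * nreg + r) * mbytes) & MASK64
             for r in range(nreg)]
            for e in range(elements)]


# Three registers of four words, base 0x1000, index register holding 2.
addrs = struct_addresses(base=0x1000, index=2, nreg=3, elements=4, esize=32)
```

Each structure starts nreg elements after the previous one, so the index register value only needs to be supplied once for the whole instruction.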
Contiguous store four-byte structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  00 (msz)  11     1   imm4   111    Pg     Rn   Zt
ST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
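The "plus 1 modulo 32" rule for <Zt2> to <Zt4> means that a register list can wrap from Z31 back to Z0. A minimal Python sketch of that rule follows; the function name is hypothetical.

```python
def transfer_registers(zt, nreg=4):
    """Names of the nreg consecutive vector registers starting at Zt, wrapping at Z31."""
    return [f"Z{(zt + r) % 32}" for r in range(nreg)]


# A four-register list that starts at Z30 wraps around to Z0 and Z1.
wrapped = transfer_registers(30)
```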
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store four-byte structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register which is added to the base address. After each
structure access the index value is incremented by four. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  00 (msz)  11     Rm     011    Pg     Rn   Zt
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store four-doubleword structures, each from the same element number in four vector registers to the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to
28 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  11     1   imm4   111    Pg     Rn   Zt
ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store four-doubleword structures, each from the same element number in four vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by four. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  11     Rm     011    Pg     Rn   Zt
ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  11     1   imm4   111    Pg     Rn   Zt
ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  11     Rm     011    Pg     Rn   Zt
ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store four-word structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  10 (msz)  11     1   imm4   111    Pg     Rn   Zt
ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store four-word structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  10 (msz)  11     Rm     011    Pg     Rn   Zt
ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]
Assembler Symbols
<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;
for r = 0 to nreg-1
    values[r] = Z[(t+r) MOD 32];
for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
Contiguous store non-temporal of bytes from elements of a vector register to the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  00 (msz)  00     1   imm4   111    Pg     Rn   Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
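The predicated single-vector store above can be modelled in Python as follows. This is a sketch under assumptions rather than the architectural definition: the predicate is held in an integer with one bit per vector byte, the lowest bit of each element's group decides whether the element is active, memory is a little-endian bytearray, and the non-temporal hint is not modelled. The function names are hypothetical.

```python
def elem_p(pred, e, esize):
    """Predicate bit governing element e; pred is an int holding the PL predicate bits.

    Assumes one predicate bit per vector byte, with the lowest bit of each
    element's group deciding whether the element is active.
    """
    return (pred >> (e * (esize // 8))) & 1


def stnt1_imm(mem, base, offset, src, pred, esize, vl):
    """Predicated contiguous store of src (a list of VL DIV esize integers)."""
    elements = vl // esize
    mbytes = esize // 8
    for e in range(elements):
        if elem_p(pred, e, esize):
            eoff = (offset * elements) + e
            addr = base + eoff * mbytes
            mem[addr:addr + mbytes] = src[e].to_bytes(mbytes, "little")
    return mem


# 128-bit vector of bytes, even-numbered elements active, one vector past base.
src = list(range(0xA0, 0xB0))
mem = stnt1_imm(bytearray(32), 0, 1, src, 0x5555, esize=8, vl=128)
```

Because the immediate is multiplied by the whole vector's in-memory size, the first 16 bytes of mem are untouched here even though half of the elements are inactive.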
Contiguous store non-temporal of bytes from elements of a vector register to the memory address generated by a
64-bit scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  00 (msz)  00     Rm     011    Pg     Rn   Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
Contiguous store non-temporal of doublewords from elements of a vector register to the memory address generated by
a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  00     1   imm4   111    Pg     Rn   Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
Contiguous store non-temporal of doublewords from elements of a vector register to the memory address generated by
a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements are not written to
memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  00     Rm     011    Pg     Rn   Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
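The statement that the index value is incremented while the index register is not updated can be illustrated with a Python generator that keeps a working copy of the index. This is a hypothetical sketch for doubleword elements: the function name and the (address, value) output format are assumptions, and truncating addresses to 64 bits is inferred from the bits(64) declaration of addr.

```python
def stnt1d_scalar_writes(base, xm, src, active):
    """Yield (address, value) for each active doubleword; xm itself is never modified."""
    index = xm  # working copy of the index register value
    for value, is_active in zip(src, active):
        if is_active:
            yield (base + index * 8) & ((1 << 64) - 1), value
        index += 1  # advances after every element, active or not


# Two doublewords (a 128-bit vector length), index register holding 3.
writes = list(stnt1d_scalar_writes(0x2000, 3, [11, 22], [True, True]))
```

The addresses produced match base + (UInt(offset) + e) * mbytes from the pseudocode, with mbytes equal to 8.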
Contiguous store non-temporal of halfwords from elements of a vector register to the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  00     1   imm4   111    Pg     Rn   Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
Contiguous store non-temporal of halfwords from elements of a vector register to the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  00     Rm     011    Pg     Rn   Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
Contiguous store non-temporal of words from elements of a vector register to the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  10 (msz)  00     1   imm4   111    Pg     Rn   Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
Contiguous store non-temporal of words from elements of a vector register to the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(64) addr = base + (UInt(offset) + e) * mbytes;
Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
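A minimal Python sketch of the scalar-plus-scalar address calculation follows. The helper name `stnt1w_reg_addresses` and the boolean predicate list are hypothetical, and the 64-bit truncation is this sketch's reading of the `bits(64)` address type.

```python
MASK64 = (1 << 64) - 1

def stnt1w_reg_addresses(base, xm, vl_bits, active):
    """Byte addresses written for active 32-bit elements (illustrative sketch)."""
    esize = 32
    elements = vl_bits // esize
    mbytes = esize // 8
    addrs = []
    for e in range(elements):
        if active[e]:
            # The index register is read as unsigned; the running index
            # (xm + e) is scaled by the element size in bytes (4 for words).
            addrs.append((base + ((xm & MASK64) + e) * mbytes) & MASK64)
    return addrs
```

The index register value itself is never modified; only the per-element running index changes.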
Store a predicate register to a memory address generated by a 64-bit scalar base, plus an immediate offset in the
range -256 to 255 which is multiplied by the current predicate register size in bytes. This instruction is unpredicated.
The store is performed as contiguous byte accesses, each containing 8 consecutive predicate bits in ascending element
order, with no endian conversion and no guarantee of single-copy atomicity larger than a byte. However, if alignment
is checked, then a general-purpose base register must be aligned to 2 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 imm9h 0 0 0 imm9l Rn 0 Pt
Assembler Symbols
<Pt> Is the name of the scalable predicate transfer register, encoded in the "Pt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.
Operation
CheckSVEEnabled();
integer elements = PL DIV 8;
bits(PL) src;
bits(64) base;
integer offset = imm * elements;
if n == 31 then
CheckSPAlignment();
if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
base = SP[];
else
if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
base = X[n];
src = P[t];
boolean aligned = AArch64.CheckAlignment(base + offset, 2, AccType_SVE, TRUE);
for e = 0 to elements-1
AArch64.MemSingle[base + offset, 1, AccType_SVE, aligned] = Elem[src, e, 8];
offset = offset + 1;
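The byte packing and offset scaling described above can be illustrated in Python. The function name `str_predicate_bytes` and the representation of the predicate as a list of bits (bit 0 first) are assumptions of this sketch; alignment and tag checks are omitted.

```python
def str_predicate_bytes(pred_bits, imm):
    """Return (byte offset from base, bytes stored) for a predicate store sketch.

    pred_bits : list of 0/1 values, length PL (a multiple of 8), bit 0 first
    imm       : signed offset in the range -256 to 255, in units of PL/8 bytes
    """
    pl = len(pred_bits)
    if pl % 8 != 0:
        raise ValueError("PL must be a multiple of 8 bits")
    if not -256 <= imm <= 255:
        raise ValueError("imm must be in the range -256 to 255")
    nbytes = pl // 8
    offset = imm * nbytes
    data = bytearray()
    for e in range(nbytes):
        byte = 0
        for b in range(8):
            # Predicate bit 8*e + b becomes bit b of byte e (ascending order).
            byte |= (pred_bits[8 * e + b] & 1) << b
        data.append(byte)
    return offset, bytes(data)
```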
Store a vector register to a memory address generated by a 64-bit scalar base, plus an immediate offset in the range
-256 to 255 which is multiplied by the current vector register size in bytes. This instruction is unpredicated.
The store is performed as contiguous byte accesses, with no endian conversion and no guarantee of single-copy
atomicity larger than a byte. However, if alignment is checked, then the base register must be aligned to 16 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 imm9h 0 1 0 imm9l Rn Zt
Assembler Symbols
<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.
Operation
CheckSVEEnabled();
integer elements = VL DIV 8;
bits(VL) src;
bits(64) base;
integer offset = imm * elements;
if n == 31 then
CheckSPAlignment();
if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
base = SP[];
else
if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
base = X[n];
src = Z[t];
boolean aligned = AArch64.CheckAlignment(base + offset, 16, AccType_SVE, TRUE);
for e = 0 to elements-1
AArch64.MemSingle[base + offset, 1, AccType_SVE, aligned] = Elem[src, e, 8];
offset = offset + 1;
Subtract an unsigned immediate from each element of the source vector, and destructively place the results in the
corresponding elements of the source vector. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 1 1 1 sh imm8 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
Elem[result, e, esize] = element1 - imm;
Z[dn] = result;
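As a rough model of the immediate handling and the modular subtraction above, consider the following Python sketch. The name `sub_immediate` and the use of plain integer lists for vector elements are assumptions; rejecting the shifted form for 8-bit elements follows the statement that shifted immediates apply to element widths of 16 bits or higher.

```python
def sub_immediate(elements, esize, imm8, sh=0):
    """Subtract (imm8 << 8*sh) from every element, modulo 2**esize (sketch)."""
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must be in the range 0 to 255")
    if sh not in (0, 1):
        raise ValueError("sh must be 0 (LSL #0) or 1 (LSL #8)")
    if sh == 1 and esize < 16:
        raise ValueError("LSL #8 needs an element width of 16 bits or higher")
    imm = imm8 << (8 * sh)
    mask = (1 << esize) - 1
    # Results wrap around, as for any fixed-width bit-vector subtraction.
    return [(x - imm) & mask for x in elements]
```

For example, `sub_immediate([300], 16, 1, sh=1)` subtracts 256 and yields `[44]`.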
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Subtract active elements of the second source vector from corresponding elements of the first source vector and
destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 1 0 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
bits(esize) element2 = Elem[operand2, e, esize];
if ElemP[mask, e, esize] == '1' then
Elem[result, e, esize] = element1 - element2;
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
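The merging behaviour of the predicated form can be sketched in Python as follows. The function name `sub_predicated` and the list-based representation of vectors and the predicate are assumptions made for illustration.

```python
def sub_predicated(zdn, zm, active, esize):
    """Active lanes get zdn - zm (mod 2**esize); inactive lanes keep zdn (sketch)."""
    mask = (1 << esize) - 1
    result = []
    for a, b, p in zip(zdn, zm, active):
        result.append((a - b) & mask if p else a)
    return result
```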
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Subtract all elements of the second source vector from corresponding elements of the first source vector and place the
results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 0 0 1 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
bits(esize) element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = element1 - element2;
Z[d] = result;
Reversed subtract from an unsigned immediate each element of the source vector, and destructively place the results
in the corresponding elements of the source vector. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 1 1 1 1 sh imm8 Zdn
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
integer element1 = UInt(Elem[operand1, e, esize]);
Elem[result, e, esize] = (imm - element1)<esize-1:0>;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Reversed subtract active elements of the first source vector from corresponding elements of the second source vector
and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 1 1 0 0 0 Pg Zm Zdn
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
bits(esize) element1 = Elem[operand1, e, esize];
bits(esize) element2 = Elem[operand2, e, esize];
if ElemP[mask, e, esize] == '1' then
Elem[result, e, esize] = element2 - element1;
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
The signed by unsigned integer indexed dot product instruction computes the dot product of a group of four signed
8-bit integer values held in each 32-bit element of the first source vector multiplied by a group of four unsigned 8-bit
integer values in an indexed 32-bit element of the second source vector, and then destructively adds the widened dot
product to the corresponding 32-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to 3. This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.
SVE
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 1 1 Zn Zda
size<1>size<0> U
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit vector segment, in the
range 0 to 3, encoded in the "i2" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
integer segmentbase = e - (e MOD eltspersegment);
integer s = segmentbase + index;
bits(esize) res = Elem[operand3, e, esize];
for i = 0 to 3
integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
integer element2 = UInt(Elem[operand2, 4 * s + i, esize DIV 4]);
res = res + element1 * element2;
Elem[result, e, esize] = res;
Z[da] = result;
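The indexed group selection in the pseudocode above can be illustrated with a short Python sketch. The function name `sudot_indexed` and the flat lists of byte and word values are assumptions of this example; signed bytes are passed as Python integers in the range -128 to 127 and unsigned bytes as 0 to 255.

```python
def sudot_indexed(zn_bytes, zm_bytes, zda_words, index):
    """Signed-by-unsigned indexed dot product over 32-bit lanes (sketch).

    zn_bytes  : signed 8-bit values of the first source, four per 32-bit lane
    zm_bytes  : unsigned 8-bit values of the second source
    zda_words : 32-bit accumulator lanes as unsigned integers
    index     : group position within each 128-bit segment, 0 to 3
    """
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    elts_per_segment = 128 // 32
    result = []
    for e, acc in enumerate(zda_words):
        segment_base = e - (e % elts_per_segment)
        s = segment_base + index  # same group position in every segment
        for i in range(4):
            acc += zn_bytes[4 * e + i] * zm_bytes[4 * s + i]
        result.append(acc & 0xFFFFFFFF)  # keep the low 32 bits
    return result
```

Every lane in a 128-bit segment multiplies its own four signed bytes by the same group of four unsigned bytes chosen by `index`.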
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unpack elements from the lowest or highest half of the source vector and then sign-extend them to place in elements
of twice their size within the destination vector. This instruction is unpredicated.
It has encodings from 2 classes: High half and Low half
High half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 1 0 0 1 1 1 0 Zn Zd
U H
Low half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 0 0 0 1 1 1 0 Zn Zd
U H
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<Tb> Is the size specifier, encoded in “size”:
size <Tb>
00 RESERVED
01 B
10 H
11 S
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer hsize = esize DIV 2;
bits(VL) operand = Z[n];
bits(VL) result;
for e = 0 to elements-1
bits(hsize) element = if hi then Elem[operand, e + elements, hsize] else Elem[operand, e, hsize];
Elem[result, e, esize] = Extend(element, esize, unsigned);
Z[d] = result;
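A small Python sketch of the unpack-and-extend step follows. The name `sunpk` and the list representation are assumptions; the sketch also assumes that the `unsigned` argument in the pseudocode is FALSE for this signed form, so the narrow elements are sign-extended.

```python
def sunpk(src, hsize, hi):
    """Unpack the low or high half of src and sign-extend to 2*hsize bits (sketch).

    src   : unsigned values of all hsize-bit elements in the source vector
    hsize : narrow element width in bits
    hi    : True for the high half, False for the low half
    """
    elements = len(src) // 2          # number of double-width result elements
    esize = 2 * hsize
    half = src[elements:] if hi else src[:elements]
    result = []
    for x in half:
        if x & (1 << (hsize - 1)):    # sign bit set: value is negative
            x -= 1 << hsize
        result.append(x & ((1 << esize) - 1))
    return result
```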
Sign-extend the least-significant sub-element of each active element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
It has encodings from 3 classes: Byte, Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 1 0 1 Pg Zn Zd
U
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 0 1 0 1 Pg Zn Zd
U
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 1 0 1 Pg Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> For the byte variant: is the size specifier, encoded in “size”:
size <T>
00 RESERVED
01 H
10 S
11 D
<T> For the halfword variant: is the size specifier, encoded in “size<0>”:
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element = Elem[operand, e, esize];
Elem[result, e, esize] = Extend(element<s_esize-1:0>, esize, unsigned);
Z[d] = result;
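The sub-element extraction can be modelled in Python as below. The function name `sxt` and the list-based operands are hypothetical, and the sketch assumes sign extension, in line with the description (the `unsigned` argument in the pseudocode is taken to be FALSE for this form).

```python
def sxt(zn, zd, active, esize, s_esize):
    """Sign-extend the low s_esize bits of each active element to esize bits (sketch).

    zn, zd  : source and prior destination elements as unsigned integers
    active  : one boolean per element, standing in for the predicate
    s_esize : sub-element width, 8 (byte), 16 (halfword) or 32 (word)
    """
    result = list(zd)                 # inactive elements stay unmodified
    for e, (x, p) in enumerate(zip(zn, active)):
        if p:
            sub = x & ((1 << s_esize) - 1)
            if sub & (1 << (s_esize - 1)):
                sub -= 1 << s_esize
            result[e] = sub & ((1 << esize) - 1)
    return result
```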
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Reads each element of the second source (index) vector and uses its value to select an indexed element from the first
source (table) vector, and places the indexed table element in the destination vector element corresponding to the
index vector element. If an index value is greater than or equal to the number of vector elements then it places zero in
the corresponding destination vector element.
Since the index values can select any element in a vector this operation is not naturally vector length agnostic.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 0 1 1 0 0 Zn Zd
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
integer idx = UInt(Elem[operand2, e, esize]);
Elem[result, e, esize] = if idx < elements then Elem[operand1, idx, esize] else Zeros();
Z[d] = result;
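The lookup rule, including the out-of-range case, can be sketched in Python. The function name `tbl` and the list representation of vectors are assumptions of this example.

```python
def tbl(table, indices):
    """Select table[idx] for each index; out-of-range indices give zero (sketch)."""
    elements = len(table)
    return [table[idx] if idx < elements else 0 for idx in indices]
```

The same index can give different results at different vector lengths: index 4 yields zero with a four-element table but selects a real element with an eight-element table, which illustrates why the operation is not naturally vector length agnostic.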
Interleave alternating even or odd-numbered elements from the first and second source predicates and place in
elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: Even and Odd
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 1 0 0 0 Pn 0 Pd
H
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 1 0 1 0 Pn 0 Pd
H
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer pairs = VL DIV (esize * 2);
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for p = 0 to pairs-1
Elem[result, 2*p+0, esize DIV 8] = Elem[operand1, 2*p+part, esize DIV 8];
Elem[result, 2*p+1, esize DIV 8] = Elem[operand2, 2*p+part, esize DIV 8];
P[d] = result;
Interleave alternating even or odd-numbered elements from the first and second source vectors and place in elements
of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction requires that
the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then
the trailing bits are set to zero.
ID_AA64ZFR0_EL1.F64MM indicates whether the 128-bit element variant of the instruction is implemented.
It has encodings from 4 classes: Even, Even (quadwords), Odd and Odd (quadwords)
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 1 0 0 Zn Zd
H
Even (quadwords)
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 1 1 0 Zn Zd
H
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 1 0 1 Zn Zd
H
Odd (quadwords)
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 1 1 1 Zn Zd
H
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
if VL < esize * 2 then UNDEFINED;
integer pairs = VL DIV (esize * 2);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Zeros();
for p = 0 to pairs-1
Elem[result, 2*p+0, esize] = Elem[operand1, 2*p+part, esize];
Elem[result, 2*p+1, esize] = Elem[operand2, 2*p+part, esize];
Z[d] = result;
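The pairwise interleave can be sketched in Python as follows. The name `trn` and the list representation are assumptions; `part` is 0 for the even form and 1 for the odd form, matching its use in the pseudocode.

```python
def trn(zn, zm, part):
    """Interleave even (part=0) or odd (part=1) elements of zn and zm (sketch)."""
    pairs = len(zn) // 2
    result = [0] * len(zn)            # any trailing unpaired element stays zero
    for p in range(pairs):
        result[2 * p] = zn[2 * p + part]
        result[2 * p + 1] = zm[2 * p + part]
    return result
```

With an odd number of elements, as can occur for 128-bit elements when the vector length is not a multiple of 256 bits, the last element of the result is left as zero.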
Compute the absolute difference between unsigned integer values in active elements of the second source vector and
corresponding elements of the first source vector and destructively place the difference in the corresponding elements
of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 1 0 1 0 0 0 Pg Zm Zdn
U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
integer element1 = Int(Elem[operand1, e, esize], unsigned);
integer element2 = Int(Elem[operand2, e, esize], unsigned);
if ElemP[mask, e, esize] == '1' then
integer absdiff = Abs(element1 - element2);
Elem[result, e, esize] = absdiff<esize-1:0>;
else
Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
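A minimal Python sketch of the predicated absolute difference is shown below. The function name `uabd` and the list-based operands are assumptions; element values are taken to be unsigned integers already within the element width.

```python
def uabd(zdn, zm, active, esize):
    """Unsigned absolute difference on active lanes; others keep zdn (sketch)."""
    mask = (1 << esize) - 1
    result = []
    for a, b, p in zip(zdn, zm, active):
        result.append(abs(a - b) & mask if p else a)
    return result
```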
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
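• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.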
Unsigned add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Narrow elements are first zero-extended to 64 bits. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 1 0 0 1 Pg Zn Vd
U
Assembler Symbols
<Dd> Is the 64-bit name of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<T> Is the size specifier, encoded in “size”:
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer sum = 0;
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
integer element = UInt(Elem[operand, e, esize]);
sum = sum + element;
V[d] = sum<63:0>;
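The reduction can be sketched in Python. The function name `uaddv` and the list representation are assumptions; the final mask models taking bits 63:0 of the unbounded integer sum.

```python
def uaddv(elements, active):
    """Sum the active unsigned elements and keep the low 64 bits (sketch)."""
    total = 0
    for x, p in zip(elements, active):
        if p:                         # inactive elements contribute zero
            total += x
    return total & ((1 << 64) - 1)
```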
Convert to floating-point from the unsigned integer in each active element of the source vector, and place the results
in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 7 classes: 16-bit to half-precision, 32-bit to half-precision, 32-bit to single-precision, 32-bit to
double-precision, 64-bit to half-precision, 64-bit to single-precision and 64-bit to double-precision
16-bit to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 1 Pg Zn Zd
int_U
32-bit to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U
32-bit to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U
32-bit to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 1 1 0 1 Pg Zn Zd
int_U
64-bit to half-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 Pg Zn Zd
int_U
64-bit to single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U
64-bit to double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 1 0 1 Pg Zn Zd
int_U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element = Elem[operand, e, esize];
bits(d_esize) fpval = FixedToFP(element<s_esize-1:0>, 0, unsigned, FPCR[], rounding);
Elem[result, e, esize] = ZeroExtend(fpval);
Z[d] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned divide active elements of the first source vector by corresponding elements of the second source vector and
destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 0 0 0 Pg Zm Zdn
R U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size<0>":
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element2 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element1) / Real(element2));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
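As a rough illustration of the predicated unsigned divide described above, the following Python sketch models the operation on plain lists. The function name `udiv_predicated` and the list-based representation of vectors and predicates are assumptions made for this example only.

```python
def udiv_predicated(zdn, zm, pg):
    """Model a merging, predicated unsigned divide on lists of non-negative ints.

    zdn, zm: element values of the two source vectors.
    pg: one boolean per element (True means the element is active).
    """
    result = []
    for dividend, divisor, active in zip(zdn, zm, pg):
        if not active:
            result.append(dividend)             # inactive: destination element is unchanged
        elif divisor == 0:
            result.append(0)                    # division by zero yields 0
        else:
            result.append(dividend // divisor)  # unsigned, so floor equals round-toward-zero
    return result


print(udiv_predicated([10, 7, 9, 5], [3, 0, 2, 4], [True, True, False, True]))
# [3, 0, 9, 1]
```

Note that a zero divisor produces a zero quotient in an active element rather than raising an error.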
Unsigned reversed divide active elements of the second source vector by corresponding elements of the first source
vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 0 0 0 Pg Zm Zdn
R U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size<0>":
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element1 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element2) / Real(element1));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
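The reversed form swaps the roles of dividend and divisor while still writing to the first source register. A minimal Python sketch, assuming list-based vectors and the hypothetical name `udivr_predicated`:

```python
def udivr_predicated(zdn, zm, pg):
    """Active elements become zm[e] // zdn[e], or 0 when zdn[e] is zero.

    Inactive elements keep their original zdn value.
    """
    return [
        (0 if a == 0 else b // a) if active else a
        for a, b, active in zip(zdn, zm, pg)
    ]


print(udivr_predicated([3, 0, 2, 4], [10, 7, 9, 5], [True, True, False, True]))
# [3, 0, 2, 1]
```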
The unsigned integer indexed dot product instruction computes the dot product of a group of four unsigned 8-bit or
16-bit integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four
unsigned 8-bit or 16-bit integer values in an indexed 32-bit or 64-bit element of the second source vector, and then
destructively adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to one less than the number of groups per
128-bit segment, encoded in 1 to 2 bits depending on the size of the group. This instruction is unpredicated.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> U
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the 32-bit variant: is the name of the second source scalable vector register Z0-Z7, encoded in the
"Zm" field.
For the 64-bit variant: is the name of the second source scalable vector register Z0-Z15, encoded in the
"Zm" field.
<imm> For the 32-bit variant: is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit
vector segment, in the range 0 to 3, encoded in the "i2" field.
For the 64-bit variant: is the immediate index of a quadtuplet of four 16-bit elements within each 128-bit
vector segment, in the range 0 to 1, encoded in the "i1" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = UInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
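The way the immediate index selects the same group position within every 128-bit segment can be sketched in Python as follows. The name `udot_indexed` and the flat-list layout (accumulators in `zda`, narrow values in `zn` and `zm`) are assumptions for illustration.

```python
def udot_indexed(zda, zn, zm, index, esize=32):
    """Indexed unsigned dot product on flat lists.

    zda: list of esize-bit accumulators.
    zn, zm: lists of (esize // 4)-bit unsigned values, four per accumulator.
    index: group position within each 128-bit segment of zm.
    """
    elts_per_segment = 128 // esize
    if not 0 <= index < elts_per_segment:
        raise ValueError("index out of range for this element size")
    mask = (1 << esize) - 1
    result = []
    for e, acc in enumerate(zda):
        segment_base = e - (e % elts_per_segment)
        s = segment_base + index          # selected group in the same segment as e
        for i in range(4):
            acc += zn[4 * e + i] * zm[4 * s + i]
        result.append(acc & mask)         # accumulator wraps modulo 2**esize
    return result


# A 256-bit vector with 32-bit accumulators: two segments of four groups each.
print(udot_indexed([0] * 8, [1] * 32, list(range(32)), index=1))
# [22, 22, 22, 22, 86, 86, 86, 86]
```

Every accumulator in a segment uses the same group of the second source, which is why the first four results are equal and the last four are equal.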
The unsigned integer dot product instruction computes the dot product of a group of four unsigned 8-bit or 16-bit
integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four unsigned
8-bit or 16-bit integer values in the corresponding 32-bit or 64-bit element of the second source vector, and then
destructively adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 1 Zn Zda
U
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<T> Is the size specifier, encoded in "size<0>":
size<0> <T>
0 S
1 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Tb> Is the size specifier, encoded in "size<0>":
size<0> <Tb>
0 B
1 H
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = UInt(Elem[operand2, 4 * e + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
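For the non-indexed form, each accumulator pairs with the four narrow values at the same position in both sources. A small Python sketch under the same list-based assumptions (the name `udot` is hypothetical):

```python
def udot(zda, zn, zm, esize=32):
    """Unsigned dot product of groups of four narrow values into each accumulator."""
    mask = (1 << esize) - 1
    return [
        (acc + sum(zn[4 * e + i] * zm[4 * e + i] for i in range(4))) & mask
        for e, acc in enumerate(zda)
    ]


print(udot([100, 0xFFFFFFFF], [1, 2, 3, 4, 255, 255, 255, 255], [5, 6, 7, 8, 1, 0, 0, 0]))
# [170, 254]
```

The second result shows that the accumulation is not saturating: the sum wraps modulo 2^32.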
Determine the unsigned maximum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is an unsigned 8-bit value in the range 0 to
255, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 1 0 imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<imm> Is the unsigned immediate operand, in the range 0 to 255, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Max(element1, imm)<esize-1:0>;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
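A minimal Python sketch of the unpredicated maximum with an immediate, assuming list-based vectors and the hypothetical name `umax_imm`:

```python
def umax_imm(zdn, imm):
    """Raise every element to at least `imm`, an unsigned 8-bit immediate."""
    if not 0 <= imm <= 255:
        raise ValueError("immediate must be in the range 0 to 255")
    return [max(x, imm) for x in zdn]


print(umax_imm([0, 100, 300], 128))
# [128, 128, 300]
```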
Determine the unsigned maximum of active elements of the second source vector and corresponding elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 1 0 0 0 Pg Zm Zdn
U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer maximum = Max(element1, element2);
        Elem[result, e, esize] = maximum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
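The comparison treats each element's bit pattern as an unsigned integer. The Python sketch below makes that explicit by masking its inputs; the name `umax_predicated` and the list representation are assumptions for this example.

```python
def umax_predicated(zdn, zm, pg, esize=8):
    """Merging, predicated unsigned maximum on lists of element values."""
    mask = (1 << esize) - 1
    out = []
    for a, b, active in zip(zdn, zm, pg):
        ua, ub = a & mask, b & mask   # reinterpret each bit pattern as unsigned
        out.append(max(ua, ub) if active else ua)
    return out


# -128 has the 8-bit pattern 0x80, which is 128 when read as unsigned.
print(umax_predicated([-128, 1, 5], [127, 2, 8], [True, True, False]))
# [128, 2, 5]
```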
Unsigned maximum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 1 0 0 1 Pg Zn Vd
U
Assembler Symbols
<V> Is a width specifier, encoded in "size":
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer maximum = if unsigned then 0 else -(2^(esize-1));
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        maximum = Max(maximum, element);
V[d] = maximum<esize-1:0>;
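The horizontal reduction above starts from zero, the identity for an unsigned maximum, so inactive lanes cannot affect the result. A short Python sketch, with the hypothetical name `umaxv` and list-based inputs:

```python
def umaxv(zn, pg):
    """Unsigned maximum across the active elements; 0 if none are active."""
    maximum = 0
    for x, active in zip(zn, pg):
        if active:
            maximum = max(maximum, x)
    return maximum


print(umaxv([3, 200, 17], [True, False, True]))
# 17
```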
Determine the unsigned minimum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is an unsigned 8-bit value in the range 0 to
255, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 1 0 imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<imm> Is the unsigned immediate operand, in the range 0 to 255, encoded in the "imm8" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Min(element1, imm)<esize-1:0>;
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
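Because the immediate is limited to 8 bits, the result of the minimum can never exceed 255, even for wider elements. A Python sketch under list-based assumptions (the name `umin_imm` is hypothetical):

```python
def umin_imm(zdn, imm):
    """Cap every element at `imm`, an unsigned 8-bit immediate."""
    if not 0 <= imm <= 255:
        raise ValueError("immediate must be in the range 0 to 255")
    return [min(x, imm) for x in zdn]


# 16-bit element values larger than the immediate are reduced to it.
print(umin_imm([0, 100, 40000], 128))
# [0, 100, 128]
```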
Determine the unsigned minimum of active elements of the second source vector and corresponding elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 1 0 0 0 Pg Zm Zdn
U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer minimum = Min(element1, element2);
        Elem[result, e, esize] = minimum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
Unsigned minimum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the maximum unsigned integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 1 0 0 1 Pg Zn Vd
U
Assembler Symbols
<V> Is a width specifier, encoded in "size":
size <V>
00 B
01 H
10 S
11 D
<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer minimum = if unsigned then (2^esize - 1) else (2^(esize-1) - 1);
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        minimum = Min(minimum, element);
V[d] = minimum<esize-1:0>;
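For the minimum reduction the starting value is the largest unsigned integer for the element size, so it depends on `esize`. A Python sketch, assuming list-based inputs and the hypothetical name `uminv`:

```python
def uminv(zn, pg, esize):
    """Unsigned minimum across the active elements.

    Returns 2**esize - 1 when no element is active.
    """
    minimum = (1 << esize) - 1
    for x, active in zip(zn, pg):
        if active:
            minimum = min(minimum, x)
    return minimum


print(uminv([3, 200, 17], [False, True, True], esize=8))
# 17
```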
The unsigned integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of unsigned 8-bit integer
values held in each 128-bit segment of the first source vector by the 8×2 matrix of unsigned 8-bit integer values in the
corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then
destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend and
destination vector. This is equivalent to performing an 8-way dot product per destination element.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.
SVE (FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 1 1 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(128) op1, op2;
bits(128) res, addend;
for s = 0 to segments-1
    op1 = Elem[operand1, s, 128];
    op2 = Elem[operand2, s, 128];
    addend = Elem[operand3, s, 128];
    res = MatMulAdd(addend, op1, op2, op1_unsigned, op2_unsigned);
    Elem[result, s, 128] = res;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
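The per-segment matrix multiply-accumulate can be sketched in Python as shown below. The body of `MatMulAdd` is not given in this section, so the element layout used here is an assumption: the first operand holds two rows of eight bytes, the second operand holds its 8×2 matrix as two groups of eight bytes (one per column), and the accumulator is a row-major 2×2 matrix. The name `ummla_segment` is hypothetical.

```python
def ummla_segment(addend, op1, op2):
    """2x8 by 8x2 unsigned byte matrix multiply-accumulate for one 128-bit segment.

    addend: four 32-bit accumulators (assumed row-major 2x2).
    op1, op2: sixteen unsigned bytes each (assumed layout, see lead-in).
    """
    result = []
    for i in range(2):
        for j in range(2):
            total = addend[2 * i + j]
            for k in range(8):                  # an 8-way dot product per result element
                total += op1[8 * i + k] * op2[8 * j + k]
            result.append(total & 0xFFFFFFFF)   # accumulator wraps modulo 2**32
    return result


print(ummla_segment([10, 20, 30, 40], [1] * 8 + [2] * 8, [1] * 8 + [3] * 8))
# [18, 44, 46, 88]
```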
Widening multiply unsigned integer values in active elements of the first source vector by corresponding elements of
the second source vector and destructively place the high half of the result in the corresponding elements of the first
source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 0 0 0 Pg Zm Zdn
H U
Assembler Symbols
<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer product = (element1 * element2) >> esize;
        Elem[result, e, esize] = product<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
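The high half of the widened product is simply the full product shifted right by the element size. A Python sketch, assuming list-based vectors and the hypothetical name `umulh_predicated`:

```python
def umulh_predicated(zdn, zm, pg, esize):
    """Merging, predicated unsigned multiply returning the high esize bits."""
    return [
        ((a * b) >> esize) if active else a
        for a, b, active in zip(zdn, zm, pg)
    ]


# 8-bit elements: 200 * 3 = 600 = 0x0258, whose high byte is 2.
print(umulh_predicated([200, 16, 255], [3, 16, 255], [True, True, False], esize=8))
# [2, 1, 255]
```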
Unsigned saturating add of an unsigned immediate to each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 0 1 1 1 sh imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in "sh":
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + imm, esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned saturating add all elements of the second source vector to corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. Each result element is saturated to the
N-bit element's unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 0 1 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> Is the size specifier, encoded in "size":
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + element2, esize, unsigned);
Z[d] = result;
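To show what saturation changes, the Python sketch below compares the saturating add with an ordinary modular add. Both function names and the list representation are assumptions for illustration; `add_wrapping` is included only for contrast and does not model any instruction in this section.

```python
def uqadd(zn, zm, esize):
    """Unsigned saturating add: results are clamped to 2**esize - 1."""
    limit = (1 << esize) - 1
    return [min(a + b, limit) for a, b in zip(zn, zm)]


def add_wrapping(zn, zm, esize):
    """Plain modular add, shown only for comparison."""
    mask = (1 << esize) - 1
    return [(a + b) & mask for a, b in zip(zn, zm)]


print(uqadd([200, 1], [100, 2], esize=8))
# [255, 3]
print(add_wrapping([200, 1], [100, 2], esize=8))
# [44, 3]
```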
Unsigned saturating decrement scalar by multiple of 8-bit predicate constraint element count
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
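bits(ssize) result;
integer element1 = Int(operand1, unsigned);
(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);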
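The element count implied by a named predicate constraint, and its use in the saturating decrement, can be sketched in Python. `DecodePredCount` is not defined in this section, so `pred_count` below is an assumption built from the constraint list above; in particular, returning zero for a fixed count that the vector cannot supply is assumed. The names `pred_count` and `uqdecb` are hypothetical.

```python
FIXED_COUNTS = {f"VL{n}": n for n in (1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256)}


def pred_count(pattern, elements):
    """Element count implied by a named constraint for a vector of `elements` elements."""
    if pattern in FIXED_COUNTS:
        n = FIXED_COUNTS[pattern]
        return n if elements >= n else 0      # assumed: zero if the vector is too short
    if pattern == "POW2":
        return 1 << (elements.bit_length() - 1) if elements > 0 else 0
    if pattern == "MUL4":
        return elements - elements % 4
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "ALL":
        return elements
    return 0                                  # unspecified encodings give a zero count


def uqdecb(value, pattern, imm, vl_bits):
    """Decrement `value` by imm times the 8-bit element count, saturating at zero."""
    count = pred_count(pattern, vl_bits // 8)
    return max(value - count * imm, 0)


print(pred_count("POW2", 48), pred_count("VL64", 48), pred_count("MUL3", 16))
# 32 0 15
print(uqdecb(100, "ALL", 2, vl_bits=128))
# 68
```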
Unsigned saturating decrement scalar by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
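bits(ssize) result;
integer element1 = Int(operand1, unsigned);
(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);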
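The difference between the 32-bit and 64-bit scalar forms lies in how much of the register is read before the decrement. The Python sketch below takes the element count as a parameter; the name `uqdec_scalar` is hypothetical, and returning a zero-extended result for the 32-bit form is an assumption.

```python
def uqdec_scalar(reg_value, count, imm=1, ssize=64):
    """Unsigned saturating decrement of a general-purpose register value.

    reg_value: the full 64-bit register contents.
    count: element count implied by the predicate constraint.
    ssize: 32 for the W form, 64 for the X form.
    """
    if not 1 <= imm <= 16:
        raise ValueError("imm must be in the range 1 to 16")
    if ssize not in (32, 64):
        raise ValueError("ssize must be 32 or 64")
    operand = reg_value & ((1 << ssize) - 1)   # the 32-bit form reads only the low 32 bits
    return max(operand - count * imm, 0)       # an unsigned decrement can only saturate at 0


print(uqdec_scalar(10, 4, imm=2))
# 2
print(uqdec_scalar(5, 4, imm=2))
# 0
print(uqdec_scalar(0xFFFFFFFF00000003, 2, ssize=32))
# 1
```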
Unsigned saturating decrement vector by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 64-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
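The <pattern> table above maps 17 of the 32 possible field values to names; the remaining values, 14 to 28, are written as a raw #uimm5. A minimal Python sketch of that mapping follows. The names NAMED_PATTERNS and pattern_operand are illustrative only.

```python
# Named encodings taken from the <pattern> table; all other 5-bit values
# are shown as a raw unsigned immediate (#uimm5).
NAMED_PATTERNS = {
    0b00000: "POW2",
    0b00001: "VL1", 0b00010: "VL2", 0b00011: "VL3", 0b00100: "VL4",
    0b00101: "VL5", 0b00110: "VL6", 0b00111: "VL7", 0b01000: "VL8",
    0b01001: "VL16", 0b01010: "VL32", 0b01011: "VL64",
    0b01100: "VL128", 0b01101: "VL256",
    0b11101: "MUL4", 0b11110: "MUL3", 0b11111: "ALL",
}


def pattern_operand(field: int) -> str:
    """Return the assembler spelling for a 5-bit pattern field value."""
    if not 0 <= field <= 0b11111:
        raise ValueError("pattern is a 5-bit field")
    return NAMED_PATTERNS.get(field, f"#{field}")
```

For example, 0b01101 is spelled VL256, while 0b01110 has no name and is spelled #14.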
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);
Z[dn] = result;
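The per-element loop above can be modeled in a few lines of Python. This is only a sketch: the vector is represented as a list of unsigned integers, count stands in for the value returned by DecodePredCount, and the helper names sat_unsigned and sat_dec_vector are hypothetical.

```python
def sat_unsigned(value: int, esize: int) -> int:
    """Clamp `value` to the unsigned range 0 .. 2**esize - 1."""
    return max(0, min(value, (1 << esize) - 1))


def sat_dec_vector(elements: list[int], count: int, imm: int,
                   esize: int = 64) -> list[int]:
    """Subtract count * imm from every element, saturating at zero."""
    step = count * imm
    return [sat_unsigned(e - step, esize) for e in elements]
```

With count = 2 and imm = 4, every element is reduced by 8, and elements smaller than 8 saturate to 0 instead of wrapping around.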
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
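The three MOVPRFX requirements can be expressed as a simple check, for example in an assembler or a lint pass. The sketch below is hypothetical: the Insn record, its fields and the function movprfx_prefix_ok are invented for illustration, and comparing register names as strings is a simplification of "architectural register state".

```python
from dataclasses import dataclass


@dataclass
class Insn:
    """Hypothetical, simplified instruction record used only in this sketch."""
    mnemonic: str
    dest: str
    other_sources: tuple = ()  # source registers other than the destination
    predicated: bool = False


def movprfx_prefix_ok(prefix: Insn, insn: Insn) -> bool:
    """Return True if `prefix` satisfies the requirements listed above."""
    return (
        prefix.mnemonic == "MOVPRFX"
        and not prefix.predicated                # must be unpredicated
        and prefix.dest == insn.dest             # same destination register
        and insn.dest not in insn.other_sources  # destination not another source
    )
```

For this instruction the only vector operand is the destination itself, so in practice the first two conditions decide the outcome.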
Unsigned saturating decrement scalar by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Unsigned saturating decrement vector by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 16-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Counts the number of true elements in the source predicate and then uses the result to decrement the scalar
destination. The result is saturated to the general-purpose register's unsigned integer range.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 1 0 0 Pm Rdn
D U sf
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 1 1 0 Pm Rdn
D U sf
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
Counts the number of true elements in the source predicate and then uses the result to decrement all destination
vector elements. The results are saturated to the element unsigned integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 0 0 0 Pm Zdn
D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element - count, esize, unsigned);
Z[dn] = result;
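The two loops above, counting true predicate elements and then decrementing every vector element by that count, can be sketched in Python as follows. The sketch models the predicate as an integer bit mask and assumes that ElemP reads bit e * (esize / 8), that is, one predicate bit per vector byte with only the lowest bit of each element's group being significant. The function names and the explicit vl argument are hypothetical.

```python
def count_true_elements(pred: int, vl: int, esize: int) -> int:
    """Count predicate elements that are true for the given element size.

    Assumption: element e is controlled by predicate bit e * (esize // 8).
    """
    elements = vl // esize
    stride = esize // 8  # predicate bits per element
    return sum((pred >> (e * stride)) & 1 for e in range(elements))


def sat_dec_by_pred(z: list[int], pred: int, vl: int, esize: int) -> list[int]:
    """Decrement each element of `z` by the true-element count, saturating at 0."""
    count = count_true_elements(pred, vl, esize)
    max_val = (1 << esize) - 1
    return [max(0, min(e - count, max_val)) for e in z]
```

For a 128-bit vector of 32-bit elements, a predicate value of 0x0011 marks elements 0 and 1 as true, so every vector element is decremented by 2.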
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned saturating decrement scalar by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Unsigned saturating decrement vector by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 32-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned saturating increment scalar by multiple of 8-bit predicate constraint element count
Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Unsigned saturating increment scalar by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Unsigned saturating increment vector by multiple of 64-bit predicate constraint element count
Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 64-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned saturating increment scalar by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Unsigned saturating increment vector by multiple of 16-bit predicate constraint element count
Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 16-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Counts the number of true elements in the source predicate and then uses the result to increment the scalar
destination. The result is saturated to the general-purpose register's unsigned integer range.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 1 0 0 Pm Rdn
D U sf
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 1 1 0 Pm Rdn
D U sf
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
size <T>
00 B
01 H
10 S
11 D
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
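The difference between the 32-bit and 64-bit classes lies in the range to which the scalar result saturates. The Python sketch below illustrates this under simple assumptions: count is the number of true predicate elements computed by the loop above, ssize is 32 or 64, and only the low ssize bits of the register are read as the operand. How the result is written back to the full register is not modeled. The function name sat_inc_scalar is hypothetical.

```python
def sat_inc_scalar(x: int, count: int, ssize: int) -> int:
    """Add `count` to the low `ssize` bits of `x`, saturating to 2**ssize - 1."""
    if ssize not in (32, 64):
        raise ValueError("ssize must be 32 or 64")
    max_val = (1 << ssize) - 1
    operand = x & max_val  # the operand is ssize bits wide
    return min(operand + count, max_val)
```

Starting from 0xFFFFFFFE with a count of 4, the 32-bit form saturates at 0xFFFFFFFF, whereas the 64-bit form has room to produce 0x100000002.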
Counts the number of true elements in the source predicate and then uses the result to increment all destination
vector elements. The results are saturated to the element unsigned integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 0 0 0 Pm Zdn
D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;
for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;
for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element + count, esize, unsigned);
Z[dn] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned saturating increment scalar by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit
32-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U
Assembler Symbols
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
Operation
CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;
Unsigned saturating increment vector by multiple of 32-bit predicate constraint element count
Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 32-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL
<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.
CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
integer element1 = Int(Elem[operand1, e, esize], unsigned);
(Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);
Z[dn] = result;
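As a rough illustration of the loop above, the following Python sketch applies the saturating increment to a list of unsigned element values. The function name `uqinc_elements` is hypothetical, and `count` is assumed to have been derived already from the predicate constraint.

```python
def uqinc_elements(elems, count, imm, esize=32):
    """Add count * imm to every element, clamping at the unsigned maximum."""
    max_val = (1 << esize) - 1
    return [min(e + count * imm, max_val) for e in elems]
```

Because the increment is never negative, only the upper bound of the unsigned range needs to be checked.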
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned saturating subtract an unsigned immediate from each element of the source vector, and destructively place
the results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 1 1 1 1 sh imm8 Zdn
U
Assembler Symbols
<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.
size <T>
00 B
01 H
10 S
11 D
<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.
<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:
sh <shift>
0 LSL #0
1 LSL #8
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;
for e = 0 to elements-1
integer element1 = Int(Elem[operand1, e, esize], unsigned);
(Elem[result, e, esize], -) = SatQ(element1 - imm, esize, unsigned);
Z[dn] = result;
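The immediate encoding and the saturating subtraction can be modelled with a short Python sketch. The helper names `decode_imm` and `uqsub_imm` are hypothetical, and rejecting the shifted form for 8-bit elements is an assumption drawn from the statement that the shifted immediate applies to element widths of 16 bits or higher.

```python
def decode_imm(imm8: int, sh: int, esize: int) -> int:
    """Expand the 8-bit immediate and the optional LSL #8 into a value."""
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must be in the range 0 to 255")
    if sh == 1 and esize == 8:
        # Assumption: the shifted form is not available for byte elements.
        raise ValueError("LSL #8 requires 16-bit or wider elements")
    return imm8 << 8 if sh else imm8

def uqsub_imm(elems, imm):
    """Subtract imm from each unsigned element, clamping at zero."""
    return [max(e - imm, 0) for e in elems]
```

Subtracting a non-negative immediate can only underflow, so clamping at zero is sufficient to keep each result in the unsigned range.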
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Unsigned saturating subtract all elements of the second source vector from corresponding elements of the first source
vector and place the results in the corresponding elements of the destination vector. Each result element is saturated
to the N-bit element's unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 1 1 Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;
for e = 0 to elements-1
integer element1 = Int(Elem[operand1, e, esize], unsigned);
integer element2 = Int(Elem[operand2, e, esize], unsigned);
(Elem[result, e, esize], -) = SatQ(element1 - element2, esize, unsigned);
Z[d] = result;
The unsigned by signed integer indexed dot product instruction computes the dot product of a group of four unsigned
8-bit integer values held in each 32-bit element of the first source vector multiplied by a group of four signed 8-bit
integer values in an indexed 32-bit element of the second source vector, and then destructively adds the widened dot
product to the corresponding 32-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to 3. This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.
SVE
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 1 0 Zn Zda
size<1>size<0> U
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit vector segment, in the
range 0 to 3, encoded in the "i2" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
integer segmentbase = e - (e MOD eltspersegment);
integer s = segmentbase + index;
bits(esize) res = Elem[operand3, e, esize];
for i = 0 to 3
integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
integer element2 = SInt(Elem[operand2, 4 * s + i, esize DIV 4]);
res = res + element1 * element2;
Elem[result, e, esize] = res;
Z[da] = result;
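The indexed selection in the pseudocode above can be followed more easily in a small Python model. This is a sketch under assumptions: vectors are represented as flat lists (32-bit accumulators for `zda`, raw byte values 0 to 255 for `zn` and `zm`), and the name `usdot_indexed` is hypothetical.

```python
def usdot_indexed(zda, zn, zm, index):
    """Unsigned-by-signed 4-way dot product with an indexed second operand."""
    elts_per_segment = 4                      # 128 DIV 32
    out = []
    for e, acc in enumerate(zda):
        # Select the indexed group within the same 128-bit segment as e.
        s = e - (e % elts_per_segment) + index
        for i in range(4):
            u = zn[4 * e + i]                 # unsigned 8-bit value
            b = zm[4 * s + i]
            sgn = b - 256 if b >= 128 else b  # reinterpret as signed 8-bit
            acc += u * sgn
        out.append(acc & 0xFFFFFFFF)          # wrap to 32 bits
    return out
```

Every accumulator in a segment uses the same group of four signed bytes from the second source, which is what distinguishes the indexed form from the vector form.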
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
The unsigned by signed integer dot product instruction computes the dot product of a group of four unsigned 8-bit
integer values held in each 32-bit element of the first source vector multiplied by a group of four signed 8-bit integer
values in the corresponding 32-bit element of the second source vector, and then destructively adds the widened dot
product to the corresponding 32-bit element of the destination vector.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.
SVE
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 0 Zm 0 1 1 1 1 0 Zn Zda
size<1>size<0>
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
for e = 0 to elements-1
bits(esize) res = Elem[operand3, e, esize];
for i = 0 to 3
integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
integer element2 = SInt(Elem[operand2, 4 * e + i, esize DIV 4]);
res = res + element1 * element2;
Elem[result, e, esize] = res;
Z[da] = result;
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
The unsigned by signed integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of unsigned 8-bit
integer values held in each 128-bit segment of the first source vector by the 8×2 matrix of signed 8-bit integer values
in the corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is
then destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend
and destination vector. This is equivalent to performing an 8-way dot product per destination element.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.
SVE
(FEAT_I8MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 1 0 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>
Assembler Symbols
<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(128) op1, op2;
bits(128) res, addend;
for s = 0 to segments-1
op1 = Elem[operand1, s, 128];
op2 = Elem[operand2, s, 128];
addend = Elem[operand3, s, 128];
res = MatMulAdd(addend, op1, op2, op1_unsigned, op2_unsigned);
Elem[result, s, 128] = res;
Z[da] = result;
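MatMulAdd is not defined in this section, so the following Python sketch shows one plausible reading of the per-segment arithmetic. It assumes that the first source holds two rows of eight unsigned bytes, that the second source holds two groups of eight signed bytes (one group per result column), and that the accumulator holds a 2×2 matrix in row-major order. The names `to_signed8` and `usmmla_segment` are hypothetical.

```python
def to_signed8(b: int) -> int:
    """Reinterpret a byte value 0..255 as a signed 8-bit integer."""
    return b - 256 if b >= 128 else b

def usmmla_segment(addend, op1, op2):
    """One 128-bit segment: 2x8 unsigned times 8x2 signed, plus addend."""
    res = []
    for i in range(2):                 # result row
        for j in range(2):             # result column
            acc = addend[2 * i + j]
            for k in range(8):         # 8-way dot product per element
                acc += op1[8 * i + k] * to_signed8(op2[8 * j + k])
            res.append(acc & 0xFFFFFFFF)
    return res
```

Each of the four result elements is an 8-way dot product, matching the description of the instruction as equivalent to an 8-way dot product per destination element.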
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
Unpack elements from the lowest or highest half of the source vector and then zero-extend them to place in elements
of twice their size within the destination vector. This instruction is unpredicated.
It has encodings from 2 classes: High half and Low half
High half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 1 1 0 0 1 1 1 0 Zn Zd
U H
Low half
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 1 0 0 0 1 1 1 0 Zn Zd
U H
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 RESERVED
01 H
10 S
11 D
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
size <Tb>
00 RESERVED
01 B
10 H
11 S
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
integer hsize = esize DIV 2;
bits(VL) operand = Z[n];
bits(VL) result;
for e = 0 to elements-1
bits(hsize) element = if hi then Elem[operand, e + elements, hsize] else Elem[operand, e, hsize];
Elem[result, e, esize] = Extend(element, esize, unsigned);
Z[d] = result;
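The element indexing in the loop above can be checked with a Python sketch that treats a vector as a single integer with element 0 in the least significant bits. The helper names `elem` and `uunpk` are hypothetical, and the tiny 32-bit "vector" in the usage note is for illustration only.

```python
def elem(vec: int, e: int, size: int) -> int:
    """Extract element e of the given bit size from a packed vector."""
    return (vec >> (e * size)) & ((1 << size) - 1)

def uunpk(operand: int, vl: int, esize: int, hi: bool) -> int:
    """Zero-extend the low or high half of operand into esize-bit elements."""
    elements = vl // esize
    hsize = esize // 2
    result = 0
    for e in range(elements):
        idx = e + elements if hi else e    # high half starts at 'elements'
        result |= elem(operand, idx, hsize) << (e * esize)
    return result
```

With `vl=32` and `esize=16`, the operand `0xDDCCBBAA` unpacks to `0x00BB00AA` for the low half and `0x00DD00CC` for the high half.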
Zero-extend the least-significant sub-element of each active element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
It has encodings from 3 classes: Byte, Halfword and Word
Byte
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 1 1 0 1 Pg Zn Zd
U
Halfword
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 1 0 1 Pg Zn Zd
U
Word
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 1 0 1 Pg Zn Zd
U
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<T> For the byte variant: is the size specifier, encoded in “size”:
size <T>
00 RESERVED
01 H
10 S
11 D
size<0> <T>
0 S
1 D
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];
for e = 0 to elements-1
if ElemP[mask, e, esize] == '1' then
bits(esize) element = Elem[operand, e, esize];
Elem[result, e, esize] = Extend(element<s_esize-1:0>, esize, unsigned);
Z[d] = result;
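The merging behaviour of the predicated zero-extend can be sketched in Python as follows. The name `uxt` is hypothetical; vectors are modelled as lists of unsigned element values and the governing predicate as a list of booleans, one per element.

```python
def uxt(zd, zn, pg, s_esize):
    """Zero-extend the low s_esize bits of each active source element.

    Inactive elements keep the previous destination value.
    """
    mask = (1 << s_esize) - 1
    return [(n & mask) if active else d
            for d, n, active in zip(zd, zn, pg)]
```

Here `s_esize` is 8, 16 or 32 for the byte, halfword and word variants respectively.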
Operational information
This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
Concatenate adjacent even or odd-numbered elements from the first and second source predicates and place in
elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: Even and Odd
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 1 0 0 Pn 0 Pd
H
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 1 1 0 Pn 0 Pd
H
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer pairs = VL DIV (esize * 2);
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
for p = 0 to pairs - 1
Elem[result, p, esize DIV 8] = Elem[operand1, 2*p+part, esize DIV 8];
for p = 0 to pairs - 1
Elem[result, pairs+p, esize DIV 8] = Elem[operand2, 2*p+part, esize DIV 8];
P[d] = result;
Concatenate adjacent even or odd-numbered elements from the first and second source vectors and place in elements
of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction requires that
the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then
the trailing bits are set to zero.
ID_AA64ZFR0_EL1.F64MM indicates whether the 128-bit element variant of the instruction is implemented.
It has encodings from 4 classes: Even, Even (quadwords), Odd and Odd (quadwords)
Even
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 1 0 Zn Zd
H
Even (quadwords)
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 1 0 Zn Zd
H
Odd
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 1 1 Zn Zd
H
Odd (quadwords)
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 1 1 Zn Zd
H
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
if VL < esize * 2 then UNDEFINED;
integer pairs = VL DIV (esize * 2);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Zeros();
for p = 0 to pairs - 1
Elem[result, p, esize] = Elem[operand1, 2*p+part, esize];
for p = 0 to pairs - 1
Elem[result, pairs+p, esize] = Elem[operand2, 2*p+part, esize];
Z[d] = result;
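The two loops above amount to taking every second element from each source and concatenating the two halves. A minimal Python sketch, assuming vectors are lists with an even number of elements and using the hypothetical name `uzp`:

```python
def uzp(zn, zm, part):
    """Concatenate the even (part=0) or odd (part=1) elements of zn and zm."""
    return zn[part::2] + zm[part::2]
```

The sketch does not model the zeroing of trailing bits that applies to the 128-bit element variant when the vector length is not a multiple of 256 bits.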
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first,
signed scalar operand is less than or equal to the second scalar operand and false thereafter up to the highest
numbered element.
If the second scalar operand is equal to the maximum signed integer value then a condition which includes an equality
test can never fail and the result will be an all-true predicate.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
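The description above can be modelled with the following Python sketch, which also covers the related unsigned and strict comparisons. It is an illustration under assumptions: the names `while_pred` and `pred_flags` are hypothetical, the first operand is assumed to wrap at the register width when incremented, and the flag computation assumes that all result elements are active.

```python
def while_pred(op1, op2, elements, rsize=64, unsigned=False, le=True):
    """Predicate that is true while op1 (incrementing) compares below op2."""
    mask = (1 << rsize) - 1

    def as_int(x):
        x &= mask
        if not unsigned and x >> (rsize - 1):
            x -= 1 << rsize              # reinterpret as signed
        return x

    result = []
    last = True
    for _ in range(elements):
        a, b = as_int(op1), as_int(op2)
        cond = a <= b if le else a < b
        last = last and cond             # once false, stays false
        result.append(last)
        op1 = (op1 + 1) & mask           # assumption: full-width wraparound
    return result

def pred_flags(pred):
    """Return (N, Z, C, V) as FIRST, NONE, !LAST and zero."""
    return pred[0], not any(pred), not pred[-1], False
```

When the second operand is the maximum signed value, the less-than-or-equal comparison succeeds for every element even after the first operand wraps, which gives the all-true predicate described above.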
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 0 1 Rn 1 Pd
U lt eq
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
sf <R>
0 W
1 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;
for e = 0 to elements-1
boolean cond;
case op of
when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first,
unsigned scalar operand is lower than the second scalar operand and false thereafter up to the highest numbered
element.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 1 1 Rn 0 Pd
U lt eq
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
sf <R>
0 W
1 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;
for e = 0 to elements-1
boolean cond;
case op of
when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first,
unsigned scalar operand is lower or same as the second scalar operand and false thereafter up to the highest
numbered element.
If the second scalar operand is equal to the maximum unsigned integer value then a condition which includes an
equality test can never fail and the result will be an all-true predicate.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 1 1 Rn 1 Pd
U lt eq
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
sf <R>
0 W
1 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;
for e = 0 to elements-1
boolean cond;
case op of
when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));
Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first,
signed scalar operand is less than the second scalar operand and false thereafter up to the highest numbered element.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 0 1 Rn 0 Pd
U lt eq
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
sf <R>
0 W
1 X
<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.
Operation
CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;
for e = 0 to elements-1
boolean cond;
case op of
when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));
Read the source predicate register and place in the first-fault register (FFR). This instruction is intended to restore a
saved FFR and is not recommended for general use by applications.
This instruction requires that the source predicate contains a MONOTONIC predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining bit positions. If the source is not a monotonic
predicate value, then the resulting value in the FFR will be UNPREDICTABLE. It is not possible to generate a non-
monotonic value in FFR when using SETFFR followed by first-fault or non-fault loads.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 Pn 0 0 0 0 0
WRFFR <Pn>.B
Assembler Symbols
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.
Operation
CheckSVEEnabled();
bits(PL) operand = P[n];
hsb = HighestSetBit(operand);
if hsb < 0 || IsOnes(operand<hsb:0>) then
FFR[] = operand;
else // not a monotonic predicate
FFR[] = bits(PL) UNKNOWN;
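The monotonic test in the pseudocode above can be expressed compactly when the predicate is held as an integer with bit 0 as the lowest element. The following Python sketch uses the hypothetical names `is_monotonic` and `highest_set_bit`; the second function mirrors the pseudocode more literally so that the two can be compared.

```python
def is_monotonic(pred: int) -> bool:
    """True if pred is zero or a run of 1 bits starting at bit 0."""
    # Adding 1 to a value of the form 0b0...01...1 clears every set bit.
    return (pred & (pred + 1)) == 0

def highest_set_bit(pred: int) -> int:
    """Index of the highest set bit, or -1 when pred is zero."""
    return pred.bit_length() - 1

def is_monotonic_literal(pred: int) -> bool:
    """Same check, following the structure of the pseudocode."""
    hsb = highest_set_bit(pred)
    return hsb < 0 or pred == (1 << (hsb + 1)) - 1
```

Software that saves and restores the FFR could use such a check in a test harness to confirm that a value is safe to write back.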
Interleave alternating elements from the lowest or highest halves of the first and second source predicates and place
in elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: High halves and Low halves
High halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 0 1 0 Pn 0 Pd
H
Low halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 0 0 0 Pn 0 Pd
H
Assembler Symbols
<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
size <T>
00 B
01 H
10 S
11 D
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.
Operation
CheckSVEEnabled();
integer pairs = VL DIV (esize * 2);
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
P[d] = result;
Interleave alternating elements from the lowest or highest halves of the first and second source vectors and place in
elements of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction
requires that the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of
256 bits then the trailing bits are set to zero.
ID_AA64ZFR0_EL1.F64MM indicates whether the 128-bit element variant of the instruction is implemented.
It has encodings from 4 classes: High halves, High halves (quadwords), Low halves and Low halves (quadwords)
High halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 0 1 Zn Zd
H
High halves (quadwords)
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 0 1 Zn Zd
H
Low halves
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 0 0 Zn Zd
H
Low halves (quadwords)
(FEAT_F64MM)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 0 0 Zn Zd
H
Assembler Symbols
<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
size <T>
00 B
01 H
10 S
11 D
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
Operation
CheckSVEEnabled();
if VL < esize * 2 then UNDEFINED;
integer pairs = VL DIV (esize * 2);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Zeros();
Z[d] = result;
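The interleaving step is not shown in the pseudocode above, so the following Python sketch illustrates it from the prose description only. It assumes vectors are lists with an even number of elements, that `part` selects the low (0) or high (1) half of both sources, and that the first source supplies the even-numbered result elements. The name `zip_vec` is hypothetical.

```python
def zip_vec(zn, zm, part):
    """Interleave the low (part=0) or high (part=1) halves of zn and zm."""
    pairs = len(zn) // 2
    base = part * pairs                  # start of the selected half
    out = []
    for p in range(pairs):
        out.append(zn[base + p])         # even-numbered result element
        out.append(zm[base + p])         # odd-numbered result element
    return out
```

As with the pseudocode, the sketch does not model the zeroing of trailing bits for the 128-bit element variant.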
Decode fields
Instruction details
op0
0000 Reserved
0001 UNALLOCATED
0010 SVE encodings
0011 UNALLOCATED
100x Data Processing -- Immediate
101x Branches, Exception Generating and System instructions
x1x0 Loads and Stores
x101 Data Processing -- Register
x111 Data Processing -- Scalar Floating-Point and Advanced SIMD
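A decode table of this kind can be applied mechanically by matching bit patterns in which `x` stands for either bit value. The Python sketch below assumes that op0 occupies bits 28:25 of the instruction word, which is inferred from the encoding diagrams in this section; the names `TOP_LEVEL`, `matches` and `classify` are hypothetical.

```python
TOP_LEVEL = [
    ("0000", "Reserved"),
    ("0001", "UNALLOCATED"),
    ("0010", "SVE encodings"),
    ("0011", "UNALLOCATED"),
    ("100x", "Data Processing -- Immediate"),
    ("101x", "Branches, Exception Generating and System instructions"),
    ("x1x0", "Loads and Stores"),
    ("x101", "Data Processing -- Register"),
    ("x111", "Data Processing -- Scalar Floating-Point and Advanced SIMD"),
]

def matches(bits: str, pattern: str) -> bool:
    """True if every pattern character is 'x' or equals the bit."""
    return all(p in ("x", b) for b, p in zip(bits, pattern))

def classify(insn: int):
    """Return the top-level group for a 32-bit instruction word."""
    op0 = format((insn >> 25) & 0xF, "04b")   # assumption: op0 is bits 28:25
    for pattern, name in TOP_LEVEL:
        if matches(op0, pattern):
            return name
    return None
```

For instance, the WRFFR encoding shown earlier with Pn set to zero forms the word `0x25289000`, which this sketch places in the SVE encodings group.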
Reserved
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 0000 op1
Decode fields
Instruction details
op0 op1
000 000000000 UDF
!= 000000000 UNALLOCATED
!= 000 UNALLOCATED
SVE encodings
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 0010 op1 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
000 0x 0xxxx x1xxxx SVE Integer Multiply-Add - Predicated
000 0x 0xxxx 000xxx SVE Integer Binary Arithmetic - Predicated
000 0x 0xxxx 001xxx SVE Integer Reduction
000 0x 0xxxx 100xxx SVE Bitwise Shift - Predicated
000 0x 0xxxx 101xxx SVE Integer Unary Arithmetic - Predicated
000 0x 1xxxx 000xxx SVE integer add/subtract vectors (unpredicated)
000 0x 1xxxx 001xxx SVE Bitwise Logical - Unpredicated
000 0x 1xxxx 0100xx SVE Index Generation
000 0x 1xxxx 0101xx SVE Stack Allocation
000 0x 1xxxx 011xxx UNALLOCATED
000 0x 1xxxx 100xxx SVE Bitwise Shift - Unpredicated
000 0x 1xxxx 1010xx SVE address generation
000 0x 1xxxx 1011xx SVE Integer Misc - Unpredicated
Top-level encodings for A64
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 1
Decode fields
Instruction details
op0
0 SVE integer multiply-accumulate writing addend (predicated)
1 SVE integer multiply-add writing multiplicand (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 op Pg Zn Zda
Decode fields
Instruction Details
op
0 MLA
1 MLS
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 op Pg Za Zdn
Decode fields
Instruction Details
op
0 MAD
1 MSB
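The two multiply-add diagrams above differ only in bit 15 (op0) and in the naming of the two low register fields. The following Python sketch extracts the fields and selects the mnemonic. Field positions are read off the diagrams; the function name and the dictionary layout are illustrative assumptions.

```python
def decode_sve_int_mul_add_pred(insn):
    """Decode the SVE Integer Multiply-Add - Predicated group."""
    fixed_ok = (
        ((insn >> 24) & 0xFF) == 0b00000100   # bits 31:24
        and ((insn >> 21) & 1) == 0           # bit 21
        and ((insn >> 14) & 1) == 1           # bit 14
    )
    if not fixed_ok:
        raise ValueError("not in the SVE Integer Multiply-Add - Predicated group")
    op0 = (insn >> 15) & 1                    # selects the instruction class
    op = (insn >> 13) & 1                     # selects add or subtract form
    fields = {
        "size": (insn >> 22) & 0x3,
        "Zm": (insn >> 16) & 0x1F,
        "Pg": (insn >> 10) & 0x7,
    }
    if op0 == 0:
        # Multiply-accumulate writing addend: Zn and Zda.
        fields["mnemonic"] = "MLS" if op else "MLA"
        fields["Zn"] = (insn >> 5) & 0x1F
        fields["Zda"] = insn & 0x1F
    else:
        # Multiply-add writing multiplicand: Za and Zdn.
        fields["mnemonic"] = "MSB" if op else "MAD"
        fields["Za"] = (insn >> 5) & 0x1F
        fields["Zdn"] = insn & 0x1F
    return fields
```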
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 000
Decode fields
Instruction details
op0
00x SVE integer add/subtract vectors (predicated)
01x SVE integer min/max/difference (predicated)
100 SVE integer multiply vectors (predicated)
101 SVE integer divide vectors (predicated)
11x SVE bitwise logical operations (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 opc 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc
000 ADD (vectors, predicated)
001 SUB (vectors, predicated)
010 UNALLOCATED
011 SUBR (vectors)
1xx UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 opc U 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc U
00 0 SMAX (vectors)
00 1 UMAX (vectors)
01 0 SMIN (vectors)
01 1 UMIN (vectors)
10 0 SABD
10 1 UABD
11 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 H U 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
H U
0 0 MUL (vectors)
0 1 UNALLOCATED
1 0 SMULH
1 1 UMULH
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 R U 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
R U
0 0 SDIV
0 1 UDIV
1 0 SDIVR
1 1 UDIVR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 opc 0 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc
000 ORR (vectors, predicated)
001 EOR (vectors, predicated)
010 AND (vectors, predicated)
011 BIC (vectors, predicated)
1xx UNALLOCATED
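The five predicated integer binary arithmetic tables above all select on bits 20:16 of the instruction word, so they can be merged into a single lookup. The following Python sketch does that. The merged table and the function name are illustrative assumptions; the bit positions are read off the diagrams, and any 5-bit value not listed is reported as UNALLOCATED.

```python
# Key: bits 20:16 of the instruction word.
INT_BINARY_PRED = {
    0b00000: "ADD (vectors, predicated)",
    0b00001: "SUB (vectors, predicated)",
    0b00011: "SUBR (vectors)",
    0b01000: "SMAX (vectors)",
    0b01001: "UMAX (vectors)",
    0b01010: "SMIN (vectors)",
    0b01011: "UMIN (vectors)",
    0b01100: "SABD",
    0b01101: "UABD",
    0b10000: "MUL (vectors)",
    0b10010: "SMULH",
    0b10011: "UMULH",
    0b10100: "SDIV",
    0b10101: "UDIV",
    0b10110: "SDIVR",
    0b10111: "UDIVR",
    0b11000: "ORR (vectors, predicated)",
    0b11001: "EOR (vectors, predicated)",
    0b11010: "AND (vectors, predicated)",
    0b11011: "BIC (vectors, predicated)",
}


def decode_sve_int_binary_pred(insn):
    """Decode the SVE Integer Binary Arithmetic - Predicated group."""
    fixed_ok = (
        ((insn >> 24) & 0xFF) == 0b00000100   # bits 31:24
        and ((insn >> 21) & 1) == 0           # bit 21
        and ((insn >> 13) & 0x7) == 0b000     # bits 15:13
    )
    if not fixed_ok:
        raise ValueError("not in the SVE Integer Binary Arithmetic - Predicated group")
    selector = (insn >> 16) & 0x1F            # bits 20:16
    return INT_BINARY_PRED.get(selector, "UNALLOCATED")
```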
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 001
Decode fields
Instruction details
op0
000 SVE integer add reduction (predicated)
010 SVE integer min/max reduction (predicated)
0x1 UNALLOCATED
10x SVE constructive prefix (predicated)
110 SVE bitwise logical reduction (predicated)
111 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 op U 0 0 1 Pg Zn Vd
Decode fields
Instruction Details
op U
0 0 SADDV
0 1 UADDV
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 op U 0 0 1 Pg Zn Vd
Decode fields
Instruction Details
op U
0 0 SMAXV
0 1 UMAXV
1 0 SMINV
1 1 UMINV
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 opc M 0 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc
00 MOVPRFX (predicated)
01 UNALLOCATED
1x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 opc 0 0 1 Pg Zn Vd
Decode fields
Instruction Details
opc
00 ORV
01 EORV
10 ANDV
11 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 100
Decode fields
Instruction details
op0
0x SVE bitwise shift by immediate (predicated)
10 SVE bitwise shift by vector (predicated)
11 SVE bitwise shift by wide elements (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 opc L U 1 0 0 Pg tszl imm3 Zdn
Decode fields
Instruction Details
opc L U
00 0 0 ASR (immediate, predicated)
00 0 1 LSR (immediate, predicated)
00 1 0 UNALLOCATED
00 1 1 LSL (immediate, predicated)
01 0 0 ASRD
01 0 1 UNALLOCATED
01 1 UNALLOCATED
1x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 R L U 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
R L U
1 0 UNALLOCATED
0 0 0 ASR (vectors)
0 0 1 LSR (vectors)
0 1 1 LSL (vectors)
1 0 0 ASRR
1 0 1 LSRR
1 1 1 LSLR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 R L U 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
R L U
0 0 0 ASR (wide elements, predicated)
0 0 1 LSR (wide elements, predicated)
0 1 0 UNALLOCATED
0 1 1 LSL (wide elements, predicated)
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 101
Decode fields
Instruction details
op0
0x UNALLOCATED
10 SVE integer unary operations (predicated)
11 SVE bitwise unary operations (predicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 opc 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc
000 SXTB, SXTH, SXTW — SXTB
001 UXTB, UXTH, UXTW — UXTB
010 SXTB, SXTH, SXTW — SXTH
011 UXTB, UXTH, UXTW — UXTH
100 SXTB, SXTH, SXTW — SXTW
101 UXTB, UXTH, UXTW — UXTW
110 ABS
111 NEG
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 opc 1 0 1 Pg Zn Zd
Decode fields
Instruction Details
opc
000 CLS
001 CLZ
010 CNT
011 CNOT
100 FABS
101 FNEG
110 NOT (vector)
111 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 opc Zn Zd
Decode fields
Instruction Details
opc
000 ADD (vectors, unpredicated)
001 SUB (vectors, unpredicated)
01x UNALLOCATED
100 SQADD (vectors)
101 UQADD (vectors)
110 SQSUB (vectors)
111 UQSUB (vectors)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 001 op0 op1
Decode fields
Instruction details
op0 op1
0 UNALLOCATED
1 00 SVE bitwise logical operations (unpredicated)
1 != 00 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 Zm 0 0 1 1 0 0 Zn Zd
Decode fields
Instruction Details
opc
00 AND (vectors, unpredicated)
01 ORR (vectors, unpredicated)
10 EOR (vectors, unpredicated)
11 BIC (vectors, unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 0100 op0
Decode fields
Instruction details
op0
00 INDEX (immediates)
01 INDEX (scalar, immediate)
10 INDEX (immediate, scalar)
11 INDEX (scalars)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 op0 1 0101 op1
Decode fields
Instruction details
op0 op1
0 0 SVE stack frame adjustment
1 0 SVE stack frame size
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 op 1 Rn 0 1 0 1 0 imm6 Rd
Decode fields
Instruction Details
op
0 ADDVL
1 ADDPL
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 op 1 opc2 0 1 0 1 0 imm6 Rd
Decode fields
Instruction Details
op opc2
0 0xxxx UNALLOCATED
0 10xxx UNALLOCATED
0 110xx UNALLOCATED
0 1110x UNALLOCATED
0 11110 UNALLOCATED
0 11111 RDVL
1 UNALLOCATED
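The SVE Stack Allocation group above can be decoded as in the following Python sketch. Field positions are read off the diagrams. Treating imm6 as a signed value is an assumption taken from the individual ADDVL, ADDPL and RDVL instruction descriptions rather than from the tables above, and the function names are illustrative.

```python
def sign_extend(value, width):
    """Interpret the low 'width' bits of value as a two's complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_sve_stack(insn):
    """Return (mnemonic, Rd, Rn, imm) for the SVE Stack Allocation group."""
    fixed_ok = (
        ((insn >> 24) & 0xFF) == 0b00000100   # bits 31:24
        and ((insn >> 21) & 1) == 1           # bit 21
        and ((insn >> 12) & 0xF) == 0b0101    # bits 15:12
    )
    if not fixed_ok:
        raise ValueError("not in the SVE Stack Allocation group")
    if (insn >> 11) & 1:                      # op1 == 1
        return ("UNALLOCATED", None, None, None)
    is_size_form = (insn >> 23) & 1           # op0: 0 = adjustment, 1 = size
    op = (insn >> 22) & 1
    imm = sign_extend((insn >> 5) & 0x3F, 6)  # imm6, assumed signed
    rd = insn & 0x1F
    if not is_size_form:
        rn = (insn >> 16) & 0x1F
        return ("ADDPL" if op else "ADDVL", rd, rn, imm)
    opc2 = (insn >> 16) & 0x1F
    if op == 0 and opc2 == 0b11111:
        return ("RDVL", rd, None, imm)
    return ("UNALLOCATED", None, None, None)
```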
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 100 op0
Decode fields
Instruction details
op0
0 SVE bitwise shift by wide elements (unpredicated)
1 SVE bitwise shift by immediate (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 opc Zn Zd
Decode fields
Instruction Details
opc
00 ASR (wide elements, unpredicated)
01 LSR (wide elements, unpredicated)
10 UNALLOCATED
11 LSL (wide elements, unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 opc Zn Zd
Decode fields
Instruction Details
opc
00 ASR (immediate, unpredicated)
01 LSR (immediate, unpredicated)
10 UNALLOCATED
11 LSL (immediate, unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 Zm 1 0 1 0 msz Zn Zd
Decode fields
Instruction Details
opc
00 ADR — Unpacked 32-bit signed offsets
01 ADR — Unpacked 32-bit unsigned offsets
1x ADR — Packed offsets
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 1011 op0
Decode fields
Instruction details
op0
0x SVE floating-point trig select coefficient
10 SVE floating-point exponential accelerator
11 SVE constructive prefix (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 1 1 0 op Zn Zd
Decode fields
Instruction Details
op
0 FTSSEL
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 opc 1 0 1 1 1 0 Zn Zd
Decode fields
Instruction Details
opc
00000 FEXPA
00001 UNALLOCATED
0001x UNALLOCATED
001xx UNALLOCATED
01xxx UNALLOCATED
1xxxx UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 opc2 1 0 1 1 1 1 Zn Zd
Decode fields
Instruction Details
opc opc2
00 00000 MOVPRFX (unpredicated)
00 00001 UNALLOCATED
00 0001x UNALLOCATED
00 001xx UNALLOCATED
00 01xxx UNALLOCATED
00 1xxxx UNALLOCATED
01 UNALLOCATED
1x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 op0 11 op1
Decode fields
Instruction details
op0 op1
0 00x SVE saturating inc/dec vector by element count
0 100 SVE element count
0 101 UNALLOCATED
1 000 SVE inc/dec vector by element count
1 100 SVE inc/dec register by element count
1 x01 UNALLOCATED
01x UNALLOCATED
11x SVE saturating inc/dec register by element count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 imm4 1 1 0 0 D U pattern Zdn
Decode fields
Instruction Details
size D U
00 UNALLOCATED
01 0 0 SQINCH (vector)
01 0 1 UQINCH (vector)
01 1 0 SQDECH (vector)
01 1 1 UQDECH (vector)
10 0 0 SQINCW (vector)
10 0 1 UQINCW (vector)
10 1 0 SQDECW (vector)
10 1 1 UQDECW (vector)
11 0 0 SQINCD (vector)
11 0 1 UQINCD (vector)
11 1 0 SQDECD (vector)
11 1 1 UQDECD (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 imm4 1 1 1 0 0 op pattern Rd
Decode fields
Instruction Details
size op
1 UNALLOCATED
00 0 CNTB, CNTD, CNTH, CNTW — CNTB
01 0 CNTB, CNTD, CNTH, CNTW — CNTH
10 0 CNTB, CNTD, CNTH, CNTW — CNTW
11 0 CNTB, CNTD, CNTH, CNTW — CNTD
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 1 imm4 1 1 0 0 0 D pattern Zdn
Decode fields
Instruction Details
size D
00 UNALLOCATED
01 0 INCD, INCH, INCW (vector) — INCH
01 1 DECD, DECH, DECW (vector) — DECH
10 0 INCD, INCH, INCW (vector) — INCW
10 1 DECD, DECH, DECW (vector) — DECW
11 0 INCD, INCH, INCW (vector) — INCD
11 1 DECD, DECH, DECW (vector) — DECD
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 1 imm4 1 1 1 0 0 D pattern Rdn
Decode fields
Instruction Details
size D
00 0 INCB, INCD, INCH, INCW (scalar) — INCB
00 1 DECB, DECD, DECH, DECW (scalar) — DECB
01 0 INCB, INCD, INCH, INCW (scalar) — INCH
01 1 DECB, DECD, DECH, DECW (scalar) — DECH
10 0 INCB, INCD, INCH, INCW (scalar) — INCW
10 1 DECB, DECD, DECH, DECW (scalar) — DECW
11 0 INCB, INCD, INCH, INCW (scalar) — INCD
11 1 DECB, DECD, DECH, DECW (scalar) — DECD
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 sf imm4 1 1 1 1 D U pattern Rdn
Decode fields
Instruction Details
size sf D U
00 0 0 0 SQINCB — 32-bit
00 0 0 1 UQINCB — 32-bit
00 0 1 0 SQDECB — 32-bit
00 0 1 1 UQDECB — 32-bit
00 1 0 0 SQINCB — 64-bit
00 1 0 1 UQINCB — 64-bit
00 1 1 0 SQDECB — 64-bit
00 1 1 1 UQDECB — 64-bit
01 0 0 0 SQINCH (scalar) — 32-bit
01 0 0 1 UQINCH (scalar) — 32-bit
01 0 1 0 SQDECH (scalar) — 32-bit
01 0 1 1 UQDECH (scalar) — 32-bit
01 1 0 0 SQINCH (scalar) — 64-bit
01 1 0 1 UQINCH (scalar) — 64-bit
01 1 1 0 SQDECH (scalar) — 64-bit
01 1 1 1 UQDECH (scalar) — 64-bit
10 0 0 0 SQINCW (scalar) — 32-bit
10 0 0 1 UQINCW (scalar) — 32-bit
10 0 1 0 SQDECW (scalar) — 32-bit
10 0 1 1 UQDECW (scalar) — 32-bit
10 1 0 0 SQINCW (scalar) — 64-bit
10 1 0 1 UQINCW (scalar) — 64-bit
10 1 1 0 SQDECW (scalar) — 64-bit
10 1 1 1 UQDECW (scalar) — 64-bit
11 0 0 0 SQINCD (scalar) — 32-bit
11 0 0 1 UQINCD (scalar) — 32-bit
11 0 1 0 SQDECD (scalar) — 32-bit
11 0 1 1 UQDECD (scalar) — 32-bit
11 1 0 0 SQINCD (scalar) — 64-bit
11 1 0 1 UQINCD (scalar) — 64-bit
11 1 1 0 SQDECD (scalar) — 64-bit
11 1 1 1 UQDECD (scalar) — 64-bit
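The 32 rows above follow a regular pattern: U selects the S or U prefix, D selects INC or DEC, size selects the B, H, W or D suffix, and sf selects the register width. The following Python sketch builds the mnemonic from those fields instead of using a lookup table. The function name is illustrative, and the "(scalar)" qualifier used in the table is omitted from the returned string.

```python
def decode_sve_sat_incdec_reg(insn):
    """Return (mnemonic, register width) for the saturating inc/dec
    register by element count encodings."""
    fixed_ok = (
        ((insn >> 24) & 0xFF) == 0b00000100   # bits 31:24
        and ((insn >> 21) & 1) == 1           # bit 21
        and ((insn >> 12) & 0xF) == 0b1111    # bits 15:12
    )
    if not fixed_ok:
        raise ValueError("not a saturating inc/dec register by element count encoding")
    size = (insn >> 22) & 0x3
    sf = (insn >> 20) & 1
    d = (insn >> 11) & 1
    u = (insn >> 10) & 1
    mnemonic = ("UQ" if u else "SQ") + ("DEC" if d else "INC") + "BHWD"[size]
    return mnemonic, 64 if sf else 32
```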
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 op0 00 op1
Decode fields
Instruction details
op0 op1
11 00 DUPM
!= 11 00 SVE bitwise logical with immediate (unpredicated)
!= 00 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 opc 0 0 0 0 imm13 Zdn
The following constraint also applies to this encoding: opc != 11
Decode fields
Instruction Details
opc
00 ORR (immediate)
01 EOR (immediate)
10 AND (immediate)
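The relationship between this encoding and DUPM can be sketched as follows: both share the same fixed bits, and the value 11 in bits 23:22 selects DUPM instead of a logical operation. The Python function below is a minimal illustration; its name and return layout are assumptions, and the imm13 field is returned undecoded.

```python
def decode_sve_logical_imm(insn):
    """Return (instruction, imm13, Zdn) for the 00000101 op0 00 op1 group."""
    if ((insn >> 24) & 0xFF) != 0b00000101 or ((insn >> 20) & 0x3) != 0b00:
        raise ValueError("not in the 00000101 op0 00 op1 group")
    if (insn >> 18) & 0x3:                    # op1 != 00
        return ("UNALLOCATED", None, None)
    opc = (insn >> 22) & 0x3                  # op0 at group level
    name = ["ORR (immediate)", "EOR (immediate)", "AND (immediate)", "DUPM"][opc]
    imm13 = (insn >> 5) & 0x1FFF              # raw bitmask immediate field
    zdn = insn & 0x1F
    return (name, imm13, zdn)
```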
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 01 op0
Decode fields
Instruction details
op0
0xx SVE copy integer immediate (predicated)
10x UNALLOCATED
110 FCPY
111 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 M sh imm8 Zd
Decode fields
Instruction Details
M
0 CPY (immediate, zeroing)
1 CPY (immediate, merging)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 1 op0 op1 001110
Decode fields
Instruction details
op0 op1
00 000 DUP (scalar)
00 100 INSR (scalar)
00 x10 UNALLOCATED
00 xx1 UNALLOCATED
01 UNALLOCATED
10 0xx SVE unpack vector elements
10 100 INSR (SIMD&FP scalar)
10 110 UNALLOCATED
10 1x1 UNALLOCATED
11 000 REV (vector)
11 != 000 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 U H 0 0 1 1 1 0 Zn Zd
Decode fields
Instruction Details
U H
0 0 SUNPKHI, SUNPKLO — SUNPKLO
0 1 SUNPKHI, SUNPKLO — SUNPKHI
1 0 UUNPKHI, UUNPKLO — UUNPKLO
1 1 UUNPKHI, UUNPKLO — UUNPKHI
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 op0 1 op1 010 op2 op3
Decode fields
Instruction details
op0 op1 op2 op3
00 1000x 0000 0 SVE unpack predicate elements
01 1000x 0000 0 UNALLOCATED
10 1000x 0000 0 UNALLOCATED
11 1000x 0000 0 UNALLOCATED
0xxxx xxx0 0 SVE permute predicate elements
0xxxx xxx1 0 UNALLOCATED
10100 0000 0 REV (predicate)
10101 0000 0 UNALLOCATED
10x0x 1000 0 UNALLOCATED
10x0x x100 0 UNALLOCATED
10x0x xx10 0 UNALLOCATED
10x0x xxx1 0 UNALLOCATED
10x1x 0 UNALLOCATED
11xxx 0 UNALLOCATED
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 H 0 1 0 0 0 0 0 Pn 0 Pd
Decode fields
Instruction Details
H
0 PUNPKHI, PUNPKLO — PUNPKLO
1 PUNPKHI, PUNPKLO — PUNPKHI
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 opc H 0 Pn 0 Pd
Decode fields
Instruction Details
opc H
00 0 ZIP1, ZIP2 (predicates) — ZIP1
00 1 ZIP1, ZIP2 (predicates) — ZIP2
01 0 UZP1, UZP2 (predicates) — UZP1
01 1 UZP1, UZP2 (predicates) — UZP2
10 0 TRN1, TRN2 (predicates) — TRN1
10 1 TRN1, TRN2 (predicates) — TRN2
11 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 opc Zn Zd
Decode fields
Instruction Details
opc
000 ZIP1, ZIP2 (vectors) — ZIP1
001 ZIP1, ZIP2 (vectors) — ZIP2
010 UZP1, UZP2 (vectors) — UZP1
011 UZP1, UZP2 (vectors) — UZP2
100 TRN1, TRN2 (vectors) — TRN1
101 TRN1, TRN2 (vectors) — TRN2
11x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 1 op0 op1 op2 10 op3
Decode fields
Instruction details
op0 op1 op2 op3
0 000 0 0 CPY (SIMD&FP scalar)
0 000 1 0 COMPACT
0 000 1 SVE extract element to general register
0 001 0 SVE extract element to SIMD&FP scalar register
0 01x 0 SVE reverse within elements
0 01x 1 UNALLOCATED
0 100 0 1 CPY (scalar)
0 100 1 1 UNALLOCATED
0 100 0 SVE conditionally broadcast element to vector
0 101 0 SVE conditionally extract element to SIMD&FP scalar
0 110 0 0 SPLICE
0 110 0 1 UNALLOCATED
0 110 1 UNALLOCATED
0 111 0 0 UNALLOCATED
0 111 0 1 UNALLOCATED
0 111 1 UNALLOCATED
0 x01 1 UNALLOCATED
1 000 0 UNALLOCATED
1 000 1 SVE conditionally extract element to general register
1 != 000 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 B 1 0 1 Pg Zn Rd
Decode fields
Instruction Details
B
0 LASTA (scalar)
1 LASTB (scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 B 1 0 0 Pg Zn Vd
Decode fields
Instruction Details
B
0 LASTA (SIMD&FP scalar)
1 LASTB (SIMD&FP scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 opc 1 0 0 Pg Zn Zd
Decode fields
Instruction Details
opc
00 REVB, REVH, REVW — REVB
01 REVB, REVH, REVW — REVH
10 REVB, REVH, REVW — REVW
11 RBIT
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 B 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
B
0 CLASTA (vectors)
1 CLASTB (vectors)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 B 1 0 0 Pg Zm Vdn
Decode fields
Instruction Details
B
0 CLASTA (SIMD&FP scalar)
1 CLASTB (SIMD&FP scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 B 1 0 1 Pg Zm Rdn
Decode fields
Instruction Details
B
0 CLASTA (scalar)
1 CLASTB (scalar)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
000001010 op0 1 000
Decode fields
Instruction details
op0
0 EXT
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
000001011 op0 1 000
Decode fields
Instruction details
op0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 opc H Zn Zd
Decode fields
Instruction Details Feature
opc H
00 0 ZIP1, ZIP2 (vectors) — ZIP1 FEAT_F64MM
00 1 ZIP1, ZIP2 (vectors) — ZIP2 FEAT_F64MM
01 0 UZP1, UZP2 (vectors) — UZP1 FEAT_F64MM
01 1 UZP1, UZP2 (vectors) — UZP2 FEAT_F64MM
10 UNALLOCATED -
11 0 TRN1, TRN2 (vectors) — TRN1 FEAT_F64MM
11 1 TRN1, TRN2 (vectors) — TRN2 FEAT_F64MM
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100100 0 op0
Decode fields
Instruction details
op0
0 SVE integer compare vectors
1 SVE integer compare with wide elements
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm op 0 o2 Pg Zn ne Pd
Decode fields
Instruction Details
op o2 ne
0 0 0 CMP<cc> (vectors) — CMPHS
0 0 1 CMP<cc> (vectors) — CMPHI
0 1 0 CMP<cc> (wide elements) — CMPEQ
0 1 1 CMP<cc> (wide elements) — CMPNE
1 0 0 CMP<cc> (vectors) — CMPGE
1 0 1 CMP<cc> (vectors) — CMPGT
1 1 0 CMP<cc> (vectors) — CMPEQ
1 1 1 CMP<cc> (vectors) — CMPNE
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm U 1 lt Pg Zn ne Pd
Decode fields
Instruction Details
U lt ne
0 0 0 CMP<cc> (wide elements) — CMPGE
0 0 1 CMP<cc> (wide elements) — CMPGT
0 1 0 CMP<cc> (wide elements) — CMPLT
0 1 1 CMP<cc> (wide elements) — CMPLE
1 0 0 CMP<cc> (wide elements) — CMPHS
1 0 1 CMP<cc> (wide elements) — CMPHI
1 1 0 CMP<cc> (wide elements) — CMPLO
1 1 1 CMP<cc> (wide elements) — CMPLS
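The two integer compare tables above (bit 14 = 0 and bit 14 = 1) can be combined into one decoder, as in the following Python sketch. Bit positions are read off the diagrams; the table names, the function name and the (condition, variant) return shape are illustrative assumptions.

```python
# Bit 14 == 0. Key: (op, o2, ne) = (bit 15, bit 13, bit 4).
COMPARE_VECTORS = {
    (0, 0, 0): ("CMPHS", "vectors"),
    (0, 0, 1): ("CMPHI", "vectors"),
    (0, 1, 0): ("CMPEQ", "wide elements"),
    (0, 1, 1): ("CMPNE", "wide elements"),
    (1, 0, 0): ("CMPGE", "vectors"),
    (1, 0, 1): ("CMPGT", "vectors"),
    (1, 1, 0): ("CMPEQ", "vectors"),
    (1, 1, 1): ("CMPNE", "vectors"),
}

# Bit 14 == 1. Key: (U, lt, ne) = (bit 15, bit 13, bit 4).
COMPARE_WIDE = {
    (0, 0, 0): "CMPGE", (0, 0, 1): "CMPGT",
    (0, 1, 0): "CMPLT", (0, 1, 1): "CMPLE",
    (1, 0, 0): "CMPHS", (1, 0, 1): "CMPHI",
    (1, 1, 0): "CMPLO", (1, 1, 1): "CMPLS",
}


def decode_sve_int_compare(insn):
    """Return (condition mnemonic, variant) for 00100100 .. 0 encodings."""
    if ((insn >> 24) & 0xFF) != 0b00100100 or ((insn >> 21) & 1) != 0:
        raise ValueError("not an SVE integer compare (vectors or wide elements) encoding")
    key = ((insn >> 15) & 1, (insn >> 13) & 1, (insn >> 4) & 1)
    if (insn >> 14) & 1 == 0:
        return COMPARE_VECTORS[key]
    return (COMPARE_WIDE[key], "wide elements")
```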
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 lt Pg Zn ne Pd
Decode fields
Instruction Details
lt ne
0 0 CMP<cc> (immediate) — CMPHS
0 1 CMP<cc> (immediate) — CMPHI
1 0 CMP<cc> (immediate) — CMPLO
1 1 CMP<cc> (immediate) — CMPLS
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 op 0 o2 Pg Zn ne Pd
Decode fields
Instruction Details
op o2 ne
0 0 0 CMP<cc> (immediate) — CMPGE
0 0 1 CMP<cc> (immediate) — CMPGT
0 1 0 CMP<cc> (immediate) — CMPLT
0 1 1 CMP<cc> (immediate) — CMPLE
1 0 0 CMP<cc> (immediate) — CMPEQ
1 0 1 CMP<cc> (immediate) — CMPNE
1 1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 0 Pm 0 1 Pg o2 Pn o3 Pd
Decode fields
Instruction Details
op S o2 o3
0 0 0 0 AND (predicates)
0 0 0 1 BIC (predicates)
0 0 1 0 EOR (predicates)
0 0 1 1 SEL (predicates)
0 1 0 0 ANDS
0 1 0 1 BICS
0 1 1 0 EORS
0 1 1 1 UNALLOCATED
1 0 0 0 ORR (predicates)
1 0 0 1 ORN (predicates)
1 0 1 0 NOR
1 0 1 1 NAND
1 1 0 0 ORRS
1 1 0 1 ORNS
1 1 1 0 NORS
1 1 1 1 NANDS
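The sixteen rows above are ordered by the 4-bit value op:S:o2:o3, so the table can be held as a flat list. The following Python sketch decodes the predicate logical encodings on that basis. Bit positions are read off the diagram; the names are illustrative, and the "(predicates)" qualifiers are omitted from the returned mnemonics.

```python
# Indexed by the 4-bit value op:S:o2:o3.
PRED_LOGICAL = [
    "AND", "BIC", "EOR", "SEL",
    "ANDS", "BICS", "EORS", "UNALLOCATED",
    "ORR", "ORN", "NOR", "NAND",
    "ORRS", "ORNS", "NORS", "NANDS",
]


def decode_sve_pred_logical(insn):
    """Decode an SVE predicate logical operation encoding."""
    fixed_ok = (
        ((insn >> 24) & 0xFF) == 0b00100101   # bits 31:24
        and ((insn >> 20) & 0x3) == 0b00      # bits 21:20
        and ((insn >> 14) & 0x3) == 0b01      # bits 15:14
    )
    if not fixed_ok:
        raise ValueError("not an SVE predicate logical operation encoding")
    op = (insn >> 23) & 1
    s = (insn >> 22) & 1
    o2 = (insn >> 9) & 1
    o3 = (insn >> 4) & 1
    index = (op << 3) | (s << 2) | (o2 << 1) | o3
    return {
        "mnemonic": PRED_LOGICAL[index],
        "Pm": (insn >> 16) & 0xF,
        "Pg": (insn >> 10) & 0xF,
        "Pn": (insn >> 5) & 0xF,
        "Pd": insn & 0xF,
    }
```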
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 00 11 op0
Decode fields
Instruction details
op0
0 SVE propagate break from previous partition
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 0 Pm 1 1 Pg 0 Pn B Pd
Decode fields
Instruction Details
op S B
0 0 0 BRKPA
0 0 1 BRKPB
0 1 0 BRKPAS
0 1 1 BRKPBS
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 op0 01 op1 01 op2 op3
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 S 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
Decode fields
Instruction Details
S
0 BRKN
1 BRKNS
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 B S 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
Decode fields
Instruction Details
B S M
1 1 UNALLOCATED
0 0 BRKA
0 1 0 BRKAS
1 0 BRKB
1 1 0 BRKBS
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 01 op0 11 op1 op2 op3 op4
Decode fields
Instruction details
op0 op1 op2 op3 op4
0000 x0 0 SVE predicate test
0100 x0 0 UNALLOCATED
0x10 x0 0 UNALLOCATED
0xx1 x0 0 UNALLOCATED
0xxx x1 0 UNALLOCATED
1000 000 00 0 SVE predicate first active
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 0 0 0 0 1 1 Pg 0 Pn 0 opc2
Decode fields
Instruction Details
op S opc2
0 0 UNALLOCATED
0 1 0000 PTEST
0 1 0001 UNALLOCATED
0 1 001x UNALLOCATED
0 1 01xx UNALLOCATED
0 1 1xxx UNALLOCATED
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 0 0 0 0 0 Pg 0 Pdn
Decode fields
Instruction Details
op S
0 0 UNALLOCATED
0 1 PFIRST
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 Pd
Decode fields
Instruction Details
op S
0 0 PFALSE
0 1 UNALLOCATED
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 1 1 0 0 0 Pg 0 Pd
Decode fields
Instruction Details
op S
0 0 RDFFR (predicated)
0 1 RDFFRS
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 Pd
Decode fields
Instruction Details
op S
0 0 RDFFR (unpredicated)
0 1 UNALLOCATED
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 S 1 1 1 0 0 0 pattern 0 Pd
Decode fields
Instruction Details
S
0 PTRUE
1 PTRUES
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 1 00 op0 op1 op2
Decode fields
Instruction details
op0 op1 op2
0 SVE integer compare scalar count and limit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf U lt Rn eq Pd
Decode fields
Instruction Details
U lt eq
0 UNALLOCATED
0 1 0 WHILELT
0 1 1 WHILELE
1 1 0 WHILELO
1 1 1 WHILELS
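The WHILE table above can be applied as in the following Python sketch: lt = 0 is UNALLOCATED regardless of U and eq, and otherwise U and eq select the condition. Bit positions are read off the diagram; the function name and the dictionary layout are illustrative assumptions.

```python
# Key: (U, eq), valid only when lt == 1.
WHILE_NAMES = {
    (0, 0): "WHILELT",
    (0, 1): "WHILELE",
    (1, 0): "WHILELO",
    (1, 1): "WHILELS",
}


def decode_sve_while(insn):
    """Decode an SVE integer compare scalar count and limit encoding."""
    fixed_ok = (
        ((insn >> 24) & 0xFF) == 0b00100101   # bits 31:24
        and ((insn >> 21) & 1) == 1           # bit 21
        and ((insn >> 13) & 0x7) == 0b000     # bits 15:13
    )
    if not fixed_ok:
        raise ValueError("not an SVE integer compare scalar count and limit encoding")
    u = (insn >> 11) & 1
    lt = (insn >> 10) & 1
    eq = (insn >> 4) & 1
    mnemonic = WHILE_NAMES[(u, eq)] if lt else "UNALLOCATED"
    return {
        "mnemonic": mnemonic,
        "size": (insn >> 22) & 0x3,
        "sf": (insn >> 12) & 1,               # 0 = 32-bit, 1 = 64-bit scalars
        "Rm": (insn >> 16) & 0x1F,
        "Rn": (insn >> 5) & 0x1F,
        "Pd": insn & 0xF,
    }
```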
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op sz 1 Rm 0 0 1 0 0 0 Rn ne 0 0 0 0
Decode fields
Instruction Details
op ne
0 UNALLOCATED
1 0 CTERMEQ, CTERMNE — CTERMEQ
1 1 CTERMEQ, CTERMNE — CTERMNE
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 1 op0 op1 11
Decode fields
Instruction details
op0 op1
00 SVE integer add/subtract immediate (unpredicated)
01 SVE integer min/max immediate (unpredicated)
10 SVE integer multiply immediate (unpredicated)
11 0 SVE broadcast integer immediate (unpredicated)
11 1 SVE broadcast floating-point immediate (unpredicated)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 opc 1 1 sh imm8 Zdn
Decode fields
Instruction Details
opc
000 ADD (immediate)
001 SUB (immediate)
010 UNALLOCATED
011 SUBR (immediate)
100 SQADD (immediate)
101 UQADD (immediate)
110 SQSUB (immediate)
111 UQSUB (immediate)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 opc 1 1 o2 imm8 Zdn
Decode fields
Instruction Details
opc o2
0xx 1 UNALLOCATED
000 0 SMAX (immediate)
001 0 UMAX (immediate)
010 0 SMIN (immediate)
011 0 UMIN (immediate)
1xx UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 0 opc 1 1 o2 imm8 Zdn
Decode fields
Instruction Details
opc o2
000 0 MUL (immediate)
000 1 UNALLOCATED
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 opc 0 1 1 sh imm8 Zd
Decode fields
Instruction Details
opc
00 DUP (immediate)
01 UNALLOCATED
1x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 opc 1 1 1 o2 imm8 Zd
Decode fields
Instruction Details
opc o2
00 0 FDUP
00 1 UNALLOCATED
01 UNALLOCATED
1x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 100 10 op0
Decode fields
Instruction details
op0
0 SVE predicate count
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 opc 1 0 Pg 0 Pn Rd
Decode fields
Instruction Details
opc
000 CNTP
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 101 op0 1000 op1
Decode fields
Instruction details
op0 op1
0 0 SVE saturating inc/dec vector by predicate count
0 1 SVE saturating inc/dec register by predicate count
1 0 SVE inc/dec vector by predicate count
1 1 SVE inc/dec register by predicate count
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 D U 1 0 0 0 0 opc Pm Zdn
Decode fields
Instruction Details
D U opc
01 UNALLOCATED
1x UNALLOCATED
0 0 00 SQINCP (vector)
0 1 00 UQINCP (vector)
1 0 00 SQDECP (vector)
1 1 00 UQDECP (vector)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 D U 1 0 0 0 1 sf op Pm Rdn
Decode fields
Instruction Details
D U sf op
1 UNALLOCATED
0 0 0 0 SQINCP (scalar) — 32-bit
0 0 1 0 SQINCP (scalar) — 64-bit
0 1 0 0 UQINCP (scalar) — 32-bit
0 1 1 0 UQINCP (scalar) — 64-bit
1 0 0 0 SQDECP (scalar) — 32-bit
1 0 1 0 SQDECP (scalar) — 64-bit
1 1 0 0 UQDECP (scalar) — 32-bit
1 1 1 0 UQDECP (scalar) — 64-bit
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 op D 1 0 0 0 0 opc2 Pm Zdn
Decode fields
Instruction Details
op D opc2
0 01 UNALLOCATED
0 1x UNALLOCATED
0 0 00 INCP (vector)
0 1 00 DECP (vector)
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 op D 1 0 0 0 1 opc2 Pm Rdn
Decode fields
Instruction Details
op D opc2
0 01 UNALLOCATED
0 1x UNALLOCATED
0 0 00 INCP (scalar)
0 1 00 DECP (scalar)
1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 101 op0 op1 1001 op2 op3 op4
Decode fields
Instruction details
op0 op1 op2 op3 op4
0 00 000 00000 SVE FFR write from predicate
1 00 000 0000 00000 SVE FFR initialise
1 00 000 1xxx 00000 UNALLOCATED
1 00 000 x1xx 00000 UNALLOCATED
1 00 000 xx1x 00000 UNALLOCATED
1 00 000 xxx1 00000 UNALLOCATED
00 000 != 00000 UNALLOCATED
00 != 000 UNALLOCATED
!= 00 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 opc 1 0 1 0 0 0 1 0 0 1 0 0 0 Pn 0 0 0 0 0
Decode fields
Instruction Details
opc
00 WRFFR
01 UNALLOCATED
1x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 opc 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Decode fields
Instruction Details
opc
00 SETFFR
01 UNALLOCATED
1x UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000100 0 0 op0 op1 op2
Decode fields
Instruction details
op0 op1 op2
0 000 SVE integer dot product (unpredicated)
0 != 000 UNALLOCATED
1 0xx UNALLOCATED
1 10x UNALLOCATED
1 110 UNALLOCATED
1 111 0 SVE mixed sign dot product
1 111 1 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 U Zn Zda
Decode fields
Instruction Details
U
0 SDOT (vectors)
1 UDOT (vectors)
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 1 1 1 1 0 Zn Zda
Decode fields
Instruction Details Feature
size
0x UNALLOCATED -
10 USDOT (vectors) FEAT_I8MM
11 UNALLOCATED -
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000100 1 op0 op1
Decode fields
Instruction details
op0 op1
000 00 SVE integer dot product (indexed)
000 01 UNALLOCATED
000 10 UNALLOCATED
000 11 SVE mixed sign dot product (indexed)
!= 000 UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 1 opc 0 0 0 0 0 U Zn Zda
Decode fields
Instruction Details
size U
0x UNALLOCATED
10 0 SDOT (indexed) — 32-bit
10 1 UDOT (indexed) — 32-bit
11 0 SDOT (indexed) — 64-bit
11 1 UDOT (indexed) — 64-bit
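The (size, U) rows above map naturally onto a lookup table. The following Python sketch is an illustration under stated assumptions rather than a definitive decoder. `DOT_INDEXED` and `decode_dot_indexed` are names invented here, and the second tuple element simply records the "32-bit" or "64-bit" variant named in the table. Field positions are read from the diagram above: size is bits [23:22] and U is bit 10.

```python
from typing import Optional, Tuple

# Keys are (size, U); values are (mnemonic, variant width in bits as named in the table).
DOT_INDEXED = {
    (0b10, 0): ("SDOT (indexed)", 32),
    (0b10, 1): ("UDOT (indexed)", 32),
    (0b11, 0): ("SDOT (indexed)", 64),
    (0b11, 1): ("UDOT (indexed)", 64),
}


def decode_dot_indexed(word: int) -> Optional[Tuple[str, Optional[int]]]:
    """Classify a word against the indexed SDOT/UDOT table, or return None."""
    # Fixed bits from the diagram: [31:24]=01000100, [21]=1, [15:11]=00000.
    fixed_mask = 0xFF20F800
    fixed_value = 0x44200000
    if (word & fixed_mask) != fixed_value:
        return None  # the word is outside this encoding class

    size = (word >> 22) & 0b11
    u = (word >> 10) & 1
    # size = 0x has no entry in the table and is reported as UNALLOCATED.
    return DOT_INDEXED.get((size, u), ("UNALLOCATED", None))
```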
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 1 opc 0 0 0 1 1 U Zn Zda
Decode fields
Instruction Details Feature
size U
0x UNALLOCATED -
10 0 USDOT (indexed) FEAT_I8MM
10 1 SUDOT FEAT_I8MM
11 UNALLOCATED -
SVE Misc
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000101 0 10 op0
Decode fields
Instruction details
op0
00xx UNALLOCATED
010x UNALLOCATED
0110 SVE integer matrix multiply accumulate
0111 UNALLOCATED
1xxx UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 uns 0 Zm 1 0 0 1 1 0 Zn Zd
Decode fields
Instruction Details Feature
uns
00 SMMLA FEAT_I8MM
01 UNALLOCATED -
10 USMMLA FEAT_I8MM
11 UMMLA FEAT_I8MM
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 opc 0 0 1 0 opc2 1 0 1 Pg Zn Zd
Decode fields
Instruction Details Feature
opc opc2
0x UNALLOCATED -
10 0x UNALLOCATED -
10 10 BFCVTNT FEAT_BF16
10 11 UNALLOCATED -
11 UNALLOCATED -
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 0 0 0 op Zn Zda
Decode fields
Instruction Details
size op
0x 0 FMLA (indexed) — half-precision
0x 1 FMLS (indexed) — half-precision
10 0 FMLA (indexed) — single-precision
10 1 FMLS (indexed) — single-precision
11 0 FMLA (indexed) — double-precision
11 1 FMLS (indexed) — double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 0 1 rot Zn Zda
Decode fields
Instruction Details
size
0x UNALLOCATED
10 FCMLA (indexed) — half-precision
11 FCMLA (indexed) — single-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 1 0 0 0 Zn Zd
Decode fields
Instruction Details
size
0x FMUL (indexed) — half-precision
10 FMUL (indexed) — single-precision
11 FMUL (indexed) — double-precision
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100100 op0 1 01 op1 0 op2
Decode fields
Instruction details
op0 op1 op2
0 0 00 SVE BFloat16 floating-point dot product (indexed)
0 0 != 00 UNALLOCATED
0 1 UNALLOCATED
1 SVE floating-point multiply-add long (indexed)
These instructions are under SVE Floating Point Widening Multiply-Add - Indexed.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 op 1 i2 Zm 0 1 0 0 0 0 Zn Zda
Decode fields
Instruction Details Feature
op
0 UNALLOCATED -
1 BFDOT (indexed) FEAT_BF16
These instructions are under SVE Floating Point Widening Multiply-Add - Indexed.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 i3h Zm 0 1 op 0 i3l T Zn Zda
Decode fields
Instruction Details Feature
o2 op T
0 UNALLOCATED -
1 0 0 BFMLALB (indexed) FEAT_BF16
1 0 1 BFMLALT (indexed) FEAT_BF16
1 1 UNALLOCATED -
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100100 op0 1 10 op1 00 op2
Decode fields
Instruction details
op0 op1 op2
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 op 1 Zm 1 0 0 0 0 0 Zn Zda
Decode fields
Instruction Details Feature
op
0 UNALLOCATED -
1 BFDOT (vectors) FEAT_BF16
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 Zm 1 0 op 0 0 T Zn Zda
Decode fields
Instruction Details Feature
o2 op T
0 UNALLOCATED -
1 0 0 BFMLALB (vectors) FEAT_BF16
1 0 1 BFMLALT (vectors) FEAT_BF16
1 1 UNALLOCATED -
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 opc 1 Zm 1 1 1 0 0 1 Zn Zda
Decode fields
Instruction Details Feature
opc
00 UNALLOCATED -
01 BFMMLA FEAT_BF16
10 FMMLA — 32-bit element FEAT_F32MM
11 FMMLA — 64-bit element FEAT_F64MM
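The opc table above pairs each allocated encoding with the feature that provides it. As a hedged illustration, the Python sketch below returns both pieces of information so that a caller can compare the feature against whatever the target implements. `MMLA_BY_OPC` and `decode_fp_mmla` are hypothetical names, and how a missing feature should be handled is deliberately left to the caller because the table does not state it.

```python
from typing import Optional, Tuple

# opc -> (instruction, feature), copied from the table above.
MMLA_BY_OPC = {
    0b01: ("BFMMLA", "FEAT_BF16"),
    0b10: ("FMMLA (32-bit element)", "FEAT_F32MM"),
    0b11: ("FMMLA (64-bit element)", "FEAT_F64MM"),
}


def decode_fp_mmla(word: int) -> Optional[Tuple[str, Optional[str]]]:
    """Return (instruction, required feature), or None if outside this class."""
    # Fixed bits from the diagram: [31:24]=01100100, [21]=1, [15:10]=111001.
    in_class = (
        (word >> 24) == 0b01100100
        and ((word >> 21) & 1) == 1
        and ((word >> 10) & 0x3F) == 0b111001
    )
    if not in_class:
        return None

    opc = (word >> 22) & 0b11
    # opc = 00 is the only row without an instruction.
    return MMLA_BY_OPC.get(opc, ("UNALLOCATED", None))
```

For example, a caller holding a set of implemented feature names could test `feature in implemented` on the returned tuple before accepting the instruction.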
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm op 1 o2 Pg Zn o3 Pd
Decode fields
Instruction Details
op o2 o3
0 0 0 FCM<cc> (vectors) — FCMGE
0 0 1 FCM<cc> (vectors) — FCMGT
0 1 0 FCM<cc> (vectors) — FCMEQ
0 1 1 FCM<cc> (vectors) — FCMNE
1 0 0 FCM<cc> (vectors) — FCMUO
1 0 1 FAC<cc> — FACGE
1 1 0 UNALLOCATED
1 1 1 FAC<cc> — FACGT
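The three single-bit fields op, o2 and o3 in the table above can be concatenated into a 3-bit index. The Python sketch below shows that idea. It is an illustration only, with `FP_COMPARE` and `decode_fp_compare` as assumed names. Bit positions come from the diagram above: op is bit 15, o2 is bit 13, and o3 is bit 4.

```python
from typing import Optional

# Indexed by op:o2:o3; None marks the single UNALLOCATED row (1 1 0).
FP_COMPARE = ["FCMGE", "FCMGT", "FCMEQ", "FCMNE", "FCMUO", "FACGE", None, "FACGT"]


def decode_fp_compare(word: int) -> Optional[str]:
    """Return the comparison mnemonic, "UNALLOCATED", or None if outside the class."""
    # Fixed bits from the diagram: [31:24]=01100101, [21]=0, [14]=1.
    fixed_mask = 0xFF204000
    fixed_value = 0x65004000
    if (word & fixed_mask) != fixed_value:
        return None

    op = (word >> 15) & 1
    o2 = (word >> 13) & 1
    o3 = (word >> 4) & 1
    name = FP_COMPARE[(op << 2) | (o2 << 1) | o3]
    return name if name is not None else "UNALLOCATED"
```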
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 opc Zn Zd
Decode fields
Instruction Details
opc
000 FADD (vectors, unpredicated)
001 FSUB (vectors, unpredicated)
010 FMUL (vectors, unpredicated)
011 FTSMUL
10x UNALLOCATED
110 FRECPS
111 FRSQRTS
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100101 0 op0 100 op1 op2
Decode fields
Instruction details
op0  op1     op2
0x                    SVE floating-point arithmetic (predicated)
10   000              FTMAD
10   != 000           UNALLOCATED
11           0000     SVE floating-point arithmetic with immediate (predicated)
11           != 0000  UNALLOCATED
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 opc 1 0 0 Pg Zm Zdn
Decode fields
Instruction Details
opc
0000 FADD (vectors, predicated)
0001 FSUB (vectors, predicated)
0010 FMUL (vectors, predicated)
0011 FSUBR (vectors)
0100 FMAXNM (vectors)
0101 FMINNM (vectors)
0110 FMAX (vectors)
0111 FMIN (vectors)
1000 FABD
1001 FSCALE
1010 FMULX
1011 UNALLOCATED
1100 FDIVR
1101 FDIV
111x UNALLOCATED
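The opc table for predicated floating-point arithmetic can be combined with the diagram above to recover the operand fields as well as the instruction. The Python sketch below is a minimal illustration and not an authoritative decoder. `FP_ARITH_PRED` and `decode_fp_arith_pred` are names introduced here, and the returned dictionary layout is an assumption. Field positions are read from the diagram: size [23:22], opc [19:16], Pg [12:10], Zm [9:5], Zdn [4:0].

```python
from typing import Dict, Optional, Union

# opc -> instruction, copied from the table above (1011 and 111x are UNALLOCATED).
FP_ARITH_PRED = {
    0b0000: "FADD (vectors, predicated)",
    0b0001: "FSUB (vectors, predicated)",
    0b0010: "FMUL (vectors, predicated)",
    0b0011: "FSUBR (vectors)",
    0b0100: "FMAXNM (vectors)",
    0b0101: "FMINNM (vectors)",
    0b0110: "FMAX (vectors)",
    0b0111: "FMIN (vectors)",
    0b1000: "FABD",
    0b1001: "FSCALE",
    0b1010: "FMULX",
    0b1100: "FDIVR",
    0b1101: "FDIV",
}


def decode_fp_arith_pred(word: int) -> Optional[Dict[str, Union[str, int]]]:
    """Decode instruction and operand fields, or return None if outside the class."""
    # Fixed bits from the diagram: [31:24]=01100101, [21:20]=00, [15:13]=100.
    if (word & 0xFF30E000) != 0x65008000:
        return None
    opc = (word >> 16) & 0xF
    return {
        "name": FP_ARITH_PRED.get(opc, "UNALLOCATED"),
        "size": (word >> 22) & 0b11,
        "pg": (word >> 10) & 0b111,
        "zm": (word >> 5) & 0x1F,
        "zdn": word & 0x1F,
    }
```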
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 opc 1 0 0 Pg 0 0 0 0 i1 Zdn
Decode fields
Instruction Details
opc
000 FADD (immediate)
001 FSUB (immediate)
010 FMUL (immediate)
011 FSUBR (immediate)
100 FMAXNM (immediate)
101 FMINNM (immediate)
110 FMAX (immediate)
111 FMIN (immediate)
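The same diagram and table can be used in the opposite direction, to assemble an encoding from its fields. The Python sketch below illustrates this for the immediate forms. It is an assumption-laden example rather than a reference encoder: `FP_IMM_OPC` and `encode_fp_arith_imm` are hypothetical names, and the meaning of the i1 bit for each instruction is not modelled because the table above does not describe it.

```python
# Instruction name -> opc, copied from the table above.
FP_IMM_OPC = {
    "FADD": 0b000, "FSUB": 0b001, "FMUL": 0b010, "FSUBR": 0b011,
    "FMAXNM": 0b100, "FMINNM": 0b101, "FMAX": 0b110, "FMIN": 0b111,
}

# Fixed bits from the diagram: [31:24]=01100101, [21:19]=011, [15:13]=100, [9:6]=0000.
FP_IMM_BASE = 0x65188000


def encode_fp_arith_imm(name: str, size: int, pg: int, i1: int, zdn: int) -> int:
    """Assemble a 32-bit word for one of the immediate forms listed above."""
    if name not in FP_IMM_OPC:
        raise ValueError(f"unknown instruction: {name}")
    # Reject values that do not fit the field widths shown in the diagram.
    for label, value, width in (("size", size, 2), ("Pg", pg, 3), ("i1", i1, 1), ("Zdn", zdn, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{label} does not fit in {width} bit(s)")

    word = FP_IMM_BASE
    word |= size << 22             # size, bits [23:22]
    word |= FP_IMM_OPC[name] << 16  # opc, bits [18:16]
    word |= pg << 10               # Pg, bits [12:10]
    word |= i1 << 5                # i1, bit 5
    word |= zdn                    # Zdn, bits [4:0]
    return word
```

Keeping the fixed bits in a single base constant means each field is only OR-ed into an otherwise zero position, so no masking is needed after the range checks.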
31