- Proposal: SE-0107
- Author: Andrew Trick
- Status: Active Review June 28 ... July 10
- Review manager: Chris Lattner
For quick reference, jump to:
Contents:
- Introduction
- Proposed Solution
- Motivation
- Memory model explanation
- Expected use cases
- Detailed design
- Impact on existing code
- Implementation status
- Future improvements and planned additive API
- Variations under consideration
- Alternatives previously considered
Swift enforces type safe access to memory and follows strict aliasing
rules. However, code that uses unsafe APIs or imported types can
circumvent the language's natural type safety. Consider the following
example of type punning using the UnsafePointer
type:
let ptrT: UnsafeMutablePointer<T> = ...
// Store T at this address.
ptrT[0] = T()
// Load U at this address
let u = UnsafePointer<U>(ptrT)[0]
This code violates assumptions made by the compiler and falls into the category of "undefined behavior". Undefined behavior is a way of saying that we cannot easily specify constraints on the behavior of programs that violate a rule. The program may crash, corrupt memory, or be miscompiled in other ways. Miscompilation may include optimizing away code that was expected to execute or executing code that was not expected to execute.
Swift already protects against undefined behavior as long as the code
does not use "unsafe" constructs. However, UnsafePointer
is an
important API for interoperability and building high performance data
structures. As such, the rules for safe, well-defined usage of the API
should be clear. Currently, it is too easy to use UnsafePointer
improperly. For example, innocuous argument conversion such as this
could lead to undefined behavior:
func takesUIntPtr(_ p: UnsafeMutablePointer<UInt>) -> UInt {
return p[0]
}
func takesIntPtr(q: UnsafeMutablePointer<Int>) -> UInt {
return takesUIntPtr(UnsafeMutablePointer(q))
}
Furthermore, no API currently exists for accessing raw, untyped
memory. UnsafePointer<Pointee>
and UnsafeMutablePointer<Pointee>
refer to a typed region of memory, and the compiler assumes that the
element type (Pointee
) is consistent with other access to the same
memory. For details of the compiler's rules for memory aliasing,
see proposed Type Safe Memory Access documentation. Making
UnsafePointer
safer requires introducing a new pointer type that is
not subject to the same strict aliasing rules.
This proposal aims to achieve several goals in one coherent design:
-
Specify a memory model that encompasses all UnsafePointer access and defines which memory operations are subject to strict aliasing rules.
-
Inhibit
UnsafePointer
conversion that violates strict aliasing, in order to make violations of the model clear and verifiable. -
Provide an untyped pointer type.
-
Provide an API for raw, untyped memory access (
memcpy
semantics). -
Provide an API for manual memory layout (bytewise pointer arithmetic).
Swift-evolution threads:
-
[RFC] UnsafeRawPointer API (Round 3)
Mentions of UnsafePointer
that appear in this document's prose also
apply to UnsafeMutablePointer
.
We first introduce each aspect of the proposed API so that the Motivation section can show examples. The Detailed design section lists the complete API.
New raw pointer types, UnsafeRawPointer
and
UnsafeMutableRawPointer
, will represent a "raw", untyped view of
memory. Typed pointers, UnsafePointer<T>
and UnsafeMutablePointer<T>
,
continue to represent a typed view of memory. Memory access through a
raw pointer accesses raw memory, and memory access through a typed
pointer accesses typed memory. Because a raw pointer may alias with
any type, the semantics of reading and writing through a raw pointer
are similar to C memcpy
. Memory access through a typed pointer
adheres to strict aliasing.
UnsafeMutableRawPointer
will provide allocate
and deallocate
methods:
UnsafeMutableRawPointer {
static func allocate(bytes size: Int, alignedTo: Int = /*maximal*/)
func deallocate(bytes: Int, alignedTo: Int = /*maximal*/)
}
Initializing memory via an UnsafeMutableRawPointer
produces an
UnsafeMutablePointer<Pointee>
, and deinitializing the
UnsafeMutablePointer<Pointee>
returns an UnsafeMutableRawPointer
.
UnsafeMutableRawPointer {
// Returns an UnsafeMutablePointer into the newly initialized memory.
func initializeMemory<T>(as: T.Type, count: Int = 1, to: T)
-> UnsafeMutablePointer<T>
}
UnsafeMutablePointer<Pointee> {
/// Returns a raw pointer to the uninitialized memory.
public func deinitialize(count: Int = 1) -> UnsafeMutableRawPointer
}
The type parameter T
passed to initializeMemory
is an explicit
argument because the user must reason about the type's size and
alignment at the point of initialization. Inferring the type from the
value argument could result in miscompilation if the inferred type
ever deviates from the user's original expectations. The type
parameter also importantly conveys that the raw memory becomes
accessible via a pointer to that type at the point of the call. The
type should be explicitly spelled at this point because accesing the
memory via a typed pointer of an unrelated type could also result in
miscompilation.
With the above API for allocation and initialization, the only way to acquire a typed pointer is by using a raw pointer to initialize memory. Raw pointer initialization implicitly binds the memory to the initialized type. A memory location's bound type is an abstract, dynamic property of the memory used to formalize type safety.
Whenever memory is accessed via a typed pointer, the memory must be bound to a related type. This includes operations on Unsafe[Mutable]Pointer in addition to regular language constructs, which are always strictly typed. It does not include memory accessed via a raw pointer, which is not strictly typed. Violations result in undefined behavior.
The user may defer initialization and explicitly bind memory to a type
using the bindMemory
API:
Unsafe[Mutable]RawPointer {
/// Returns an `Unsafe[Mutable]Pointer<T>` pointing to this memory.
func bindMemory<T>(to: T.Type, capacity: Int) -> Unsafe[Mutable]Pointer<T>
}
Calling bindMemory
on a newly allocated raw pointer produces a typed
pointer to uninitialized memory. The bound memory can then be safely
initialized using a typed pointer:
let ptrToA = rawPtr.bindMemory(to: A.self, capacity: 1)
ptrToA.initialize(to: A())
Note that typed pointer initialization does not bind the type. The memory must already be bound to the correct type as a precondition.
Allocating and binding memory to a type may be performed in one step
by using UnsafeMutablePointer.allocate()
:
UnsafeMutablePointer<Pointee> {
static func allocate(capacity count: Int) -> UnsafeMutablePointer<Pointee>
}
Loading from and storing to memory via an Unsafe[Mutable]RawPointer
is safe independent of the memory's bound type as long as layout
guarantees are met (per the ABI), and care is taken to properly
initialize and deinitialize nontrivial values (see
Trivial types). This allows raw memory to be
reinterpreted without rebinding the memory type. Rebinding memory
invalidates existing typed pointers, but loading from and storing to
raw memory does not.
UnsafeMutableRawPointer {
/// Read raw bytes from memory at `self + offset` and construct a
/// value of type `T`.
///
/// - Precondition: The underlying pointer plus `offset` is properly
/// aligned for accessing `T`.
///
/// - Precondition: The memory is initialized to a value of some type, `U`,
/// such that `T` is layout compatible with `U`.
func load<T>(fromByteOffset: Int = 0, as: T.Type) -> T
/// Store a value's bytes into raw memory at `self + offset`.
///
/// - Precondition: The underlying pointer plus `offset` is properly
/// aligned for storing type `T`.
///
/// - Precondition: `T` is a trivial type.
///
/// - Precondition: The memory is uninitialized, or initialized to some
/// trivial type `U` such that `T` and `U` are mutually layout
/// compatible.
///
/// - Postcondition: The memory is initialized to raw bytes. If the
/// memory is bound to type `U`, then it now contains a value of
/// type `U`.
func storeBytes<T>(of: T, toByteOffset: Int = 0, as: T.Type)
}
The load
and storeBytes
operations are assymetric. load
reads raw
bytes but properly constructs a new value of type T
with its own
lifetime. Any copied references will be retained. In contrast,
storeBytes
only operates on a value's raw bytes, writing them into
untyped memory. The in-memory copy will not be constructed and any
previously initialized value in this memory will not be deinitialized
(it cannot be because its type is unknown). Consequently, storeBytes
should only be performed on trivial types.
Assigning memory to a nontrivial type via a raw pointer is done by binding the type:
rawPtr.bindMemory(to: PreviousType.self, capacity: 1).deinitialize(count: 1)
rawPtr.initializeMemory(as: NewType.self, to: NewType())
Providing an API for accessing raw memory would not serve much purpose
without the ability to compute byte offsets. Naturally,
UnsafeRaw[Mutable]Pointer
is Strideable as a sequence of bytes.
UnsafeRawPointer : Strideable {
public func distance(to : UnsafeRawPointer) -> Int
public func advanced(by : Int) -> UnsafeRawPointer
}
public func == (lhs: UnsafeRawPointer, rhs: UnsafeRawPointer) -> Bool
public func < (lhs: UnsafeRawPointer, rhs: UnsafeRawPointer) -> Bool
public func + (lhs: UnsafeRawPointer, rhs: Int) -> UnsafeRawPointer
public func - (lhs: UnsafeRawPointer, rhs: UnsafeRawPointer) -> Int
Currently, an UnsafePointer
initializer supports conversion between
potentially incompatible pointer types:
struct Unsafe[Mutable]Pointer<Pointee> {
public init<U>(_ from : Unsafe[Mutable]Pointer<U>)
}
This initializer will be removed. UnsafePointer
conversion is still
possible, but is now explicit and provably correct based on the
conversion's preconditions and postconditions.
Recall that bindMemory(to:capacity:)
produces a typed pointer from a
raw pointer. As explained above, it can be used to bind uninitialized
memory for deferred initialization. When invoked on memory that is
already bound, and potentially already initialized, it effectively
rebinds the memory. Because memory can only be bound to one type at a
time, all strictly typed memory operations that subsequently access
this memory must be consistent with the newly bound type.
A convenience API makes it easy to handle type mismatches that arise
from interoperability without compromising on safety. In this case,
the user already has a typed pointer but needs to temporarily rebind
the memory for the purpose of invoking code that expects a different
type. withMemoryRebound<T>(to:capacity:)
rebinds memory to the
specified type, executes a closure with a pointer to the rebound
memory, then rebinds memory to the original type before returning:
Unsafe[Mutable]Pointer<Pointee> {
func withMemoryRebound<T>(to: T.Type, capacity count: Int,
_ body: @noescape (Unsafe[Mutable]Pointer<T>) throws -> ()) rethrows
}
This is safe provided that the body
closure does not capture self
.
It is possible to directly acquire a typed pointer from a raw pointer without rebinding the type, bypassing static safety. This does not weaken the rules for typed memory access because it relies on the precondition is that memory is already bound to the returned pointer's type. This is useful when the memory's bound type is known but the pointer's type has been erased:
Unsafe[Mutable]RawPointer {
func assumingMemoryBound<T>(to: T.Type) -> Unsafe[Mutable]Pointer<T>
}
For a more detailed disucssion, see the Memory model explanation.
The following examples show the differences between memory access as
it currently would be done using UnsafeMutablePointer
vs. the
proposed UnsafeMutableRawPointer
.
Consider two layout compatible, but unrelated structs, A
and B
,
and helpers that read from these structs via unsafe pointers:
// --- common definitions used by old and new code ---
struct A {
var value: Int
}
struct B {
var value: Int
}
func printA(_ pA: UnsafePointer<A>) {
print(pA[0])
}
func printB(_ pB: UnsafePointer<B>) {
print(pB[0])
}
Normal allocation, initialization, access, and deinitialization of a
struct looks like this with UnsafePointer
:
// --- old version ---
func initA(pA: UnsafeMutablePointer<A>) {
pA.initialize(to: A(value:42))
}
func initB(pB: UnsafeMutablePointer<B>) {
pB.initialize(to: B(value:13))
}
func normalLifetime() {
let pA = UnsafeMutablePointer<A>.allocate(capacity: 1)
initA(pA)
printA(pA)
pA.deinitialize(count: 1)
pA.deallocateCapacity(1)
}
This code continues to work. However, with UnsafeMutableRawPointer
,
it is possible to distinguish between raw allocated memory, and memory
that has been initialized as some type. First, we define new
initialization helpers that take raw pointers and return typed
pointers:
// --- new version ---
func initRawA(p: UnsafeMutableRawPointer) -> UnsafeMutablePointer<A> {
return p.initializeMemory(as: A.self, to: A(value:42))
}
func initRawB(p: UnsafeMutableRawPointer) -> UnsafeMutablePointer<B> {
return p.initializeMemory(as: B.self, to: B(value:13))
}
Now we can allocate raw memory and obtain a typed pointer through initialization:
// --- new version ---
func normalLifetime() {
let rawPtr = UnsafeMutableRawPointer.allocate(bytes: sizeof(A.self))
// rawPtr cannot be assigned to a value of `A`, forcing initialization before
// typed access.
let pA = initRawA(rawPtr)
printA(pA)
let uninitPtr = pA.deinitialize(count: 1)
uninitPtr.deallocate(bytes: sizeof(A.self))
}
Consider another example, now ignoring memory deallocation for
brevity. Technically, it is correct to initialize values of type A
and B
in different memory locations, but confusing and dangerous
with the current UnsafeMutablePointer
API:
// --- old version ---
// Return a pointer to (A, B).
func initAB() -> UnsafeMutablePointer<A> {
// Memory is uninitialized, but pA is already typed.
let pA = UnsafeMutablePointer<A>.allocate(capacity: 2)
// Part of the memory is initialized to `A`. Redundant conversion of
// UnsafeMutablePointer values like this is not uncommon, and it is
// nonobvious at the point of conversion that it does not actually
// change the pointer type.
initA(UnsafeMutablePointer(pA))
// pA is recast as pB with no indication that the pointee type has changed.
initB(UnsafeMutablePointer(pA + 1))
return pA
}
Code in the caller is confusing:
// --- old version ---
func testInitAB() {
let pA = initAB()
printA(pA)
// pA is again recast as pB with no indication that the pointee type changes.
printB(UnsafeMutablePointer(pA + 1))
// Or recast to pB first, which is also misleading because the pointer still
// points to an initialized `A` value at the point of the cast.
printB(UnsafeMutablePointer<B>(pA) + 1)
}
With UnsafeMutableRawPointer
there is no need to cast to an invalid
pointer type in order to access manually allocated memory:
// --- new version ---
// Return a pointer to an untyped memory region initialized with (A, B).
func initRawAB() -> UnsafeMutableRawPointer {
let rawPtr = UnsafeMutableRawPointer.allocate(bytes: 2 * strideof(Int.self))
// Initialize the first Int with `A`, producing UnsafeMutablePointer<A>.
let pA = initA(rawPtr)
// Initialize the second Int with `B`.
// This implicitly casts UnsafeMutablePointer<A> to UnsafeMutableRawPointer,
// which is equivalent to initB(rawPtr + strideof(Int)).
// Unlike the old API, no unsafe pointer conversion is needed.
initB(pA + 1)
return rawPtr
}
Now, in the caller, unsafe conversion from raw memory to typed memory is always explicit:
// --- new version ---
// Code in the caller is explicit:
func testInitAB() {
// Get a raw pointer to (A, B).
let p = initRawAB()
// The untyped memory is explicitly converted to a pointer-to-A.
// Safe because we know the underlying memory is bound to `A` via
// raw pointer initialization.
let pA = p.assumingMemoryBound(to: A.self)
printA(pA)
// Converting from a pointer-to-A into a pointer-to-B without
// rebinding the type requires casting to an UnsafeRawPointer.
printB(UnsafeRawPointer(pA + 1).assumingMemoryBound(to: B.self))
// Or directly convert the original UnsafeRawPointer into pointer-to-B.
printB((p + strideof(Int.self)).assumingMemoryBound(to: B.self))
}
This is much more explicit and verifiable, but still not statically
type safe. However, now that it is possible to bind memory to a type,
this example may be rewritten so that strict aliasing rules are
statically enforced. Now the caller no longer needs to assume the
pointer type that was used to initialize memory in initAB
. They only
need to agree on the layout of the memory:
// --- new and improved version ---
// Return a pointer to an untyped memory region initialized with (A, B).
func initRawAB() -> UnsafeMutableRawPointer {
let intPtr = UnsafeMutablePointer<Int>.allocate(capacity: 2)
intptr[0] = 42 // knowing A is layout compatible with Int
intptr[1] = 13 // knowing B is layout compatible with Int
return intptr
}
func testInitAB() {
// Get a raw pointer to (A, B).
let p = initRawAB()
let pA = p.bindMemory(to: A.self, capacity: 1)
printA(pA)
// Knowing the `B` has the same alignment as `A`.
let pB = UnsafeRawPointer(pA + 1).bindMemory(to: B.self, capacity: 1)
printB(pB)
}
Initializing or assigning values of different types to the same
location using a typed pointer is undefined. Here, the compiler can
choose to ignore the order of assignment, and initAthenB
may print
13 twice or 42 twice.
// --- old version ---
func initAthenB(_ p: UnsafeMutablePointer<Void>) {
let p = UnsafeMutablePointer<Int>.allocate(capacity: 1)
initA(UnsafeMutablePointer(p)) // *p = 42
printA(UnsafeMutablePointer(p))
initB(UnsafeMutablePointer(p)) // *p = 13
printB(UnsafeMutablePointer(p))
}
With the proposed API, assigning values of different types to the same
location can now be safely done by properly initializing and
deinitializing the memory through UnsafeMutableRawPointer
. Ultimately, the
values may still be accessed via the same convenient
UnsafeMutablePointer
type. Type punning has not happened, because the
UnsafeMutablePointer
has the same type as the memory's bound
type whenever it is dereferenced.
// --- new version ---
func initAthenB {
let rawPtr = UnsafeMutableRawPointer.allocate(bytes: strideof(Int.self))
let pA = initRawA(rawPtr) // raw pointer initialization binds memory to `A`
printA(pA)
// After deinitializing pA, uninitPtr receives a pointer to
// untyped raw memory, which may be reused for `B`.
let uninitPtr = pA.deinitialize(count: 1)
// rawPtr and uninitPtr have the same value, thus are substitutable.
assert(rawPtr == uninitPtr)
// initB rebinds the memory to `B` before writing to memory, so
// reinitialization cannot be reordered with previous accesses to pA.
initRawB(uninitPtr)
printB(pB)
}
No API currently exists that allows initialized memory to hold either `A` or `B`.
// --- old version ---
// This conditional initialization looks valid, but is dangerous.
func initAorB(_ p: UnsafeMutablePointer<Void>, isA: Bool) {
if isA {
initA(UnsafeMutablePointer(p))
}
else {
initB(UnsafeMutablePointer(p))
}
}
Code in the caller could produce undefined behavior:
// --- old version ---
func testInitAorB() {
let p = UnsafeMutablePointer<Int>.allocate(capacity: 1)
// If the compiler inlines, then the initialization and use of the
// values of type `A` and `B`, which share memory, could be incorrectly
// interleaved.
initAorB(p, isA: true)
printA(UnsafeMutablePointer(p))
initAorB(p, isA: false)
printB(UnsafeMutablePointer(p))
}
UnsafeMutableRawPointer
allows initialized memory to hold either A
or B
. The same UnsafeMutableRawPointer
value can be reused across
multiple initializations and deinitializations. Unlike the old API,
this is safe because the memory initialization on a raw pointer writes
to untyped memory and binds the memory type. Binding memory to a type
separates access to the distinct types from the compiler's viewpoint.
// --- new version ---
func initRawAorB(_ p: UnsafeMutableRawPointer, isA: Bool) {
// Unsafe pointer conversion is no longer required to initialize memory.
if isA {
initRawA(p)
}
else {
initRawB(p)
}
}
Code in the caller is well defined because initAorB
is now a
compiler barrier for unsafe pointer access. Furthermore, each unsafe
pointer cast is explicit:
// --- new version ---
func testInitAorB() {
let p = UnsafeMutableRawPointer.allocate(bytes: sizeof(Int.self))
initRawAorB(p, isA: true)
printA(p.assumingMemoryBound(to: A.self))
initRawAorB(p, isA: false)
printB(p.assumingMemoryBound(to: B.self))
}
UnsafeMutableRawPointer
provides a legal way to reinterpret memory
in-place, which was previously unsupported. The following example is
safe because the load of B
reads from untyped memory via a raw
pointer.
// --- new version ---
func testReinterpret() {
let p = UnsafeMutableRawPointer.allocate(bytes: sizeof(Int.self))
// Initialize raw memory to `A`.
initRawAorB(p, isA: true)
// Load from raw memory as `B` (reinterpreting the value in memory).
print(p.load(as: B.self))
}
This is not "type-punning" because a typed pointer is never
accessed. Note that printB(p.assumingMemoryBound(to: B.self))
would
be illegal, because the a typed pointer to B
cannot be used to
access an unrelated type A
.
Developers may be forced to work with "loosely typed" APIs, particularly for interoperability:
func readBytes(_ bytes: UnsafePointer<UInt8>) {
// 3rd party implementation...
}
func readCStr(_ string: UnsafePointer<CChar>) {
// 3rd party implementation...
}
Working with these third party API's exclusively using
UnsafeMutablePointer
would lead to undefined behavior, as shown here
using the current API:
// --- old version ---
func stringFromBytes(size: Int, value: UInt8) {
let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: size + 1)
bytes.initialize(to: value, count: size)
bytes[size] = 0
// Unsafe pointer conversion is requred to invoke readCString.
// If readCString is inlineable and compiled with strict aliasing,
// then it could read uninitialized memory.
readCStr(UnsafePointer(bytes))
// The signature of readBytes is consistent with the `bytes` argument type.
readBytes(bytes)
}
Reading from uninitialized memory is now prevented by explictly rebinding the type.
// --- new version ---
func stringFromBytes(size: Int, value: UInt8) {
let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: size + 1)
buffer.initialize(to: value, count: size)
bytes[size] = 0
buffer.withMemoryRebound(to: CChar.self, capacity: size + 1) {
readCStr($0)
}
readBytes(buffer)
}
Rather than temporarily rebinding memory, the user may want to rebind
memory to CChar
once and keep the same typed pointer around for
future use without keeping track of the memory capacity. In that case,
the program could continue to write UInt8 values to memory without
casting to CChar and without rebinding memory as long as those writes
use the UnsafeMutableRawPointer.storeBytes
API for raw memory access:
// --- new version ---
func mutateBuffer(size: Int, value: UInt8) {
let rawBuffer = UnsafeMutableRawPointer.allocate(bytes: size + 1)
rawBuffer.initializeMemory(as: UInt8.self, count: size, to: value)
rawBuffer.initializeMemory(as: UInt8.self, atIndex: size, to: 0)
let cstr = rawBuffer.bindMemory(to: CChar.self, capacity: size + 1)
// Access memory as CChar.
readCStr(cstr)
// Write UInt8 values to memory without needing explicitly cast each
// value to CChar first.
for i in 0..<size {
rawBuffer.storeBytes(of: getByte(), toByteOffset: i, as: UInt8.self)
}
// Access memory again as CChar.
readCStr(cstr)
}
func getByte() -> UInt8 {
// 3rd party implementation...
}
The side effects of illegal type punning may result in storing values in the wrong sequence, reading uninitialized memory, or memory corruption. It could even result in execution following code paths that aren't expected as shown here:
// --- old version ---
func testUndefinedExecution() {
let pA = UnsafeMutablePointer<A>.allocate(capacity: 1)
pA[0] = A(value:42)
if pA[0].value != 42 {
// Code path should never execute...
releaseDemons()
}
// This compiler may inline this, and hoist the store above the
// previous check.
unforeseenCode(pA)
}
func releaseDemons() {
// Something that should never be executed...
}
func assignB(_ pB: UnsafeMutablePointer<B>) {
pB[0] = B(value:13)
}
func unforeseenCode(_ pA: UnsafeMutablePointer<A>) {
// At some arbitrary point in the future, the same memory is
// innocuously assigned to B.
assignB(UnsafeMutablePointer(pA))
}
Prohibiting conversion between incompatible UnsafePointer
types,
providing an API for binding memory to a type, and supporting raw
memory access are necessary to avoid the dangers of type punning and
encourage safe idioms for working with pointers.
The fundamental difference between Unsafe[Mutable]RawPointer
and
Unsafe[Mutable]Pointer<Pointee>
is simply that the former is used
for "untyped" memory access, and the later is used for "typed" memory
access. Let's refer to these as "raw pointers" and "typed
pointers". Because operations on raw pointers access untyped memory, the
compiler cannot make assumptions about the underlying type of memory
and must be conservative. With operations on typed pointers, the
compiler may make strict assumptions about the type of the underlying
memory, which allows more aggressive optimization.
All allocated memory is either "uninitialized" or "initialized". Upon initialization, memory contains a typed value. Initialized memory may be assigned to a new value of the same type. Upon deinitialization, the memory no longer holds a value.
Consider the sequence of abstract memory operations:
Abstract Operation | Memory State |
---|---|
rawptr = allocate() |
uninitialized |
tptr = rawptr.initializeMemory(T) |
initialized |
tptr.pointee = T |
initialized |
tptr.deinitialize() |
uninitialized |
Initializing memory via a raw pointer binds the memory type. Initialized memory must always be bound to a type. Deinitialization does not unbind the type. Memory remains bound to a type until it is rebound to a different type.
Abstract Operation | Memory State | Type |
---|---|---|
rawptr = allocate() |
uninitialized | None |
tptr = rawptr.initializeMemory(T) |
initialized | bound to T |
tptr.deinitialize() |
uninitialized | bound to T |
uptr = rawptr.initializeMemory(U) |
initialized | bound to U |
uptr.deinitialize() |
uninitialized | bound to U |
rawptr.deallocate() |
invalid | None |
Rebinding memory effectively changes the type of any initialized values within the rebound memory region. Accessing the memory via a typed pointer of unrelated type is undefined:
Abstract Operation | Memory State | Type |
---|---|---|
tptr = rawptr.initializeMemory(T) |
initialized | bound to T |
tptr.deinitialize() |
uninitialized | bound to T |
uptr = rawptr.initializeMemory(U) |
initialized | bound to U |
uptr.deinitialize() |
uninitialized | bound to U |
tptr.initialize() |
undefined | undefined |
By this convention, raw pointers primarily refer to uninitialized memory and typed pointers primarily refer to initialized memory. This is not a requirement, and important use cases follow different conventions. After a raw pointer is intialized, the raw pointer value remains valid and can continue to be used to access the underlying memory in an untyped way. Conversely, a raw pointer can bound to a typed pointer without initializing the underlying memory.
A raw pointer's memory may be explicitly bound to a type, bypassing raw initialization:
let ptrA = rawPtr.bindMemory(to: A.self, capacity: 1)
The resulting typed pointer may then be used to initialize memory:
ptrA.initialize(to: A())
Abstract Operation | Memory State | Type |
---|---|---|
rawptr = allocate() |
uninitialized | None |
tptr = rawptr.bindMemory(T) |
uninitialized | bound to T |
tptr.initialize() |
initialized | bound to T |
The memory remains bound to this type until it is rebound through raw
pointer initialization or another call to bindMemory(to:)
.
Abstract Operation | Memory State | Type |
---|---|---|
rawptr = allocate() |
uninitialized | None |
tptr = rawptr.bindMemory(T) |
uninitialized | bound to T |
tptr.initialize() |
initialized | bound to T |
tptr.deinitialize() |
uninitialized | bound to T |
uptr = rawptr.bindMemory(U) |
uninitialized | bound to U |
uptr.initialize() |
initialized | bound to U |
Allocation and binding can be combined as typed allocation:
Abstract Operation | Memory State | Type |
---|---|---|
tptr = allocate(T) |
uninitialized | bound to T |
tptr.initialize() |
initialized | bound to T |
Initializing memory via a typed pointer requires the memory to be already be bound to that type. This is often more convenient than working with raw pointers, and can improve performance in some cases. In particular, it is an effective technique for implementing data structures that manage storage for contiguous elements. The data structure may allocate a buffer with extra capacity and track the initialized state of each element position as such:
func getAt(index: Int) -> A {
if !isInitializedAt(index) {
(ptrA + index).initialize(to: Type())
}
return ptrA[index]
}
For example, see the C buffer use case below.
When using a typed pointer to initialize memory, the programmer must ensure that memory has been bound to that type and takes responsiblity for tracking the initialized state of memory.
Accessing memory via a pointer type that is unrelated to the memory's bound type violates strict aliasing, and is thus undefined. For the purpose of this proposal, we simply specify when strict aliasing applies and that aliasing types must be related. For an explanation of related types and layout compatibility, see proposed Type Safe Memory Access documentation.
Regardless of whether strict aliasing applies, accessing initialized in-memory values always requires the access type to be layout compatible with the value's type. This applies to access via the raw pointer API in addition to typed pointer access. Similarly, rebinding initialized in-memory values to another type requires both the previous and new type to be mutually layout compatible.
Accessing memory via a typed pointer (or normal, safe language construct) has an additional requirement that the pointer type must be related to the memory's bound type. For this reason, typed pointers are only obtained by intializing raw memory or explicitly binding the memory type. In practice, with the proposed API, the only way to violate strict aliasing is to reuse a typed pointer value after the underlying memory has been rebound to an unrelated type:
Abstract Operation | Memory State | Type |
---|---|---|
tptr = rawptr.bindMemory(T) |
uninitialized | bound to T |
uptr = rawptr.bindMemory(U) |
uninitialized | bound to U |
tptr.initialize() |
undefined | T is unrelated to U |
A program may read from and write to memory via a raw pointer even after the memory has been initialized:
let rawPtr = UnsafeMutableRawPointer.allocate(bytes: sizeof(SomeType.self))
let ptrToSomeType = rawPtr.initializeMemory(as: SomeType.self, to: SomeType())
// read raw initialized memory
let reinterpretedValue = rawPtr.load(as: AnotherType.self)
// overwrite raw initialized memory
rawPtr.storeBytes(of: AnotherType(), as: AnotherType.self)
SomeType
and AnotherType
need not to be related types. They must
only be layout compatible. In other words, the programmer must ensure
compatibilty of the size, alignment, and position of references. This
requires some knowledge of the ABI.
Loading from raw memory reinterprets the in-memory bytes, and constructs a new local value. If that value contains class references, the class type of those reference must be related to the instance's dynamic type. This is a incontravertible property of all reference values in the system.
Storing a value into raw memory does not support reference types. Additionally, it requires consideration of the type of value being overwritten because a raw store overwrites memory contents without destroying the previous value. Storing to raw memory is safe if either the memory is uninitialized or initialized to a trivial type. The value being stored must also be trivial so that it can be assigned via a bit-for-bit copy.
A "trivial type" promises that assignment just requires a fixed-size bit-for-bit copy without any indirection or reference-counting operations. Generally, native Swift types that do not contain strong or weak references or other forms of indirection are trivial, as are imported C structs and enums.
Examples of trivial types:
- Integer and floating-point types
Bool
Optional<T>
whereT
is trivialUnmanaged<T: AnyObject>
- struct types where all members are trivial
- enum types where all payloads are trivial
This section lists several typical use cases involving
UnsafeRawPointer
and UnsafePointer
.
For explanatory purposes consider the following global definitions:
struct A {
var value: Int32
}
struct B {
var value: Int32
}
var ptrToA: UnsafeMutablePointer<A>
var eltCount: Int = 0
Using a pointer to a single value:
func createValue() {
ptrToA = UnsafeMutablePointer<A>.allocate(capacity: 1)
ptrToA.initialize(to: A(value: 42))
}
func deleteValue() {
ptrToA.deinitialize(count: 1)
ptrToA.deallocate(capacity: 1)
}
Using a fully initialized set of contiguous homogeneous values:
func createCArray(from source: UnsafePointer<A>, count: Int) {
ptrToA = UnsafeMutablePointer<A>.allocate(capacity: count)
ptrToA.initialize(from: source, count: count)
eltCount = count
}
func deleteCArray() {
ptrToA.deinitialize(count: eltCount)
ptrToA.deallocate(capacity: eltCount)
}
Managing a buffer with a mix of initialized and uninitialized contiguous elements. Typically, information about which elements are initialized will be separately maintained to ensure that each method's preconditions are met:
func createCBuffer(size: Int) {
ptrToA = UnsafeMutablePointer<A>.allocate(capacity: size)
eltCount = size
}
// - precondition: memory at `index` is uninitialized.
func initElement(index: Int, with value: A) {
(ptrToA + index).initialize(to: value)
}
// - precondition: memory at `index` is initialized.
func getElement(index: Int) -> A {
return ptrToA[index]
}
// - precondition: memory at `index` is initialized.
func assignElement(index: Int, with value: A) {
ptrToA[index] = value
}
// - precondition: memory at `index` is initialized.
func deinitElement(index: Int) {
(ptrToA + index).deinitialize()
}
// - precondition: memory for all elements is uninitialized.
func freeCBuffer() {
ptrToA.deallocate(capacity: eltCount)
}
// Layout an object with header type `A` followed by `n` elements of type `B`.
func createValueWithTail(count: Int) {
// Assuming the alignment of `A` satisfies the alignment of `B`.
let numBytes = strideof(A) + (count * strideof(B))
let rawPtr = UnsafeMutableRawPointer.allocate(
bytes: numBytes, alignedTo: alignof(A))
// Initialize the object header.
ptrToA = rawPtr.initializeMemory(as: A.self, to: A(value: 42))
// Append `count` elements of type `B` to the object tail.
eltCount = count
UnsafeMutableRawPointer(ptrToA + 1).initializeMemory(
as: B.self, count: count, with: B(value: 13))
}
func getTailElement(index: Int) -> B {
return UnsafeRawPointer(ptrToA + 1)
.assumingMemoryBound(to: B.self)[index]
}
func deleteValueWithTail() {
UnsafeMutableRawPointer(ptrToA + 1)
.assumingMemoryBound(to: B.self).deinitialize(count: eltCount)
let numBytes = strideof(A) + (eltCount * strideof(B))
ptrToA.deinitialize(count: 1).deallocate(
bytes: numBytes, alignedTo: alignof(A))
}
Direct bytewise memory access to a buffer of unknown type:
// Format1:
// flags: UInt16
// state: UInt16
// value: Int32
// Format2:
// value: Int32
func receiveMsg(flags: UInt16, state: UInt16, value: Int32) {}
func readMsg(msgBuf: UnsafeRawPointer, isFormat1: Bool) {
if isFormat1 {
receiveMsg(flags: msgBuf.load(as: UInt16.self),
state: msgBuf.load(fromByteOffset: 2, as: UInt16.self),
value: msgBuf.load(fromByteOffset: 4, as: Int32.self))
}
else {
receiveMsg(flags: 0, state: 0, value: msgBuf.load(as: Int32.self))
}
}
Accessing raw underlying memory bytes, independent of the memory's bound type:
// Direct bytewise element copy.
func copyArrayElement(fromIndex: Int, toIndex: Int) {
let srcPtr = UnsafeRawPointer(ptrToA + fromIndex)
let destPtr = UnsafeMutableRawPointer(ptrToA + toIndex)
destPtr.copyBytes(from: srcPtr, count: sizeof(A.self))
}
// Bytewise element swap.
// Initializes and deinitializes temporaries of type Int.
// Int is layout compatible with `A`.
func swapArrayElements(index i: Int, index j: Int) {
let rawPtrI = UnsafeMutableRawPointer(ptrToA + i)
let rawPtrJ = UnsafeMutableRawPointer(ptrToA + j)
let tmpi = rawPtrI.load(as: Int.self)
let tmpj = rawPtrJ.load(as: Int.self)
rawPtrI.storeBytes(of: tmpj, as: Int.self)
rawPtrJ.storeBytes(of: tmpi, as: Int.self)
}
var freePtr: UnsafeMutableRawPointer? = nil
func allocate32() -> UnsafeMutableRawPointer {
if let newPtr = freePtr {
freePtr = nil
return newPtr
}
return UnsafeMutableRawPointer.allocate(bytes: 4, alignedTo: 4)
}
func deallocate32(_ rawPtr: UnsafeMutableRawPointer) {
if freePtr != nil {
rawPtr.deallocate(bytes: 4, alignedTo: 4)
}
else {
freePtr = rawPtr
}
}
func createA(value: Int32) -> UnsafeMutablePointer<A> {
return allocate32().initializeMemory(as: A.self, to: A(value: value))
}
func createB(value: Int32) -> UnsafeMutablePointer<B> {
return allocate32().initialize(as: B.self, to: B(value: value))
}
func deleteA(ptrToA: UnsafeMutablePointer<A>) {
return deallocate32(ptrToA.deinitialize(count: 1))
}
func deleteB(ptrToB: UnsafeMutablePointer<B>) {
return deallocate32(ptrToB.deinitialize(count: 1))
}
UnsafePointer<T>
to UnsafeRawPointer
conversion will be provided
via an unlabeled initializer.
extension UnsafeRawPointer: _Pointer {
init<T>(_: UnsafePointer<T>)
init<T>(_: UnsafeMutablePointer<T>)
}
extension UnsafeMutableRawPointer: _Pointer {
init<T>(_: UnsafeMutablePointer<T>)
}
Conversion from UnsafeRawPointer
to a typed UnsafePointer<T>
requires invoking UnsafeRawPointer.bindMemory(to:capacity:)
or
UnsafeRawPointer.assumingMemoryBound(to:)
, explicitly spelling the
destination type:
let p = UnsafeRawPointer(...)
let pT = p.bindMemory(to: T.self, capacity: n)
...
let pT2 = p.assumingMemoryBound(to: T.self)
Just as with unsafeBitCast
, although the destination of the cast can
usually be inferred, we want the developer to explicitly state the
intended destination type, both because type inferrence can be
surprising, and because it's important for code comprehension.
Some existing conversions between UnsafePointer
types do not
convert Pointee
types but instead coerce an
UnsafePointer
to an UnsafeMutablePointer
. This is no longer an
inferred conversion, but must be explicitly requested:
extension UnsafeMutablePointer {
init(mutating from: UnsafePointer<Pointee>)
}
Consider two C functions that take const
pointers:
void takesConstTPtr(const T*);
void takesConstVoidPtr(const void*);
Which will be imported with immutable pointer argument types:
func takesConstTPtr(_: UnsafePointer<T>)
func takesConstVoidPtr(_: UnsafeRawPointer)
Mutable pointers can be passed implicitly as immutable pointers.
let unsafeMutablePtr: UnsafeMutablePointer<T>
let mutableRawPtr: UnsafeMutableRawPointer
takesConstTPtr(unsafeMutablePtr)
takesConstVoidPtr(mutableRawPtr)
Any mutable or immutable pointer type can be passed implicitly as an immutable void pointer:
let unsafePtr: UnsafePointer<T>
let unsafeMutablePtr: UnsafeMutablePointer<T>
takesConstVoidPtr(unsafePtr)
takesConstVoidPtr(unsafeMutablePtr)
Implicit inout conversion will continue to work:
var anyT: T
takesConstTPtr(&anyT)
takesConstVoidPtr(&anyT)
Array
/String
conversion will continue to work:
let a = [T()]
takesConstTPtr(a)
takesConstVoidPtr(a)
let s = "string"
takesConstVoidPtr(s)
Consider two C functions that take non-const
pointers:
void takesTPtr(T*);
void takesVoidPtr(void*);
Which will be imported with mutable pointer argument types:
func takesTPtr(_: UnsafeMutablePointer<T>)
func takesVoidPtr(_: UnsafeMutableRawPointer)
Any mutable pointer type can be passed implicitly as a mutable void pointer:
let unsafeMutablePtr: UnsafeMutablePointer<T>
takesVoidPtr(unsafeMutablePtr)
Implicit inout conversion will continue to work:
var anyT = T(...)
takesTPtr(&anyT)
takesVoidPtr(&anyT)
Array
/String
conversion to mutable pointer is still not allowed.
The following API entry points support copying or moving values between unsafe pointers.
Given values of these types:
let unsafePtr: UnsafePointer<T>
let unsafeMutablePtr: UnsafeMutablePointer<T>
let rawPtr: UnsafeRawPointer
let mutableRawPtr: UnsafeMutableRawPointer
let c: Int
mutableRawPtr.copyBytes(from: rawPtr, count: c)
A raw copy from typed to raw memory can also be done by calling
copyBytes
, exactly as shown above. Implicit argument conversion from
UnsafePointer<T>
to UnsafeRawPointer
makes this seamless.
Additionally, raw memory can be bulk initialized from typed memory:
mraw.initializeMemory(as: T.self, from: unsafePtr, count: c)
Because UnsafeMutablePointer<T>
arguments are implicitly converted to UnsafePointer<T>
, the initializeMemory
call above works seamlessly.
Additionally, a mutable typed pointer can be moved-from:
mraw.moveInitializeMemory(as: T.self, from: unsafeMutablePtr, count: c)
No bulk conversion is currently supported from raw to typed memory.
Copying between typed memory is still supported via bulk assignment (the naming style is updated for consistency):
ump.assign(from: up, count: c)
ump.moveAssign(from: up, count: c)
One of the more common unsafe pointer conversions arises from viewing a C
string as either an array of bytes (UInt8
) or C characters
(CChar
). In Swift, this manifests as arguments of type
UnsafePointer<UInt8>
and UnsafePointer<CChar>
. The String API
even encourages interoperability between C APIs and a String's UTF8
encoding. For example:
var utf8 = template.nulTerminatedUTF8
let (fd, fileName) = utf8.withUnsafeMutableBufferPointer {
(utf8) -> (CInt, String) in
let cStrBuf = UnsafeRawPointer(utf8.baseAddress!)
.cast(to: UnsafePointer<CChar>)
let fd = mkstemps(cStrBuf, suffixlen)
let fileName = String(cString: cStrBuf)
...
}
This particular case is theoretically invalid because
nulTerminatedUTF8
writes a buffer of UInt8
and mkstemps
overwrites the same memory as a buffer of CChar
. More commonly, the
pointer conversion is valid because the buffer is only initialized
once. Nonetheless, the explicit casting is extremely awkward for
such a common use case. To avoid excessive UnsafePointer
conversion
and ease migration to the UnsafeRawPointer
model, helpers will be
added to the String
API.
In CString.swift
:
extension String {
init(cString: UnsafePointer<UInt8>)
}
And in StringUTF8.swift
:
extension String {
var nulTerminatedUTF8CString: ContiguousArray<CChar>
}
With these two helpers, conversion between UnsafePointer<CChar>
and
UnsafePointer<UInt8>
is safe without sacrificing efficiency. The
String
initializer already copies the byte array into the String's
internal representation, so can trivially convert the element
type. The nulTerminatedUTF8CString
function also copies the
string's internal representation into an array of UInt8
. With this
helper, no unsafe casting is necessary in the previous example:
var utf8Cstr = template.nulTerminatedUTF8CString
let (fd, fileName) = utf8.withUnsafeMutableBufferPointer {
(utf8CStrBuf) -> (CInt, String) in
let fd = mkstemps(utf8CStrBuf, suffixlen)
let fileName = String(cString: utf8CStrBuf)
...
}
Most of the API was already presented above. For the sake of having it
in one place, a list of the expected UnsafeMutableRawPointer
members
is shown below.
For full doc comments, see the github revision.
struct UnsafeMutableRawPointer : Strideable, Hashable, _Pointer {
var _rawValue: Builtin.RawPointer
var hashValue: Int
init(_ _rawValue : Builtin.RawPointer)
init(_ other : OpaquePointer)
init(_ other : OpaquePointer?)
init?(bitPattern: Int)
init?(bitPattern: UInt)
init<T>(_: UnsafeMutablePointer<T>)
init?<T>(_: UnsafeMutablePointer<T>?)
static func allocate(bytes: Int, alignedTo: Int = /*maximal*/)
-> UnsafeMutableRawPointer
func deallocate(bytes: Int, alignedTo: Int = /*maximal*/)
func bindMemory<T>(to: T.Type, capacity: Int) -> UnsafeMutablePointer<T>
func assumingMemoryBound<T>(to: T.Type) -> UnsafeMutablePointer<T>
func initializeMemory<T>(as: T.Type, at: Int = 0, count: Int = 1, to: T)
-> UnsafeMutablePointer<T>
func initializeMemory<T>(as: T.Type, from: UnsafePointer<T>, count: Int)
-> UnsafeMutablePointer<T>
func initializeMemory<C : Collection>(as: C.Iterator.Element.Type, from: C)
-> UnsafeMutablePointer<C.Iterator.Element>
func moveInitializeMemory<T>(
as: T.Type, from: UnsafeMutablePointer<T>, count: Int
) -> UnsafeMutablePointer<T> {
func load<T>(fromByteOffset: Int = 0, as: T.Type) -> T
func storeBytes<T>(of: T, toByteOffset: Int = 0, as: T.Type)
func copyBytes(from: UnsafeRawPointer, count: Int)
func distance(to: UnsafeRawPointer) -> Int
func advanced(by: Int) -> UnsafeRawPointer
}
The immutable UnsafeRawPointer
members are:
struct UnsafeRawPointer : Strideable, Hashable, _Pointer {
var _rawValue: Builtin.RawPointer
var hashValue: Int
init(_ _rawValue : Builtin.RawPointer)
init(_ other : OpaquePointer)
init(_ other : OpaquePointer?)
init?(bitPattern: Int)
init?(bitPattern: UInt)
init<T>(_: UnsafeMutablePointer<T>)
init?<T>(_: UnsafeMutablePointer<T>?)
func deallocate(bytes: Int, alignedTo: Int = /*maximal*/)
func bindMemory<T>(to: T.Type, capacity: Int) -> UnsafePointer<T>
func assumingMemoryBound<T>(to: T.Type) -> UnsafePointer<T>
func load<T>(fromByteOffset: Int = 0, as: T.Type) -> T
func distance(to: UnsafeRawPointer) -> Int
func advanced(by: Int) -> UnsafeRawPointer
}
The added UnsafeMutablePointer
members are:
UnsafeMutablePointer<Pointee> {
init(mutating from: UnsafePointer<Pointee>)
func withMemoryRebound<T>(to: T.Type, capacity count: Int,
_ body: @noescape (UnsafeMutablePointer<T>) throws -> ()) rethrows
}
The added UnsafePointer
members are:
UnsafePointer<Pointee> {
// Inferred initialization from mutable to immutable.
init(_ from: UnsafeMutablePointer<Pointee>)
}
The following unsafe pointer conversions on Unsafe[Mutable]Pointer
members are removed:
UnsafeMutablePointer<Pointee> {
init<U>(_ from : UnsafeMutablePointer<U>)
init?<U>(_ from : UnsafeMutablePointer<U>?)
init<U>(_ from : UnsafePointer<U>)
init?<U>(_ from : UnsafePointer<U>?)
}
UnsafePointer<Pointee> {
init<U>(_ from : UnsafePointer<U>)
init?<U>(_ from : UnsafePointer<U>?)
}
UnsafeMutablePointer.deinitialize
now returns a raw pointer:
UnsafeMutablePointer<Pointee> {
func deinitialize(count: Int = 1) -> UnsafeMutableRawPointer
}
The following UnsafeMutablePointer
members are renamed:
extension UnsafeMutablePointer<Pointee> {
static func allocate(capacity: Int)
func deallocate(capacity: Int)
func initialize(to: Pointee, count: Int = 1)
func assign(from source: UnsafePointer<Pointee>, count: Int)
func moveInitialize(from source: ${Self}, count: Int)
func initialize(from source: UnsafePointer<Pointee>, count: Int)
func initialize<C : Collection>(from source: C)
func moveAssign(from source: ${Self}, count: Int)
}
The following UnsafeMutablePointer
members are removed:
extension UnsafeMutablePointer<Pointee> {
func assignBackwardFrom(_ source: UnsafePointer<Pointee>, count: Int)
func moveInitializeBackwardFrom(_ source: ${Self}, count: Int)
}
The largest impact of this change is that void*
and const void*
are imported as UnsafeMutableRawPointer
and
UnsafeRawPointer
. This impacts many public APIs, but with implicit
argument conversion should not affect typical uses of those APIs.
Any Swift projects that rely on type inference to convert between
UnsafePointer
types will need to take action. The developer needs to
determine whether type punning is necessary. If so, they must migrate
to the UnsafeRawPointer
API. Otherwise, they can work around the new
restriction by using bindMemory(to:capacity:)
,
assumingMemoryBound<T>(to)
, or adding a mutating
label to their
initializer.
The unsafeptr_convert branch contains an implementation of a previous design, which will soon be ported to the rawptr branch.
All occurrences of the type Unsafe[Mutable]Pointer<Void>
will be
automatically replaced with Unsafe[Mutable]RawPointer
.
Initialization of the form Unsafe[Mutable]Pointer
(p) will
automatically be replaced by Unsafe[Mutable]RawPointer(p)
whenever
the type checker determines that is the expression's expected type.
Conversion between incompatible Unsafe[Mutable]Pointer
values will
produce a diagnostic explaining asking the user to migrate to one of these forms:
Unsafe[Mutable]RawPointer($0).withMemoryRebound(to:capacity:)
Unsafe[Mutable]RawPointer($0).bindMemory(to:capacity:)
Unsafe[Mutable]RawPointer($0).assumingMemoryBound(to: T.self)
The following UnsafeMutablePointer methods:
initializeFrom(_: UnsafePointer<Pointee>, count: Int)
initializeBackwardFrom(_: UnsafePointer<Pointee>, count: Int)
assignFrom(_ source: Unsafe[Mutable]Pointer<Pointee>, count: Int)
moveAssignFrom(_ source: Unsafe[Mutable]Pointer<Pointee>, count: Int)
will be automatically converted to:
initialize(from: UnsafePointer<Pointee>, count: Int)
assign(from source: Unsafe[Mutable]Pointer<Pointee>, count: Int)
Disallowing inferred UnsafePointer
conversion requires some standard
library code to use an explicit .bindMemory(to:capacity:)
whenever the conversion may previously violate strict aliasing.
All occurrences of Unsafe[Mutable]Pointer<Void>
in the standard
library are converted to Unsafe[Mutable]RawPointer
. e.g. unsafeAddress()
now
returns UnsafeRawPointer
, not UnsafePointer<Void>
.
Some occurrences of Unsafe[Mutable]Pointer<Pointee>
in the standard
library are replaced with UnsafeRawPointer
, either because the code was
playing too loosely with strict aliasing rules, or because the code
actually wanted to perform pointer arithmetic on byte-addresses.
StringCore.baseAddress
changes from OpaquePointer
to
UnsafeMutableRawPointer
because it is computing byte offsets and
accessing the memory. OpaquePointer
is meant for bridging, but
should be truly opaque; that is, non-dereferenceable and not involved
in address computation.
The StringCore
implementation does a considerable amount of casting
between different views of the String
storage. For interoperability
and optimization, String buffers frequently need to be cast to and
from CChar
. This will be made safe by using bindMemory(to:capacity:)
.
CoreAudio
utilities now use Unsafe[Mutable]RawPointer
.
An unsafeptr_convert branch has the first prototype, named
UnsafeBytePointer
, and includes standard library and type system changes
listed below. A rawptr branch has the latest proposed
implementation of UnsafeRawPointer
. I am currently updating the
rawptr
branch to include the following changes.
There are a several things going on here in order to make it possible to build the standard library with the changes:
-
A new
UnsafeRawPointer
type is defined. -
The type system imports
void*
as UnsafeRawPointer. -
The type system handles implicit conversions to UnsafeRawPointer.
-
UnsafeRawPointer
replaces bothUnsafePointer<Void>
andUnsafeMutablePointer<Void>
(Recent feedback suggestes thatUnsafeMutablePointer
should also be introduced). -
The standard library was relying on inferred
UnsafePointer
conversion in over 100 places. Most of these conversions now either take an explicit label, such asmutating
or have been rewritten. -
Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use
UnsafeRawPointer
instead. -
Explicit labeled
Unsafe[Mutable]Pointer
initializers are added. -
The inferred
Unsafe[Mutable]Pointer
conversion is removed.
Remaining work:
-
A SIL-level builtin needs to be implemented for binding a region of memory.
-
A name mangled abbreviation needs to be created for
UnsafeRawPointer
. -
We may want a convenience utility for binding null-terminated string without providing a capacity.
-
The StringAPI tests should probably be rewritten with
UnsafeRawPointer
. -
The NSStringAPI utilities and tests may need to be ported to
UnsafeRawPointer
-
The CoreAudio utilities and tests may need to be ported to
UnsafeRawPointer
.
UnsafeRawPointer
should eventually support unaligned memory access. I
believe that we will eventually have a modifier that allows "packed"
struct members. At that time we may also want to add an "unaligned" flag to
UnsafeRawPointer
's load
and initialize
methods.
In some cases, developers can safely reinterpret values to achieve the same effect as type punning:
let ptrI32 = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)
Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.
We considered adding a typePunnedMemory
property to the existing
Unsafe[Mutabale]Pointer
API. This would provide a legal way to
access a potentially type punned Unsafe[Mutabale]Pointer
. However,
it would certainly cause confusion without doing much to reduce
likelihood of programmer error. Furthermore, there are no good use
cases for such a property evident in the standard library.
The opaque _RawByte
struct is a technique that allows for
byte-addressable buffers while hiding the dangerous side effects of
type punning (a _RawByte
could be loaded but it's value cannot be
directly inspected). UnsafePointer<_RawByte>
is a clever alternative
to UnsafeRawPointer
. However, it doesn't do enough to prevent
undefined behavior. The loaded _RawByte
would naturally be accessed
via unsafeBitCast
, which would mislead the author into thinking that
they have legally bypassed the type system. In actuality, this API
blatantly violates strict aliasing. It theoretically results in
undefined behavior as it stands, and may actually exhibit undefined
behavior if the user recovers the loaded value.
To solve the safety problem with UnsafePointer<_RawByte>
, the
compiler could associate special semantics with a UnsafePointer
bound to this concrete generic parameter type. Statically enforcing
casting rules would be difficult if not impossible without new
language features. It would also be impossible to distinguish between
typed and untyped pointer APIs. For example,
UnsafePointer<T>.load<U>
would be a nonsensical vestige.
This first version of this proposal introduced an
UnsafeBytePointer
. UnsafeRawPointer
better conveys the type's role
with respect to uninitialized memory. The best way to introduce
UnsafeRawPointer
to users is by showing how it represents
uninitialized memory. It is the result of allocation, input to
initialization, and result of deinitialization. This helps users
understand the relationship between initializing memory and imbuing it
with a type.
Furthermore, we do not intend to allow direct access to the "bytes"
via subscript which would be implied by UnsafeBytePointer
.
Changing the imported type for void*
will be somewhat disruptive. We
could continue to import void*
as UnsafeMutablePointer<Void>
and
const void*
as UnsafePointer<Void>
, which will continue to serve
as an "opaque" untyped pointer. Converting to UnsafeRawPointer
would
be necessary to perform pointer arithmetic or to conservatively handle
possible type punning.
This alternative is much less disruptive, but we are left with two
forms of untyped pointer, one of which (UnsafePointer
) the type
system somewhat conflates with typed pointers.
There seems to be general agreement that UnsafeMutablePointer<Void>
is fundamentally the wrong way to represent untyped memory.
From a practical perspective, given the current restrictions of the
language, it's not clear how to statically enforce the necessary rules
for casting UnsafePointer<Void>
once general UnsafePointer<T>
conversions are disallowed. The following conversions should be
inferred, and implied for function arguments (ignoring mutability):
-
UnsafePointer<T>
toUnsafePointer<Void>
-
UnsafePointer<Void>
toUnsafeRawPointer
I did not implement this simpler design because my primary goal was to
enforce legal pointer conversion and rid Swift code of undefined
behavior. I can't do that while allowing UnsafePointer<Void>
conversions.
The general consensus now is that as long as we are making source
breaking changes to UnsafePointer
, we should try to shoot for an
overall better design that helps programmers understand the concepts.