Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
This article describes new features and performance improvements in the .NET runtime for .NET 10. It's updated for Preview 5.
Array interface method devirtualization
One of the focus areas for .NET 10 is to reduce the abstraction overhead of popular language features. In pursuit of this goal, the JIT's ability to devirtualize method calls has expanded to cover array interface methods.
Consider the typical approach of looping over an array:
static int Sum(int[] array)
{
int sum = 0;
for (int i = 0; i < array.Length; i++)
{
sum += array[i];
}
return sum;
}
This code shape is easy for the JIT to optimize, mainly because there aren't any virtual calls to reason about. Instead, the JIT can focus on removing bounds checks on the array access and applying the loop optimizations that were added in .NET 9. The following example adds some virtual calls:
static int Sum(int[] array)
{
int sum = 0;
IEnumerable<int> temp = array;
foreach (var num in temp)
{
sum += num;
}
return sum;
}
The type of the underlying collection is clear, and the JIT should be able to transform this snippet into the first one. However, array interfaces are implemented differently from "normal" interfaces, such that the JIT doesn't know how to devirtualize them. This means the enumerator calls in the foreach
loop remain virtual, blocking multiple optimizations such as inlining and stack allocation.
Starting in .NET 10, the JIT can devirtualize and inline array interface methods. This is the first of many steps to achieve performance parity between the implementations, as detailed in the .NET 10 de-abstraction plans.
Array enumeration de-abstraction
Efforts to reduce the abstraction overhead of array iteration via enumerators have improved the JIT's inlining, stack allocation, and loop cloning abilities. For example, the overhead of enumerating arrays via IEnumerable
is reduced, and conditional escape analysis now enables stack allocation of enumerators in certain scenarios.
Improved code layout
The JIT compiler in .NET 10 introduces a new approach to organizing method code into basic blocks for better runtime performance. Previously, the JIT used a reverse postorder (RPO) traversal of the program's flowgraph as an initial layout, followed by iterative transformations. While effective, this approach had limitations in modeling the trade-offs between reducing branching and increasing hot code density.
In .NET 10, the JIT models the block reordering problem as a reduction of the asymmetric Travelling Salesman Problem and implements the 3-opt heuristic to find a near-optimal traversal. This optimization improves hot path density and reduces branch distances, resulting in better runtime performance.
AVX10.2 support
.NET 10 introduces support for the Advanced Vector Extensions (AVX) 10.2 for x64-based processors. The new intrinsics available in the System.Runtime.Intrinsics.X86.Avx10v2 class can be tested once capable hardware is available.
Because AVX10.2-enabled hardware isn't yet available, the JIT's support for AVX10.2 is currently disabled by default.
Stack allocation
Stack allocation reduces the number of objects the GC has to track, and it also unlocks other optimizations. For example, after an object is stack-allocated, the JIT can consider replacing it entirely with its scalar values. Because of this, stack allocation is key to reducing the abstraction penalty of reference types. .NET 10 adds stack allocation for small arrays of value types and small arrays of reference types. It also includes escape analysis for local struct fields and delegates. (Objects that can't escape can be allocated on the stack.)
Small arrays of value types
The JIT now stack-allocates small, fixed-sized arrays of value types that don't contain GC pointers when they can be guaranteed not to outlive their parent method. In the following example, the JIT knows at compile time that numbers
is an array of only three integers that doesn't outlive a call to Sum
, and therefore allocates it on the stack.
static void Sum()
{
int[] numbers = {1, 2, 3};
int sum = 0;
for (int i = 0; i < numbers.Length; i++)
{
sum += numbers[i];
}
Console.WriteLine(sum);
}
Small arrays of reference types
.NET 10 extends the .NET 9 stack allocation improvements to small arrays of reference types. Previously, arrays of reference types were always allocated on the heap, even when their lifetime was scoped to a single method. Now, the JIT can stack-allocate such arrays when it determines that they don't outlive their creation context. In the following example, the array words
is now allocated on the stack.
static void Print()
{
string[] words = {"Hello", "World!"};
foreach (var str in words)
{
Console.WriteLine(str);
}
}
Escape analysis
Escape analysis determines if an object can outlive its parent method. Objects "escape" when assigned to non-local variables or passed to functions not inlined by the JIT. If an object can't escape, it can be allocated on the stack. .NET 10 includes escape analysis for:
Local struct fields
Starting in .NET 10, the JIT considers objects referenced by struct fields, which enables more stack allocations and reduces heap overhead. Consider the following example:
public class Program
{
struct GCStruct
{
public int[] arr;
}
public static void Main()
{
int[] x = new int[10];
GCStruct y = new GCStruct() { arr = x };
return y.arr[0];
}
}
Normally, the JIT stack-allocates small, fixed-sized arrays that don't escape, such as x
. Its assignment to y.arr
doesn't cause x
to escape, because y
doesn't escape either. However, the JIT's previous escape analysis implementation didn't model struct field references. In .NET 9, the x64 assembly generated for Main
includes a call to CORINFO_HELP_NEWARR_1_VC
to allocate x
on the heap, indicating it was marked as escaping:
Program:Main():int (FullOpts):
push rax
mov rdi, 0x719E28028A98 ; int[]
mov esi, 10
call CORINFO_HELP_NEWARR_1_VC
mov eax, dword ptr [rax+0x10]
add rsp, 8
ret
In .NET 10, the JIT no longer marks objects referenced by local struct fields as escaping, as long as the struct in question does not escape. The assembly now looks like this (notice that the heap allocation helper call is gone):
Program:Main():int (FullOpts):
sub rsp, 56
vxorps xmm8, xmm8, xmm8
vmovdqu ymmword ptr [rsp], ymm8
vmovdqa xmmword ptr [rsp+0x20], xmm8
xor eax, eax
mov qword ptr [rsp+0x30], rax
mov rax, 0x7F9FC16F8CC8 ; int[]
mov qword ptr [rsp], rax
lea rax, [rsp]
mov dword ptr [rax+0x08], 10
lea rax, [rsp]
mov eax, dword ptr [rax+0x10]
add rsp, 56
ret
For more information about de-abstraction improvements in .NET 10, see dotnet/runtime#108913.
Delegates
When source code is compiled to IL, each delegate is transformed into a closure class with a method corresponding to the delegate's definition and fields matching any captured variables. At run time, a closure object is created to instantiate the captured variables, along with a Func
object to invoke the delegate. If escape analysis determines the Func
object won't outlive its current scope, the JIT allocates it on the stack.
Consider the following Main
method:
public static int Main()
{
int local = 1;
int[] arr = new int[100];
var func = (int x) => x + local;
int sum = 0;
foreach (int num in arr)
{
sum += func(num);
}
return sum;
}
Previously, the JIT produces the following abbreviated x64 assembly for Main
. Before entering the loop, arr
, func
, and the closure class for func
called Program+<>c__DisplayClass0_0
are all allocated on the heap, as indicated by the CORINFO_HELP_NEW*
calls.
; prolog omitted for brevity
mov rdi, 0x7DD0AE362E28 ; Program+<>c__DisplayClass0_0
call CORINFO_HELP_NEWSFAST
mov rbx, rax
mov dword ptr [rbx+0x08], 1
mov rdi, 0x7DD0AE268A98 ; int[]
mov esi, 100
call CORINFO_HELP_NEWARR_1_VC
mov r15, rax
mov rdi, 0x7DD0AE4A9C58 ; System.Func`2[int,int]
call CORINFO_HELP_NEWSFAST
mov r14, rax
lea rdi, bword ptr [r14+0x08]
mov rsi, rbx
call CORINFO_HELP_ASSIGN_REF
mov rsi, 0x7DD0AE461140 ; code for Program+<>c__DisplayClass0_0:<Main>b__0(int):int:this
mov qword ptr [r14+0x18], rsi
xor ebx, ebx
add r15, 16
mov r13d, 100
G_M24375_IG03: ;; offset=0x0075
mov esi, dword ptr [r15]
mov rdi, gword ptr [r14+0x08]
call [r14+0x18]System.Func`2[int,int]:Invoke(int):int:this
add ebx, eax
add r15, 4
dec r13d
jne SHORT G_M24375_IG03
; epilog omitted for brevity
Now, because func
is never referenced outside the scope of Main
, it's also allocated on the stack:
; prolog omitted for brevity
mov rdi, 0x7B52F7837958 ; Program+<>c__DisplayClass0_0
call CORINFO_HELP_NEWSFAST
mov rbx, rax
mov dword ptr [rbx+0x08], 1
mov rsi, 0x7B52F7718CC8 ; int[]
mov qword ptr [rbp-0x1C0], rsi
lea rsi, [rbp-0x1C0]
mov dword ptr [rsi+0x08], 100
lea r15, [rbp-0x1C0]
xor r14d, r14d
add r15, 16
mov r13d, 100
G_M24375_IG03: ;; offset=0x0099
mov esi, dword ptr [r15]
mov rdi, rbx
mov rax, 0x7B52F7901638 ; address of definition for "func"
call rax
add r14d, eax
add r15, 4
dec r13d
jne SHORT G_M24375_IG03
; epilog omitted for brevity
Notice there is one remaining CORINFO_HELP_NEW*
call, which is the heap allocation for the closure. The runtime team plans to expand escape analysis to support stack allocation of closures in a future release.
Inlining improvements
Various inlining improvements have been made in .NET 10.
The JIT can now inline methods that become eligible for devirtualization due to previous inlining. This improvement allows the JIT to uncover more optimization opportunities, such as further inlining and devirtualization.
Some methods that have exception-handling semantics, in particular those with try-finally
blocks, can also be inlined.
To better take advantage of the JIT's ability to stack-allocate some arrays, the inliner's heuristics have been adjusted to increase the profitability of candidates that might be returning small, fixed-sized arrays.
Return types
During inlining, the JIT now updates the type of temporary variables that hold return values. If all return sites in a callee yield the same type, this precise type information is used to devirtualize subsequent calls. This enhancement complements the improvements in late devirtualization and array enumeration de-abstraction.
Profile data
.NET 10 enhances the JIT's inlining policy to take better advantage of profile data. Among numerous heuristics, the JIT's inliner doesn't consider methods over a certain size to avoid bloating the caller method. When the caller has profile data that suggests an inlining candidate is frequently executed, the inliner increases its size tolerance for the candidate.
Suppose the JIT inlines some callee Callee
without profile data into some caller Caller
with profile data. This discrepancy can occur if the callee is too small to be worth instrumenting, or if it's inlined too often to have a sufficient call count. If Callee
has its own inlining candidates, the JIT previously didn't consider them with its default size limit due to Callee
lacking profile data. Now, the JIT will realize Caller
has profile data and loosen its size restriction (but, to account for loss of precision, not to the same degree as if Callee
had profile data).
Similarly, when the JIT decides a call site isn't profitable for inlining, it marks the method with NoInlining
to save future inlining attempts from considering it. However, many inlining heuristics are sensitive to profile data. For example, the JIT might decide a method is too large to be worth inlining in the absence of profile data. But when the caller is sufficiently hot, the JIT might be willing to relax its size restriction and inline the call. In .NET 10, the JIT no longer flags unprofitable inlinees with NoInlining
to avoid pessimizing call sites with profile data.
NativeAOT type preinitializer improvements
NativeAOT's type preinitializer now supports all variants of the conv.*
and neg
opcodes. This enhancement allows preinitialization of methods that include casting or negation operations, further optimizing runtime performance.
Arm64 write-barrier improvements
.NET's garbage collector (GC) is generational, meaning it separates live objects by age to improve collection performance. The GC collects younger generations more often under the assumption that long-lived objects are less likely to be unreferenced (or "dead") at any given time. However, suppose an old object starts referencing a young object; the GC needs to know it can't collect the young object. However, needing to scan older objects to collect a young object defeats the performance gains of a generational GC.
To solve this problem, the JIT inserts write barriers before object reference updates to keep the GC informed. On x64, the runtime can dynamically switch between write-barrier implementations to balance write speeds and collection efficiency, depending on the GC's configuration. In .NET 10, this functionality is also available on Arm64. In particular, the new default write-barrier implementation on Arm64 handles GC regions more precisely, which improves collection performance at a slight cost to write-barrier throughput. Benchmarks show GC pause improvements from 8% to over 20% with the new GC defaults.