Our High Performance C# (HPC#) compiler technology, Burst, just keeps getting better. That's why we want to unpack some major improvements made in the most recent version, Burst 1.5. In this post, we'll take you through the headline features and their benefits, so you can make the most of Burst in your projects.

Arm Neon hardware intrinsics

In collaboration with our partners at Arm, we've added Arm Neon hardware intrinsics to Burst 1.5. These allow you to target the specific hardware instructions available on Arm platforms, including the amazing vector technology, Neon, in all its glory.

Arm Neon intrinsics were first introduced as an experimental feature in Burst 1.4, and are now fully supported in Burst 1.5. Burst currently includes all Armv8-A intrinsics, with Armv8.1-RDMA, Armv8.2-DotProd and Armv8.2-Crypto intrinsics as experimental features of Burst 1.5 that will be fully supported in the next Burst version.

Arm Neon intrinsics make use of the v128 type, familiar from Intel intrinsics, as well as the v64 type. These types comprise bags of 128 or 64 bits, respectively. It's up to you to verify and correctly treat the vector element type and count. After all, these vectors are a representation of the actual SIMD register on the CPU.

See a simple usage example:

// Requires: using Unity.Burst.Intrinsics; using static Unity.Burst.Intrinsics.Arm.Neon;
[BurstCompile]
static unsafe float ComputeSum(float* arr, int count)
{
    // For simplicity's sake, assume that the length of the data is a multiple of 4.
    // Make sure to handle any remainder properly in production code.
    Assert.IsTrue(count % 4 == 0);
    if (IsNeonSupported)
    {
        // To sum up all values in the array, we split the array into 4 subarrays
        // and store their sums in the four lanes of the variable `sum` below
        v128 sum = new v128(0f);
        for (int i = 0; i < count; i += 4)
        {
            // Load 4 floats from memory
            v128 reg = vld1q_f32(arr + i);
            sum = vaddq_f32(sum, reg);
        }
        // Add the four lanes together into a single float
        return vaddvq_f32(sum);
    }
    else
    {
        // Fallback: managed implementation
        float sum = 0;
        for (int i = 0; i < count; i++)
            sum += arr[i];
        return sum;
    }
}

Keep in mind that the IsNeonSupported value is evaluated at compile time based on your target CPU, so it doesn't affect the runtime performance. If you want to provide multiple intrinsics implementations for Arm and Intel target CPUs, you should include more of the IsXXXSupported blocks in your code.

It is important to consider that Neon intrinsics are only supported on Armv8-A hardware (64-bit). On Armv7-A (32-bit), IsNeonSupported will always be false. If you target older 32-bit Arm devices, you can still rely on Burst to optimize your managed code automatically, without using Neon intrinsics directly.
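As a rough sketch of how several IsXXXSupported blocks can sit side by side, the variant below adds an SSE path next to the Neon path and finishes with a scalar loop that also picks up any remainder when the length is not a multiple of 4. The class name MultiTargetSum is hypothetical, and the SSE calls (X86.Sse.IsSseSupported, loadu_ps, add_ps) and the Float0 to Float3 lane fields are taken from the Unity.Burst.Intrinsics API as we understand it, so check them against your Burst version.

```csharp
using Unity.Burst;
using Unity.Burst.Intrinsics;

// Hypothetical example class, not part of Burst itself.
[BurstCompile]
public static unsafe class MultiTargetSum
{
    [BurstCompile]
    public static float ComputeSum(float* arr, int count)
    {
        int i = 0;
        float total = 0f;

        // Largest multiple of 4 that does not exceed count (0 for empty input).
        int vectorEnd = count > 0 ? count - (count % 4) : 0;

        if (Arm.Neon.IsNeonSupported)
        {
            // Arm path: accumulate 4 lanes at a time, then add the lanes together.
            v128 sum = new v128(0f);
            for (; i < vectorEnd; i += 4)
                sum = Arm.Neon.vaddq_f32(sum, Arm.Neon.vld1q_f32(arr + i));
            total = Arm.Neon.vaddvq_f32(sum);
        }
        else if (X86.Sse.IsSseSupported)
        {
            // Intel path: same idea, using an unaligned load and a lane-wise add.
            v128 sum = new v128(0f);
            for (; i < vectorEnd; i += 4)
                sum = X86.Sse.add_ps(sum, X86.Sse.loadu_ps(arr + i));
            total = sum.Float0 + sum.Float1 + sum.Float2 + sum.Float3;
        }

        // Scalar loop: handles the remainder after a vector path,
        // or the whole array when neither instruction set is available.
        for (; i < count; i++)
            total += arr[i];

        return total;
    }
}
```

Because each IsXXXSupported check is resolved at compile time, only one of the branches should remain in the code generated for a given target. Note that the vector paths add the values in a different order than the scalar loop, so results can differ slightly in the last bits.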

We'll follow up on Arm intrinsics and share further details on Neon intrinsics in a subsequent blog.

Hardware intrinsics target advanced users who want to harness the absolute maximum performance from the compiler, and seek to fine-tune their code to squeeze down a few more CPU cycles. If you take on this challenge, we'd love to hear your feedback.

Direct Call

A prominent new feature in Burst 1.5 is what we refer to as Direct Call. With Burst, we began focusing on jobs to accelerate tasks that run on Unity's job system with our HPC# compiler. We then added function pointers, so you can manage and call into bits of Burst code from just about anywhere:

[BurstCompile]
public class MyClass
{
    [BurstCompile]
    public static float DoSomething(float f) => math.sin(f);

    public delegate float DoSomethingDelegate(float f);

    public static float SomeManagedCode(float f)
    {
        var funcPtr = BurstCompiler.CompileFunctionPointer<DoSomethingDelegate>(
            DoSomething);
        return funcPtr.Invoke(f);
    }
}

While this greatly enhanced Burst's performance, we realized that the mechanism was clunky to use from places outside the job system. With Direct Call, Burst carries out this transformation for you, so all you have to write is the following:

[BurstCompile]
public class MyClass
{
    [BurstCompile]
    public static float DoSomething(float f) => math.sin(f);

    public static float SomeManagedCode(float f)
    {
        return DoSomething(f);
    }
}

The code proceeds to run through Burst. Note that Direct Call methods only work this way (as shown above) when called from the main thread.

New optimization superpowers

In Burst 1.5, we've added several new features to give you some extra optimization superpowers.

Hint.Likely, Hint.Unlikely and Hint.Assume

One request that keeps coming up is for intrinsics that tell the compiler whether something is likely or unlikely to happen. In Burst 1.5, we've added two such intrinsics, Likely and Unlikely, to the Hint class in Unity.Burst.CompilerServices:

if (Likely(x == 0))
{
    // Do something likely to happen!
}
if (Unlikely(x != 0))
{
    // Do something unlikely to happen!
}

These intrinsics enable you to tell the compiler whether some boolean condition (like the condition of an 'if' branch) is either likely or unlikely to be hit. This allows the compiler to optimize the resulting code.
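For context, here is a minimal sketch of a complete method that marks a rare guard condition as unlikely. The class and method names (HintExample, SafeDivide) are hypothetical; only Hint.Unlikely comes from Burst.

```csharp
using Unity.Burst;
using Unity.Burst.CompilerServices;

// Hypothetical example class, not part of Burst itself.
[BurstCompile]
public static class HintExample
{
    [BurstCompile]
    public static int SafeDivide(int numerator, int denominator)
    {
        // A zero denominator is expected to be rare, so hint that this
        // branch is unlikely and let the compiler favor the division path.
        if (Hint.Unlikely(denominator == 0))
        {
            return 0;
        }
        return numerator / denominator;
    }
}
```

The hint changes nothing about the result of the method. It only influences how the compiler is allowed to lay out the branches.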

We've also added an Assume intrinsic:

Assume(x == 0);
if (x > 0)
{
    // This branch will be removed because x is 0!
}

This intrinsic tells the compiler that a condition always holds. For instance, you can use Assume to tell the compiler that a pointer is never null, an index is never negative, a value is never NaN, and so forth. Be careful though: the compiler won't check whether your Assume is actually valid, so please ensure that your assumptions are actually true.
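The "index is never negative" case could look something like the sketch below. The names AssumeExample and ClampedIndex are hypothetical, and whether the negative-index check is actually removed depends on the optimizer.

```csharp
using Unity.Burst;
using Unity.Burst.CompilerServices;

// Hypothetical example class, not part of Burst itself.
[BurstCompile]
public static class AssumeExample
{
    [BurstCompile]
    public static int ClampedIndex(int index, int length)
    {
        // Promise the compiler that callers never pass a negative index
        // or an empty range. Nothing verifies this promise at runtime.
        Hint.Assume(index >= 0);
        Hint.Assume(length > 0);

        // Given the assumption above, the compiler is free to treat
        // this check as dead code and drop it.
        if (index < 0)
        {
            return 0;
        }

        // Clamp to the last valid position.
        return index < length ? index : length - 1;
    }
}
```

If a caller ever breaks the promise, the behavior of the compiled code is no longer predictable, so it is worth keeping such assumptions next to the code that guarantees them.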

IsConstantExpression

We've also added an intrinsic to query whether an expression evaluates to a constant expression at compile time:

if (!IsConstantExpression(x))
{
    throw new Exception("x isn't constant!");
}

You can use this query as shown above to ensure that some value is constant. You can also use it to select a faster path in an algorithm when the compiler knows something about a value at compile time, for example that it is definitely not NaN or null.
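A faster path of this kind might be sketched as follows, assuming the query lives in the Constant class of Unity.Burst.CompilerServices and that Unity.Mathematics is available. The names ConstantExample and Power are hypothetical.

```csharp
using Unity.Burst;
using Unity.Burst.CompilerServices;
using Unity.Mathematics;

// Hypothetical example class, not part of Burst itself.
[BurstCompile]
public static class ConstantExample
{
    [BurstCompile]
    public static float Power(float x, float exponent)
    {
        // If the compiler can prove the exponent is the constant 2,
        // replace the general pow call with a single multiplication.
        if (Constant.IsConstantExpression(exponent) && exponent == 2f)
        {
            return x * x;
        }

        // General path for exponents that are not known at compile time.
        return math.pow(x, exponent);
    }
}
```

The fast path can only be taken where the compiler sees the argument as a constant, which in practice means a call from other Burst-compiled code with a literal exponent that gets inlined. When the exponent is not a known constant, the general path runs.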

[SkipLocalsInit]

In C#, all local variables are zero initialized by default. Sometimes developers want to skip the cost of doing this zero initialization, so we added an attribute [SkipLocalsInit] to do just that. Simply apply this attribute to any function that you don't want to have the zero initialization happen on. This mirrors .NET 5's SkipLocalsInitAttribute functionality, but brings it to Burst sooner.
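As an illustration, the sketch below applies the attribute to a method with a small stack buffer. The names SkipInitExample and SumOfSquares are hypothetical, and we assume the attribute is the one in Unity.Burst.CompilerServices. Since the buffer may hold arbitrary values on entry, the code writes every element before reading it.

```csharp
using Unity.Burst;
using Unity.Burst.CompilerServices;

// Hypothetical example class, not part of Burst itself.
[BurstCompile]
public static unsafe class SkipInitExample
{
    [BurstCompile]
    [SkipLocalsInit]
    public static int SumOfSquares(int count)
    {
        const int Capacity = 64;
        if (count > Capacity)
        {
            count = Capacity;
        }

        // With [SkipLocalsInit], this buffer is not guaranteed to be zeroed.
        int* scratch = stackalloc int[Capacity];

        // Write each element we are going to use...
        for (int i = 0; i < count; i++)
        {
            scratch[i] = i * i;
        }

        // ...and only read back what was written.
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += scratch[i];
        }
        return sum;
    }
}
```

The attribute is safe here because no element is read before it is assigned. Code that relies on locals starting at zero should not use it.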

Miscellaneous improvements

Check out these smaller but equally awesome improvements in 1.5, in no particular order:

  • Burst now supports ValueTuple structures (int, float) within Bursted code - so long as types don't stray across entry-point boundaries. For example, you can't store them in a job struct or return them from a function pointer.
  • We added Bmi1 and Bmi2 x86 intrinsics to Burst 1.5 - gating them on AVX2 support. Any CPU that has AVX2 support can now make use of these incredible bit manipulation instructions directly in their code.
  • In Unity 2020.2 or newer versions, you can now call new ProfilerMarker("MarkerName") from Bursted code.
  • We also re-enabled the Burst warning BC1370, exclusively in player builds. This warning tells you where throws appear in a function unguarded by [Conditional("ENABLE_UNITY_COLLECTIONS_CHECKS")], which isn't supported in player builds.
  • Finally, there is a whole slew of performance improvements surrounding the use of LLVM 11 as our default code generator, along with optimizations for stackalloc hoisting, dead loop removal, compile time improvements and much more.
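
To make the first and third items above more concrete, here is a small sketch that uses a ValueTuple inside Burst code and wraps the work in a ProfilerMarker. The names TupleExample, MinMax and Spread are hypothetical, and the ProfilerMarker part assumes Unity 2020.2 or newer. The tuple is only returned from an internal helper, never across the entry-point boundary.

```csharp
using Unity.Burst;
using Unity.Profiling;

// Hypothetical example class, not part of Burst itself.
[BurstCompile]
public static class TupleExample
{
    // Internal helper, not an entry point, so returning a tuple is allowed.
    static (int min, int max) MinMax(int a, int b)
    {
        return a < b ? (a, b) : (b, a);
    }

    [BurstCompile]
    public static int Spread(int a, int b)
    {
        // Creating a marker from Bursted code assumes Unity 2020.2 or newer.
        var marker = new ProfilerMarker("TupleExample.Spread");
        marker.Begin();

        // The tuple lives entirely inside Burst-compiled code.
        (int min, int max) = MinMax(a, b);

        marker.End();

        // Only a plain int crosses the entry-point boundary.
        return max - min;
    }
}
```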

One last dance for 2018.4

Burst 1.5 is the last version to support Unity 2018.4. Our next version will have a minimum requirement of Unity 2019.4.

Start using Burst

Burst is a core part of our technology stack that you can start using today. It is a stable and verified package, already employed in thousands of projects, and counting. While our DOTS technology stack leverages Burst to provide highly optimized code, Burst also serves as a stand-alone package outside of DOTS.

It supports all the major desktop, console and mobile platforms, and works with Unity 2018.4 or newer.

If you have any thoughts, questions, or would just like to let us know what you are doing with Burst, then please feel free to leave us a message on the Burst forum.

Disclaimer

Unity Software Inc. published this content on 14 April 2021 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 14 April 2021 13:02:04 UTC.