October 11, 2024
By: Kevin

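High-Performance C# Programming

  1. Preface
  2. What Is Performance?
    1. Execution Time
    2. Throughput
    3. Memory Allocation
    4. Performance Has Context
  3. Trading Off Performance and Clarity
  4. Measuring Performance
    1. Performance Optimization Is an Iterative Cycle
    2. Application Performance Measurement Tools
    3. Benchmarking
  5. Span
    1. Slicing
    2. Optimizing with Span
    3. Limitations of Span
  6. Memory
  7. ArrayPool
  8. System.IO.Pipelines
  9. System.Text.Json
  10. Summary
  11. References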

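Preface

I enjoy digging into performance problems, because scenarios with real performance requirements tend to be interesting: protocol encoding and decoding, I/O, data processing, scientific computing (machine learning).

Performance problems also have a clear direction and a clear conclusion from start to finish. Solving them often leads down winding paths to unexpected places, with the satisfaction of feeling your wits take the high ground again.

  1. If the percentage gained by an optimization mapped to happiness, the happiness would often be ten-thousandfold.
  2. Surprises abound: what profiling reveals is often not what you expected.
  3. Past experience offers plenty of reference points. Comparing C# with Java, or C++ with Rust, is a steady source of ideas and deepens your understanding of language mechanics and compilers.
  4. There is the cold comfort of working close to the hardware.

This article mainly introduces tools for high-performance programming in C#, along with the performance measurement tool BenchmarkDotNet.

Like Java, C# is a JIT-compiled language, which adds complexity at run time (warm-up is required, and there is a GC). BenchmarkDotNet helps a great deal in the following areas:

  • Precision and reliability: results are made trustworthy through repeated runs and statistical analysis.
  • Automation and convenience: writing and configuring benchmarks is simplified.
  • Comprehensive performance data: detailed performance reports and environment information are provided.

Of course, from a software engineering perspective, Donald Knuth's words never go out of date: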

Premature Optimization Is the Root of All Evil – Donald Knuth

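What Is Performance?

When we talk about the performance of application code, we usually measure it from three angles: execution time, throughput, and memory consumption.

Execution Time

Execution time is how fast code runs, that is, how long a method or an entire process takes. It may be measured in seconds, milliseconds, or even nanoseconds. The faster the code runs, the better the user experience, and the more work a single machine can do.

We need an intuitive feel for execution time: this is a domain ruled by nanoseconds and microseconds.

A CPU processes instructions on a nanosecond scale, while humans generally perceive time in seconds. The two are worlds apart.

To illustrate, suppose one nanosecond were stretched to one second. We can work out how many years one real second would last at that scale:

In reality, 1 second = 1,000,000,000 nanoseconds (10⁹ nanoseconds), so at this scale one real second becomes 10⁹ seconds.

Converting seconds to years:

1 year ≈ 31,536,000 seconds (60 seconds × 60 minutes × 24 hours × 365 days).

Therefore, 10⁹ seconds ≈ 10⁹ ÷ 31,536,000 ≈ 31.7 years.

Three seconds of our time is roughly a century for the CPU!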
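The scale conversion above can be checked with a few lines of C#. This is only a sketch of the arithmetic; the `TimeScale` class and `ScaledYears` method are names made up for this illustration.

```csharp
using System;
using System.Globalization;

// Print how long 1 and 3 real seconds last when 1 ns is stretched to 1 s.
Console.WriteLine(TimeScale.ScaledYears(1).ToString("F1", CultureInfo.InvariantCulture));
Console.WriteLine(TimeScale.ScaledYears(3).ToString("F1", CultureInfo.InvariantCulture));

// Hypothetical helper for this illustration only.
static class TimeScale
{
    private const double NanosecondsPerSecond = 1_000_000_000d;
    private const double SecondsPerYear = 60d * 60 * 24 * 365; // 31,536,000

    // Each real nanosecond counts as one scaled second, so a real second
    // becomes 10^9 scaled seconds; divide by seconds per year to get years.
    public static double ScaledYears(double realSeconds) =>
        realSeconds * NanosecondsPerSecond / SecondsPerYear;
}
```

The two printed values are about 31.7 and 95.1 years, which is where the "three seconds is a century" rule of thumb comes from.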

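Throughput

Throughput is the amount of work an application can complete in a given period of time. In an ASP.NET Core application, for example, requests per second is one measure of throughput. Reducing the execution time of code improves throughput in most cases.

In the early days of ASP.NET (before 4.x), the server handled web requests with a thread-based model.

Each incoming HTTP request was typically assigned a thread from the thread pool. This thread-per-request model has the following characteristics:

  • Synchronous processing: most operations run synchronously, so a thread waiting on I/O (a database query, a file read) is blocked and cannot serve other requests.
  • Thread pool dependency: thread allocation and reclamation rely on the .NET thread pool, but under high concurrency the pool may not grow quickly enough to absorb a burst of requests, leading to added latency or resource exhaustion.
  • The C10k problem: because every request occupies a thread, once concurrent connections reach the thousands the system faces heavy context switching, high memory consumption, and resource contention, all of which hurt overall performance and response time.

In how threads are implemented, C# and Java are very similar. Each thread maps to an operating system thread and carries roughly the following costs (rough figures that vary with the OS and configuration):

  1. Creation time, up to about 1 ms.
  2. Reserved stack space, on the order of 1 to 2 MB per thread by default.
  3. Context-switch overhead. A single switch usually costs microseconds, but scheduling delays and cache misses under heavy contention can push the effective cost toward 1 ms (this is the cost that drives the C10k problem).

So in practice, the number of threads a single machine can run usefully tops out at roughly 4k to 5k.

For throughput in the web domain, it helps to understand the operating system's modern I/O mechanisms (epoll, kqueue, IOCP) and the compiler's asynchronous machinery (async/await).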
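To make the contrast with thread-per-request concrete, here is a minimal sketch of async/await. `HandleRequestAsync` is a hypothetical handler, and `Task.Delay` stands in for a database or network call; neither comes from a real web framework.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical I/O-bound request handler. While the delay is pending,
// no thread is blocked; the thread goes back to the pool.
static async Task<int> HandleRequestAsync(int id)
{
    await Task.Delay(100); // stand-in for a database or network call
    return id;
}

var sw = Stopwatch.StartNew();

// Start 1000 simulated requests concurrently and wait for all of them.
int[] results = await Task.WhenAll(
    Enumerable.Range(0, 1000).Select(id => HandleRequestAsync(id)));

sw.Stop();
Console.WriteLine($"Handled {results.Length} requests in {sw.ElapsedMilliseconds} ms");
```

Because the waits overlap instead of each holding a thread, the total elapsed time should be far below 1000 × 100 ms. The exact figure depends on the machine and the timer resolution.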

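Memory Allocation

In .NET, allocating memory is cheap, because the runtime manages a pre-allocated heap. However, once objects are no longer in use, the garbage collector (GC) has to run to reclaim the memory.

Although the GC is highly optimized, it is not free. In applications that allocate at a high rate, GC pauses can affect performance.

This can become a bottleneck in real-time audio and video applications, high-performance data processing, and network programming. With ill-suited data types (a list of boxed values rather than a Span), the GC can end up consuming most of the system's CPU time.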
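As a rough illustration of why boxed collections put pressure on the GC, the sketch below compares the bytes allocated when storing integers in a `List<object>` with the bytes allocated for a single `int[]` accessed through a `Span<int>`. The element count is arbitrary, and the exact byte counts depend on the runtime and platform.

```csharp
using System;
using System.Collections.Generic;

const int Count = 100_000;

// Boxed: every int added to a List<object> becomes its own heap object.
long before = GC.GetAllocatedBytesForCurrentThread();
var boxed = new List<object>(Count);
for (int i = 0; i < Count; i++)
{
    boxed.Add(i); // implicit boxing
}
long boxedBytes = GC.GetAllocatedBytesForCurrentThread() - before;

// Unboxed: one contiguous int[] viewed through a Span<int>.
before = GC.GetAllocatedBytesForCurrentThread();
Span<int> span = new int[Count];
for (int i = 0; i < span.Length; i++)
{
    span[i] = i;
}
long spanBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"List<object>: {boxedBytes} bytes, int[] via Span<int>: {spanBytes} bytes");
```

The boxed version allocates one object per element on top of the list's backing array, so it both uses more memory and gives the GC many more objects to trace.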

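Performance Has Context

Performance cannot be discussed apart from the scenario. Not every application needs the same optimizations.

Take a typical information system. Because it involves external API calls and database operations, which dominate the response time, there is almost no need to worry about code execution efficiency or memory allocation from that angle.

Or take a run-once text-processing program such as grep. It hardly needs to worry about memory leaks, because it will most likely finish within 10 seconds; skipping memory reclamation can even be an optimization.

Some of the techniques discussed here may not apply to most application code. In systems with high performance requirements, however, they show their real value.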

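Trading Off Performance and Clarity

In some cases, performance optimization makes code less readable and harder to maintain, so performance has to be weighed against maintainability.

  • Bit shifts (<< >>) can be cheaper at the hardware instruction level than multiplication and division (* /), but we still prefer the operators we are familiar with.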

    using System;
    using System.Diagnostics;
    
    
    int[] numbers = new int[100000000];
    for(int i = 0; i < numbers.Length; i++)
    {
        numbers[i] = i;
    }
    
    Stopwatch sw = Stopwatch.StartNew();
    
    long sum = 0;
    for(int i = 0; i < numbers.Length; i++)
    {
        // Use shifts for multiplication and division
        int multiplied = numbers[i] << 1; // same as numbers[i] * 2
        int divided = numbers[i] >> 1;    // same as numbers[i] / 2 for non-negative values
        sum += multiplied + divided;
    }
    sw.Stop();
    Console.WriteLine($"Bit-shift result: {sum}, time: {sw.ElapsedMilliseconds} ms");
    
    Bit-shift result: 12499999850000000, time: 535 ms
    
    using System;
    using System.Diagnostics;
    
    
    int[] numbers = new int[100000000];
    for(int i = 0; i < numbers.Length; i++)
    {
        numbers[i] = i;
    }
    
    Stopwatch sw = Stopwatch.StartNew();
    
    long sum = 0;
    for(int i = 0; i < numbers.Length; i++)
    {
        // Use the multiplication and division operators
        int multiplied = numbers[i] * 2;
        int divided = numbers[i] / 2;
        sum += multiplied + divided;
    }
    
    sw.Stop();
    Console.WriteLine($"Multiply/divide result: {sum}, time: {sw.ElapsedMilliseconds} ms");
    
    Multiply/divide result: 12499999850000000, time: 578 ms
    
  • Function calls also have a cost, but we still extract functions.

    using System;
    using System.Diagnostics;
    
    
    // Define a function that computes the square
    static int Square(int n)
    {
        return n * n;
    }
    
    
    int[] numbers = new int[100000000];
    for(int i = 0; i < numbers.Length; i++)
    {
        numbers[i] = i;
    }
    
    Stopwatch sw = Stopwatch.StartNew();
    
    long sum = 0;
    for(int i = 0; i < numbers.Length; i++)
    {
        sum += Square(numbers[i]);
    }
    
    sw.Stop();
    Console.WriteLine($"Function call result: {sum}, time: {sw.ElapsedMilliseconds} ms");
    
    Function call result: 20047455266176, time: 594 ms
    
    using System;
    using System.Diagnostics;
    
    int[] numbers = new int[100000000];
    for(int i = 0; i < numbers.Length; i++)
    {
        numbers[i] = i;
    }
    
    Stopwatch sw = Stopwatch.StartNew();
    
    long sum = 0;
    for(int i = 0; i < numbers.Length; i++)
    {
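        // Compute the square inline, avoiding the function call
        int squared = numbers[i] * numbers[i];
        sum += squared;
    }
    
    sw.Stop();
    Console.WriteLine($"Inlined result: {sum}, time: {sw.ElapsedMilliseconds} ms");
    
    Inlined result: 20047455266176, time: 498 ms
    
  • Exceptions have a performance cost, but they are the standard error-handling mechanism.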

    using System;
    using System.Diagnostics;
    // Validation that reports errors by throwing an exception
    static void ValidateWithException(int age)
    {
        if (age < 0 || age > 120)
        {
            throw new ArgumentOutOfRangeException(nameof(age), "Age must be between 0 and 120.");
        }
    }
    
    // Validation that reports errors through a return value
    static bool ValidateWithReturnValue(int age, out string error)
    {
        if (age < 0 || age > 120)
        {
            error = "Age must be between 0 and 120.";
            return false;
        }
    
        error = null;
        return true;
    }
    
    
    // Initialize the data
    int totalIterations = 1000000; // 1 million iterations
    int[] ages = new int[totalIterations];
    Random rand = new Random();
    
    for (int i = 0; i < ages.Length; i++)
    {
        // Generate a random age; about 7% are invalid (121 to 129, 9 of 130 values)
        ages[i] = rand.Next(0, 130); // 0 to 129 inclusive
    }
    
    // Benchmark using exception handling
    Stopwatch swException = Stopwatch.StartNew();
    long sumException = 0;
    int exceptionCount = 0;
    
    for (int i = 0; i < ages.Length; i++)
    {
        try
        {
            ValidateWithException(ages[i]);
            sumException += ages[i] * ages[i]; // square of a valid age
        }
        catch
        {
            exceptionCount++;
            // Optional: log the error
        }
    }
    
    swException.Stop();
    Console.WriteLine("=== Using exception handling ===");
    Console.WriteLine($"Sum: {sumException}");
    Console.WriteLine($"Exception count: {exceptionCount}");
    Console.WriteLine($"Time: {swException.ElapsedMilliseconds} ms\n");
    
    // Benchmark using an error return value
    Stopwatch swReturn = Stopwatch.StartNew();
    long sumReturn = 0;
    int returnCount = 0;
    
    for (int i = 0; i < ages.Length; i++)
    {
        if (ValidateWithReturnValue(ages[i], out _))
        {
            sumReturn += ages[i] * ages[i]; // square of a valid age
        }
        else
        {
            returnCount++;
            // Optional: log the error
        }
    }
    
    swReturn.Stop();
    Console.WriteLine("=== Using an error return value ===");
    Console.WriteLine($"Sum: {sumReturn}");
    Console.WriteLine($"Error count: {returnCount}");
    Console.WriteLine($"Time: {swReturn.ElapsedMilliseconds} ms");
    
    === Using exception handling ===
    Sum: 4495711993
    Exception count: 69011
    Time: 1488 ms
    
    === Using an error return value ===
    Sum: 4495711993
    Error count: 69011
    Time: 8 ms
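The exception versus return-value comparison has a well-known counterpart in the base class library: `int.Parse` throws on malformed input, while `int.TryParse` reports failure through its return value. The sketch below contrasts the two; the helper names `SumWithParse` and `SumWithTryParse` and the sample inputs are made up for this illustration.

```csharp
using System;

string[] inputs = { "42", "abc", "7", "" };

// Exception-based: int.Parse throws FormatException on malformed input.
static int SumWithParse(string[] values)
{
    int sum = 0;
    foreach (string s in values)
    {
        try { sum += int.Parse(s); }
        catch (FormatException) { /* skip invalid input */ }
    }
    return sum;
}

// Try-pattern: int.TryParse signals failure with false instead of throwing.
static int SumWithTryParse(string[] values)
{
    int sum = 0;
    foreach (string s in values)
    {
        if (int.TryParse(s, out int value)) sum += value;
    }
    return sum;
}

Console.WriteLine(SumWithParse(inputs));    // 49
Console.WriteLine(SumWithTryParse(inputs)); // 49
```

Both versions produce the same result. The Try-pattern is a reasonable middle ground when invalid input is expected and frequent, while exceptions remain appropriate for conditions that really are exceptional.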
    

Measuring Performance

Performance Optimization Is an Iterative Cycle

@startuml
!define RECTANGLE class

skinparam monochrome true
skinparam ArrowColor #000000
skinparam Rectangle {
  BackgroundColor #f0f0f0
  BorderColor #333333
  RoundCorner 20
  Shadowing false
  FontSize 14
  FontName Arial
}

title Performance Optimization Cycle

' Define the stages
rectangle "Measure" as Measure
rectangle "Analyze" as Analyze
rectangle "Optimize" as Optimize
rectangle "Verify" as Verify

' Lay out the stages as a loop
Measure -right-> Analyze : Step 1
Analyze -down-> Optimize : Step 2
Optimize -left-> Verify : Step 3
Verify -up-> Measure : Step 4

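@enduml

  1. Measure: before optimizing, measure current performance and find the hot code paths.
  2. Analyze: use benchmarks and similar tools to analyze the code in depth.
  3. Optimize: make small, measurable optimizations.
  4. Verify: measure again to confirm the effect of the optimization.

Application Performance Measurement Tools

  • Visual Studio Diagnostic Tools: for observing GC activity, heap usage, and similar information.
  • Profiling tools: such as the profiler built into Visual Studio, PerfView, and JetBrains dotTrace and dotMemory.
  • IL viewers: such as ILSpy, for inspecting the compiled intermediate language code.
  • Production monitoring: evaluate the effect of optimizations by watching production metrics such as requests per second and heap usage.

Benchmarking

To measure code performance precisely, we use BenchmarkDotNet.

Benchmark example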

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

using System;

public class NameParser
{
    public string GetLastName(string fullName)
    {
        return fullName.Substring(fullName.LastIndexOf(' ') + 1);
    }
}

[ShortRunJob]
public class Benchmarks
{
    private NameParser _nameParser = new NameParser();
    private string _fullName = "John Doe";

    [Benchmark]
    public string GetLastName()
    {
        return _nameParser.GetLastName(_fullName);
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<Benchmarks>();
    }
}
// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 1 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet  restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/obj/Release/net8.0/" /p:OutDir="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/bin/Release/net8.0/" /p:OutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/bin/Release/net8.0/" in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11
// command took 1.25 sec and exited with 0
// start dotnet  build -c Release --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/obj/Release/net8.0/" /p:OutDir="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/bin/Release/net8.0/" /p:OutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/bin/Release/net8.0/" --output "/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/bin/Release/net8.0/" in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11
// command took 5.46 sec and exited with 0
// ***** Done, took 00:00:06 (6.79 sec)   *****
// Found 1 benchmarks:
//   Benchmarks.GetLastName: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3)

// **************************
// Benchmark: Benchmarks.GetLastName: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet c2b3ea75-d340-48ce-83a6-9ced765c0c11.dll --anonymousPipes 108 109 --benchmarkName Benchmarks.GetLastName --job ShortRun --benchmarkId 0 in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-pcot6J/bin/Release/net8.0/c2b3ea75-d340-48ce-83a6-9ced765c0c11/bin/Release/net8.0
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3)

OverheadJitting  1: 1 op, 320324.00 ns, 320.3240 us/op
WorkloadJitting  1: 1 op, 428967.00 ns, 428.9670 us/op

OverheadJitting  2: 16 op, 465706.00 ns, 29.1066 us/op
WorkloadJitting  2: 16 op, 480973.00 ns, 30.0608 us/op

WorkloadPilot    1: 16 op, 3704.00 ns, 231.5000 ns/op
WorkloadPilot    2: 32 op, 2175.00 ns, 67.9688 ns/op
WorkloadPilot    3: 64 op, 2763.00 ns, 43.1719 ns/op
WorkloadPilot    4: 128 op, 5795.00 ns, 45.2734 ns/op
WorkloadPilot    5: 256 op, 7921.00 ns, 30.9414 ns/op
WorkloadPilot    6: 512 op, 19338.00 ns, 37.7695 ns/op
WorkloadPilot    7: 1024 op, 98064.00 ns, 95.7656 ns/op
WorkloadPilot    8: 2048 op, 52599.00 ns, 25.6831 ns/op
WorkloadPilot    9: 4096 op, 107757.00 ns, 26.3079 ns/op
WorkloadPilot   10: 8192 op, 187872.00 ns, 22.9336 ns/op
WorkloadPilot   11: 16384 op, 710234.00 ns, 43.3492 ns/op
WorkloadPilot   12: 32768 op, 1208381.00 ns, 36.8769 ns/op
WorkloadPilot   13: 65536 op, 2101086.00 ns, 32.0600 ns/op
WorkloadPilot   14: 131072 op, 4117422.00 ns, 31.4134 ns/op
WorkloadPilot   15: 262144 op, 7140705.00 ns, 27.2396 ns/op
WorkloadPilot   16: 524288 op, 30298467.00 ns, 57.7897 ns/op
WorkloadPilot   17: 1048576 op, 20174450.00 ns, 19.2399 ns/op
WorkloadPilot   18: 2097152 op, 43940774.00 ns, 20.9526 ns/op
WorkloadPilot   19: 4194304 op, 54584561.00 ns, 13.0140 ns/op
WorkloadPilot   20: 8388608 op, 108292372.00 ns, 12.9095 ns/op
WorkloadPilot   21: 16777216 op, 207294161.00 ns, 12.3557 ns/op
WorkloadPilot   22: 33554432 op, 419881856.00 ns, 12.5135 ns/op
WorkloadPilot   23: 67108864 op, 955729862.00 ns, 14.2415 ns/op

OverheadWarmup   1: 67108864 op, 217409339.00 ns, 3.2397 ns/op
OverheadWarmup   2: 67108864 op, 204713421.00 ns, 3.0505 ns/op
OverheadWarmup   3: 67108864 op, 188279613.00 ns, 2.8056 ns/op
OverheadWarmup   4: 67108864 op, 184621021.00 ns, 2.7511 ns/op
OverheadWarmup   5: 67108864 op, 183637956.00 ns, 2.7364 ns/op
OverheadWarmup   6: 67108864 op, 193078308.00 ns, 2.8771 ns/op
OverheadWarmup   7: 67108864 op, 197619761.00 ns, 2.9448 ns/op
OverheadWarmup   8: 67108864 op, 192635043.00 ns, 2.8705 ns/op
OverheadWarmup   9: 67108864 op, 202886111.00 ns, 3.0232 ns/op
OverheadWarmup  10: 67108864 op, 186367799.00 ns, 2.7771 ns/op

OverheadActual   1: 67108864 op, 185629728.00 ns, 2.7661 ns/op
OverheadActual   2: 67108864 op, 185566116.00 ns, 2.7652 ns/op
OverheadActual   3: 67108864 op, 187526736.00 ns, 2.7944 ns/op
OverheadActual   4: 67108864 op, 185825169.00 ns, 2.7690 ns/op
OverheadActual   5: 67108864 op, 187154219.00 ns, 2.7888 ns/op
OverheadActual   6: 67108864 op, 185267411.00 ns, 2.7607 ns/op
OverheadActual   7: 67108864 op, 184511678.00 ns, 2.7494 ns/op
OverheadActual   8: 67108864 op, 185831484.00 ns, 2.7691 ns/op
OverheadActual   9: 67108864 op, 185464436.00 ns, 2.7636 ns/op
OverheadActual  10: 67108864 op, 184191681.00 ns, 2.7447 ns/op
OverheadActual  11: 67108864 op, 185644511.00 ns, 2.7663 ns/op
OverheadActual  12: 67108864 op, 184538369.00 ns, 2.7498 ns/op
OverheadActual  13: 67108864 op, 184283688.00 ns, 2.7460 ns/op
OverheadActual  14: 67108864 op, 187138889.00 ns, 2.7886 ns/op
OverheadActual  15: 67108864 op, 189052876.00 ns, 2.8171 ns/op

WorkloadWarmup   1: 67108864 op, 846719159.00 ns, 12.6171 ns/op
WorkloadWarmup   2: 67108864 op, 846434924.00 ns, 12.6129 ns/op
WorkloadWarmup   3: 67108864 op, 821792703.00 ns, 12.2457 ns/op

// BeforeActualRun
WorkloadActual   1: 67108864 op, 817654502.00 ns, 12.1840 ns/op
WorkloadActual   2: 67108864 op, 822273561.00 ns, 12.2528 ns/op
WorkloadActual   3: 67108864 op, 811982597.00 ns, 12.0995 ns/op

// AfterActualRun
WorkloadResult   1: 67108864 op, 632024774.00 ns, 9.4179 ns/op
WorkloadResult   2: 67108864 op, 636643833.00 ns, 9.4867 ns/op
WorkloadResult   3: 67108864 op, 626352869.00 ns, 9.3334 ns/op

// AfterAll
// Benchmark Process 78676 has exited with code 0.

Mean = 9.413 ns, StdErr = 0.044 ns (0.47%), N = 3, StdDev = 0.077 ns
Min = 9.333 ns, Q1 = 9.376 ns, Median = 9.418 ns, Q3 = 9.452 ns, Max = 9.487 ns
IQR = 0.077 ns, LowerFence = 9.261 ns, UpperFence = 9.567 ns
ConfidenceInterval = [8.011 ns; 10.814 ns] (CI 99.9%), Margin = 1.401 ns (14.89% of Mean)
Skewness = -0.07, Kurtosis = 0.67, MValue = 2

// ** Remained 0 (0.0%) benchmark(s) to run. Estimated finish 2024-10-11 11:02 (0h 0m from now) **
// ***** BenchmarkRunner: Finish  *****

// * Export *
  BenchmarkDotNet.Artifacts/results/Benchmarks-report.csv
  BenchmarkDotNet.Artifacts/results/Benchmarks-report-github.md
  BenchmarkDotNet.Artifacts/results/Benchmarks-report.html

// * Detailed results *
Benchmarks.GetLastName: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3)
Runtime = .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 9.413 ns, StdErr = 0.044 ns (0.47%), N = 3, StdDev = 0.077 ns
Min = 9.333 ns, Q1 = 9.376 ns, Median = 9.418 ns, Q3 = 9.452 ns, Max = 9.487 ns
IQR = 0.077 ns, LowerFence = 9.261 ns, UpperFence = 9.567 ns
ConfidenceInterval = [8.011 ns; 10.814 ns] (CI 99.9%), Margin = 1.401 ns (14.89% of Mean)
Skewness = -0.07, Kurtosis = 0.67, MValue = 2
-------------------- Histogram --------------------
[9.263 ns ; 9.557 ns) | @@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet v0.14.0, macOS Sequoia 15.0 (24A335) [Darwin 24.0.0]
Intel Core i5-10600 CPU 3.30GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK 8.0.300
  [Host]   : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1
WarmupCount=3

| Method      | Mean     | Error    | StdDev    |
|------------ |---------:|---------:|----------:|
| GetLastName | 9.413 ns | 1.401 ns | 0.0768 ns |

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  1 ns   : 1 Nanosecond (0.000000001 sec)

// ***** BenchmarkRunner: End *****
Run time: 00:00:11 (11.88 sec), executed benchmarks: 1

Global total time: 00:00:19 (19.06 sec), executed benchmarks: 1
// * Artifacts cleanup *
Artifacts cleanup is finished

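Running the benchmark above gives us the method's mean execution time (about 9.4 ns here). Memory allocation figures are reported only when the benchmark class is annotated with [MemoryDiagnoser].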
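To see allocations alongside timings, the same benchmark can be annotated with `[MemoryDiagnoser]`. The sketch below assumes the BenchmarkDotNet NuGet package is referenced; the class name `LastNameBenchmarks` is made up for this illustration.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // adds allocation data to the summary table
[ShortRunJob]
public class LastNameBenchmarks
{
    private readonly string _fullName = "John Doe";

    [Benchmark]
    public string GetLastName()
    {
        // Substring allocates a new string on every call
        return _fullName.Substring(_fullName.LastIndexOf(' ') + 1);
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<LastNameBenchmarks>();
    }
}
```

As with any BenchmarkDotNet project, it should be built and run in Release configuration for the numbers to be meaningful.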

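Span

Span<T> was introduced in .NET Core 2.1. It provides a type-safe, memory-safe, readable and writable view over a contiguous region of memory.

Span<T> can refer to memory on the managed heap, on the stack, or even unmanaged memory.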
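A small sketch of what this buys us: one method can process memory from different sources without knowing where it lives. The `Sum` helper is a made-up name for this illustration. The unmanaged case is left out because it requires an unsafe context.

```csharp
using System;

// Sums any contiguous block of ints, wherever the memory lives.
static int Sum(ReadOnlySpan<int> values)
{
    int total = 0;
    foreach (int v in values)
    {
        total += v;
    }
    return total;
}

int[] heapArray = { 1, 2, 3, 4 };          // managed heap
Span<int> stackBuffer = stackalloc int[4]; // stack, no heap allocation
stackBuffer.Fill(5);

Console.WriteLine(Sum(heapArray));   // 10
Console.WriteLine(Sum(stackBuffer)); // 20
```

Both the array and the Span<int> convert implicitly to ReadOnlySpan<int>, so the caller does not need separate overloads per memory source.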

Slicing

Slicing changes the view of a Span<T> in constant time, without copying memory.

Whatever the contents, slicing is always O(1).

         Span<int> span = array.AsSpan();
    +-----------------------------------+
    |         span.Slice(3, 3)          |
    |           +-----------+           |
    |           |           |           |
    |           |           |           |
+---+---+---+---+---+---+---+---+---+---+--------+
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |        |
+---+---+---+---+---+---+---+---+---+---+--------+
         int[] array = new int[9];

int[] array = new int[9];
Span<int> span = array.AsSpan();

// Get 3 elements starting at index 3
Span<int> slice = span.Slice(3, 3);

C# 8.0 and later provide the range operator begin..end (the end index is exclusive):

static void Demo()
{
    int[] array = new int[9] { 0, 1, 2, 3, 4, 5, 6, 7, 8 };
    Span<int> span = array.AsSpan();

    // Get 3 elements starting at index 3
    Span<int> slice = span.Slice(3, 3);

    Console.WriteLine("Slice elements:");
    foreach (var item in slice)
    {
        Console.WriteLine(item);
    }

    // Use the range operator (C# 8.0 and later)
    Span<int> sliceWithRange = span[3..6];
    Console.WriteLine("Slice with range:");
    foreach (var item in sliceWithRange)
    {
        Console.WriteLine(item);
    }
}
Demo();

Slice elements:
3
4
5
Slice with range:
3
4
5
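Because a slice is only a view, writes through it land in the underlying array, whereas `ToArray()` produces an independent copy. A minimal sketch of the difference:

```csharp
using System;

int[] array = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };

// A view over elements 3, 4 and 5 of the array; nothing is copied.
Span<int> slice = array.AsSpan(3, 3);
slice[0] = 42; // writes through to array[3]

// ToArray() allocates a new array, so changes to it stay local.
int[] copy = slice.ToArray();
copy[1] = -1;

Console.WriteLine(array[3]); // 42
Console.WriteLine(array[4]); // 4
```

This is what makes slicing cheap, and it is also why a Span returned from a method should be treated as shared state rather than as a private copy.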

Optimizing with Span

Suppose we have the following requirement: return the quarter of an array's elements that starts at its midpoint.

Original code:

public int[] GetMiddleQuarter(int[] array)
{
    return array.Skip(array.Length / 2).Take(array.Length / 4).ToArray();
}

Optimized code (returns a view over the original array rather than a copy):

public Span<int> GetMiddleQuarter(int[] array)
{
    int halfLength = array.Length / 2;
    int quarterLength = array.Length / 4;
    Span<int> span = array.AsSpan();
    return span.Slice(halfLength, quarterLength);
}

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;

namespace BenchmarkExample
{
    [MemoryDiagnoser]
    public class MiddleQuarterBenchmark
    {
        private int[] array;

        // Parameterized array size
        [Params(100, 1000, 10000)]
        public int ArraySize { get; set; }

        [GlobalSetup]
        public void Setup()
        {
            // Initialize the array with the elements 0 through ArraySize - 1
            array = Enumerable.Range(0, ArraySize).ToArray();
        }

        /// <summary>
        /// Original method: uses LINQ Skip and Take
        /// Returns a new array
        /// </summary>
        [Benchmark(Baseline = true)]
        public int[] OriginalMethod()
        {
            return array.Skip(array.Length / 2).Take(array.Length / 4).ToArray();
        }

        /// <summary>
        /// Optimized method: slices with Span<T>
        /// Returns a view over the existing array; no new array is allocated
        /// </summary>
        [Benchmark]
        public Span<int> OptimizedMethod()
        {
            int halfLength = array.Length / 2;
            int quarterLength = array.Length / 4;
            Span<int> span = array.AsSpan();
            return span.Slice(halfLength, quarterLength);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Run the benchmarks
            var summary = BenchmarkRunner.Run<MiddleQuarterBenchmark>();
        }
    }
}
// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 6 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet  restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/obj/Release/net8.0/" /p:OutDir="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0/" /p:OutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0/" in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339
// command took 1.2 sec and exited with 0
// start dotnet  build -c Release --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/obj/Release/net8.0/" /p:OutDir="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0/" /p:OutputPath="/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0/" --output "/private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0/" in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339
// command took 5.59 sec and exited with 0
// ***** Done, took 00:00:06 (6.86 sec)   *****
// Found 6 benchmarks:
//   MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=100]
//   MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=100]
//   MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=1000]
//   MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=1000]
//   MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=10000]
//   MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=10000]

// **************************
// Benchmark: MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=100]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 412cc443-8cc3-49ea-bdc5-a302fa51b339.dll --anonymousPipes 108 109 --benchmarkName "BenchmarkExample.MiddleQuarterBenchmark.OriginalMethod(ArraySize: 100)" --job Default --benchmarkId 0 in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: DefaultJob

OverheadJitting  1: 1 op, 304800.00 ns, 304.8000 us/op
WorkloadJitting  1: 1 op, 1640361.00 ns, 1.6404 ms/op

OverheadJitting  2: 16 op, 494764.00 ns, 30.9228 us/op
WorkloadJitting  2: 16 op, 579686.00 ns, 36.2304 us/op

WorkloadPilot    1: 16 op, 10176.00 ns, 636.0000 ns/op
WorkloadPilot    2: 32 op, 17254.00 ns, 539.1875 ns/op
WorkloadPilot    3: 64 op, 31244.00 ns, 488.1875 ns/op
WorkloadPilot    4: 128 op, 60192.00 ns, 470.2500 ns/op
WorkloadPilot    5: 256 op, 147441.00 ns, 575.9414 ns/op
WorkloadPilot    6: 512 op, 288526.00 ns, 563.5273 ns/op
WorkloadPilot    7: 1024 op, 554471.00 ns, 541.4756 ns/op
WorkloadPilot    8: 2048 op, 1114172.00 ns, 544.0293 ns/op
WorkloadPilot    9: 4096 op, 2374924.00 ns, 579.8154 ns/op
WorkloadPilot   10: 8192 op, 4279569.00 ns, 522.4083 ns/op
WorkloadPilot   11: 16384 op, 8924662.00 ns, 544.7181 ns/op
WorkloadPilot   12: 32768 op, 16401903.00 ns, 500.5464 ns/op
WorkloadPilot   13: 65536 op, 31213135.00 ns, 476.2746 ns/op
WorkloadPilot   14: 131072 op, 61880461.00 ns, 472.1105 ns/op
WorkloadPilot   15: 262144 op, 72239579.00 ns, 275.5721 ns/op
WorkloadPilot   16: 524288 op, 61847892.00 ns, 117.9655 ns/op
WorkloadPilot   17: 1048576 op, 127696876.00 ns, 121.7812 ns/op
WorkloadPilot   18: 2097152 op, 264798613.00 ns, 126.2658 ns/op
WorkloadPilot   19: 4194304 op, 498318364.00 ns, 118.8084 ns/op
WorkloadPilot   20: 8388608 op, 1001546406.00 ns, 119.3936 ns/op

OverheadWarmup   1: 8388608 op, 29402715.00 ns, 3.5051 ns/op
OverheadWarmup   2: 8388608 op, 28270325.00 ns, 3.3701 ns/op
OverheadWarmup   3: 8388608 op, 28487263.00 ns, 3.3959 ns/op
OverheadWarmup   4: 8388608 op, 27691078.00 ns, 3.3010 ns/op
OverheadWarmup   5: 8388608 op, 27502591.00 ns, 3.2786 ns/op
OverheadWarmup   6: 8388608 op, 27499973.00 ns, 3.2783 ns/op
OverheadWarmup   7: 8388608 op, 27417843.00 ns, 3.2685 ns/op
OverheadWarmup   8: 8388608 op, 27300204.00 ns, 3.2544 ns/op
OverheadWarmup   9: 8388608 op, 27348271.00 ns, 3.2602 ns/op
OverheadWarmup  10: 8388608 op, 27533117.00 ns, 3.2822 ns/op

OverheadActual   1: 8388608 op, 27553267.00 ns, 3.2846 ns/op
OverheadActual   2: 8388608 op, 27438036.00 ns, 3.2709 ns/op
OverheadActual   3: 8388608 op, 27369445.00 ns, 3.2627 ns/op
OverheadActual   4: 8388608 op, 27341672.00 ns, 3.2594 ns/op
OverheadActual   5: 8388608 op, 27630359.00 ns, 3.2938 ns/op
OverheadActual   6: 8388608 op, 27511237.00 ns, 3.2796 ns/op
OverheadActual   7: 8388608 op, 27429850.00 ns, 3.2699 ns/op
OverheadActual   8: 8388608 op, 27490977.00 ns, 3.2772 ns/op
OverheadActual   9: 8388608 op, 27415328.00 ns, 3.2682 ns/op
OverheadActual  10: 8388608 op, 27675927.00 ns, 3.2992 ns/op
OverheadActual  11: 8388608 op, 27242818.00 ns, 3.2476 ns/op
OverheadActual  12: 8388608 op, 23563335.00 ns, 2.8090 ns/op
OverheadActual  13: 8388608 op, 23385949.00 ns, 2.7878 ns/op
OverheadActual  14: 8388608 op, 23535333.00 ns, 2.8056 ns/op
OverheadActual  15: 8388608 op, 23309247.00 ns, 2.7787 ns/op
OverheadActual  16: 8388608 op, 23463720.00 ns, 2.7971 ns/op
OverheadActual  17: 8388608 op, 23408664.00 ns, 2.7905 ns/op
OverheadActual  18: 8388608 op, 23351912.00 ns, 2.7838 ns/op
OverheadActual  19: 8388608 op, 23857733.00 ns, 2.8441 ns/op
OverheadActual  20: 8388608 op, 23262426.00 ns, 2.7731 ns/op

WorkloadWarmup   1: 8388608 op, 1048209766.00 ns, 124.9563 ns/op
WorkloadWarmup   2: 8388608 op, 1015438338.00 ns, 121.0497 ns/op
WorkloadWarmup   3: 8388608 op, 983877461.00 ns, 117.2873 ns/op
WorkloadWarmup   4: 8388608 op, 1004920615.00 ns, 119.7959 ns/op
WorkloadWarmup   5: 8388608 op, 1032645120.00 ns, 123.1009 ns/op
WorkloadWarmup   6: 8388608 op, 984280192.00 ns, 117.3353 ns/op
WorkloadWarmup   7: 8388608 op, 1009160992.00 ns, 120.3014 ns/op
WorkloadWarmup   8: 8388608 op, 981940329.00 ns, 117.0564 ns/op

// BeforeActualRun
WorkloadActual   1: 8388608 op, 1051697592.00 ns, 125.3721 ns/op
WorkloadActual   2: 8388608 op, 1119620133.00 ns, 133.4691 ns/op
WorkloadActual   3: 8388608 op, 1073568237.00 ns, 127.9793 ns/op
WorkloadActual   4: 8388608 op, 1073480347.00 ns, 127.9688 ns/op
WorkloadActual   5: 8388608 op, 977515533.00 ns, 116.5289 ns/op
WorkloadActual   6: 8388608 op, 1000717174.00 ns, 119.2948 ns/op
WorkloadActual   7: 8388608 op, 1058672132.00 ns, 126.2036 ns/op
WorkloadActual   8: 8388608 op, 1082677859.00 ns, 129.0653 ns/op
WorkloadActual   9: 8388608 op, 1170109912.00 ns, 139.4880 ns/op
WorkloadActual  10: 8388608 op, 1104555856.00 ns, 131.6733 ns/op
WorkloadActual  11: 8388608 op, 1087837524.00 ns, 129.6803 ns/op
WorkloadActual  12: 8388608 op, 1120274549.00 ns, 133.5471 ns/op
WorkloadActual  13: 8388608 op, 1069061603.00 ns, 127.4421 ns/op
WorkloadActual  14: 8388608 op, 1100260714.00 ns, 131.1613 ns/op
WorkloadActual  15: 8388608 op, 1087919094.00 ns, 129.6901 ns/op
WorkloadActual  16: 8388608 op, 1170010989.00 ns, 139.4762 ns/op
WorkloadActual  17: 8388608 op, 1134234924.00 ns, 135.2113 ns/op
WorkloadActual  18: 8388608 op, 1117153364.00 ns, 133.1751 ns/op
WorkloadActual  19: 8388608 op, 1151329119.00 ns, 137.2491 ns/op
WorkloadActual  20: 8388608 op, 1063427181.00 ns, 126.7704 ns/op
WorkloadActual  21: 8388608 op, 1092160819.00 ns, 130.1957 ns/op
WorkloadActual  22: 8388608 op, 1096832182.00 ns, 130.7526 ns/op
WorkloadActual  23: 8388608 op, 1045546266.00 ns, 124.6388 ns/op
WorkloadActual  24: 8388608 op, 1044835225.00 ns, 124.5541 ns/op
WorkloadActual  25: 8388608 op, 1059829107.00 ns, 126.3415 ns/op
WorkloadActual  26: 8388608 op, 1036974183.00 ns, 123.6170 ns/op
WorkloadActual  27: 8388608 op, 1034077362.00 ns, 123.2716 ns/op
WorkloadActual  28: 8388608 op, 956702723.00 ns, 114.0479 ns/op
WorkloadActual  29: 8388608 op, 994351238.00 ns, 118.5359 ns/op
WorkloadActual  30: 8388608 op, 998818560.00 ns, 119.0685 ns/op
WorkloadActual  31: 8388608 op, 959245693.00 ns, 114.3510 ns/op
WorkloadActual  32: 8388608 op, 1015857358.00 ns, 121.0996 ns/op
WorkloadActual  33: 8388608 op, 1010297458.00 ns, 120.4368 ns/op
WorkloadActual  34: 8388608 op, 964179318.00 ns, 114.9391 ns/op
WorkloadActual  35: 8388608 op, 968869307.00 ns, 115.4982 ns/op
WorkloadActual  36: 8388608 op, 1047734241.00 ns, 124.8997 ns/op
WorkloadActual  37: 8388608 op, 1051675432.00 ns, 125.3695 ns/op
WorkloadActual  38: 8388608 op, 980677492.00 ns, 116.9059 ns/op
WorkloadActual  39: 8388608 op, 1005882696.00 ns, 119.9106 ns/op
WorkloadActual  40: 8388608 op, 1048394039.00 ns, 124.9783 ns/op
WorkloadActual  41: 8388608 op, 1048966579.00 ns, 125.0466 ns/op
WorkloadActual  42: 8388608 op, 1014428149.00 ns, 120.9293 ns/op
WorkloadActual  43: 8388608 op, 1038185670.00 ns, 123.7614 ns/op
WorkloadActual  44: 8388608 op, 1033999630.00 ns, 123.2624 ns/op
WorkloadActual  45: 8388608 op, 1063570920.00 ns, 126.7875 ns/op
WorkloadActual  46: 8388608 op, 1067438080.00 ns, 127.2485 ns/op
WorkloadActual  47: 8388608 op, 1063690410.00 ns, 126.8018 ns/op
WorkloadActual  48: 8388608 op, 1010105102.00 ns, 120.4139 ns/op
WorkloadActual  49: 8388608 op, 982375287.00 ns, 117.1083 ns/op
WorkloadActual  50: 8388608 op, 1013717639.00 ns, 120.8446 ns/op
WorkloadActual  51: 8388608 op, 984520027.00 ns, 117.3639 ns/op
WorkloadActual  52: 8388608 op, 967906336.00 ns, 115.3834 ns/op
WorkloadActual  53: 8388608 op, 994794791.00 ns, 118.5888 ns/op
WorkloadActual  54: 8388608 op, 956886801.00 ns, 114.0698 ns/op
WorkloadActual  55: 8388608 op, 1002856022.00 ns, 119.5498 ns/op
WorkloadActual  56: 8388608 op, 998536951.00 ns, 119.0349 ns/op
WorkloadActual  57: 8388608 op, 996316395.00 ns, 118.7702 ns/op
WorkloadActual  58: 8388608 op, 959940843.00 ns, 114.4339 ns/op
WorkloadActual  59: 8388608 op, 954016090.00 ns, 113.7276 ns/op
WorkloadActual  60: 8388608 op, 960932495.00 ns, 114.5521 ns/op
WorkloadActual  61: 8388608 op, 962240203.00 ns, 114.7080 ns/op
WorkloadActual  62: 8388608 op, 953886133.00 ns, 113.7121 ns/op
WorkloadActual  63: 8388608 op, 958313275.00 ns, 114.2398 ns/op
WorkloadActual  64: 8388608 op, 977197844.00 ns, 116.4911 ns/op
WorkloadActual  65: 8388608 op, 950080371.00 ns, 113.2584 ns/op
WorkloadActual  66: 8388608 op, 945417460.00 ns, 112.7025 ns/op
WorkloadActual  67: 8388608 op, 1018131659.00 ns, 121.3708 ns/op
WorkloadActual  68: 8388608 op, 984233073.00 ns, 117.3297 ns/op
WorkloadActual  69: 8388608 op, 965130839.00 ns, 115.0526 ns/op
WorkloadActual  70: 8388608 op, 979131485.00 ns, 116.7216 ns/op
WorkloadActual  71: 8388608 op, 995976073.00 ns, 118.7296 ns/op
WorkloadActual  72: 8388608 op, 977302002.00 ns, 116.5035 ns/op
WorkloadActual  73: 8388608 op, 996530213.00 ns, 118.7957 ns/op
WorkloadActual  74: 8388608 op, 1033406994.00 ns, 123.1917 ns/op
WorkloadActual  75: 8388608 op, 1002174527.00 ns, 119.4685 ns/op
WorkloadActual  76: 8388608 op, 956923484.00 ns, 114.0742 ns/op
WorkloadActual  77: 8388608 op, 1006244371.00 ns, 119.9537 ns/op
WorkloadActual  78: 8388608 op, 954344947.00 ns, 113.7668 ns/op
WorkloadActual  79: 8388608 op, 989607059.00 ns, 117.9704 ns/op
WorkloadActual  80: 8388608 op, 982231838.00 ns, 117.0912 ns/op
WorkloadActual  81: 8388608 op, 1013748519.00 ns, 120.8482 ns/op
WorkloadActual  82: 8388608 op, 1005874476.00 ns, 119.9096 ns/op
WorkloadActual  83: 8388608 op, 965165996.00 ns, 115.0568 ns/op
WorkloadActual  84: 8388608 op, 970228872.00 ns, 115.6603 ns/op
WorkloadActual  85: 8388608 op, 1008728313.00 ns, 120.2498 ns/op
WorkloadActual  86: 8388608 op, 972683445.00 ns, 115.9529 ns/op

// AfterActualRun
WorkloadResult   1: 8388608 op, 1024405347.00 ns, 122.1186 ns/op
WorkloadResult   2: 8388608 op, 1092327888.00 ns, 130.2156 ns/op
WorkloadResult   3: 8388608 op, 1046275992.00 ns, 124.7258 ns/op
WorkloadResult   4: 8388608 op, 1046188102.00 ns, 124.7153 ns/op
WorkloadResult   5: 8388608 op, 950223288.00 ns, 113.2754 ns/op
WorkloadResult   6: 8388608 op, 973424929.00 ns, 116.0413 ns/op
WorkloadResult   7: 8388608 op, 1031379887.00 ns, 122.9501 ns/op
WorkloadResult   8: 8388608 op, 1055385614.00 ns, 125.8118 ns/op
WorkloadResult   9: 8388608 op, 1142817667.00 ns, 136.2345 ns/op
WorkloadResult  10: 8388608 op, 1077263611.00 ns, 128.4198 ns/op
WorkloadResult  11: 8388608 op, 1060545279.00 ns, 126.4268 ns/op
WorkloadResult  12: 8388608 op, 1092982304.00 ns, 130.2936 ns/op
WorkloadResult  13: 8388608 op, 1041769358.00 ns, 124.1886 ns/op
WorkloadResult  14: 8388608 op, 1072968469.00 ns, 127.9078 ns/op
WorkloadResult  15: 8388608 op, 1060626849.00 ns, 126.4366 ns/op
WorkloadResult  16: 8388608 op, 1142718744.00 ns, 136.2227 ns/op
WorkloadResult  17: 8388608 op, 1106942679.00 ns, 131.9579 ns/op
WorkloadResult  18: 8388608 op, 1089861119.00 ns, 129.9216 ns/op
WorkloadResult  19: 8388608 op, 1124036874.00 ns, 133.9956 ns/op
WorkloadResult  20: 8388608 op, 1036134936.00 ns, 123.5169 ns/op
WorkloadResult  21: 8388608 op, 1064868574.00 ns, 126.9422 ns/op
WorkloadResult  22: 8388608 op, 1069539937.00 ns, 127.4991 ns/op
WorkloadResult  23: 8388608 op, 1018254021.00 ns, 121.3853 ns/op
WorkloadResult  24: 8388608 op, 1017542980.00 ns, 121.3006 ns/op
WorkloadResult  25: 8388608 op, 1032536862.00 ns, 123.0880 ns/op
WorkloadResult  26: 8388608 op, 1009681938.00 ns, 120.3635 ns/op
WorkloadResult  27: 8388608 op, 1006785117.00 ns, 120.0181 ns/op
WorkloadResult  28: 8388608 op, 929410478.00 ns, 110.7944 ns/op
WorkloadResult  29: 8388608 op, 967058993.00 ns, 115.2824 ns/op
WorkloadResult  30: 8388608 op, 971526315.00 ns, 115.8150 ns/op
WorkloadResult  31: 8388608 op, 931953448.00 ns, 111.0975 ns/op
WorkloadResult  32: 8388608 op, 988565113.00 ns, 117.8461 ns/op
WorkloadResult  33: 8388608 op, 983005213.00 ns, 117.1834 ns/op
WorkloadResult  34: 8388608 op, 936887073.00 ns, 111.6856 ns/op
WorkloadResult  35: 8388608 op, 941577062.00 ns, 112.2447 ns/op
WorkloadResult  36: 8388608 op, 1020441996.00 ns, 121.6462 ns/op
WorkloadResult  37: 8388608 op, 1024383187.00 ns, 122.1160 ns/op
WorkloadResult  38: 8388608 op, 953385247.00 ns, 113.6524 ns/op
WorkloadResult  39: 8388608 op, 978590451.00 ns, 116.6571 ns/op
WorkloadResult  40: 8388608 op, 1021101794.00 ns, 121.7248 ns/op
WorkloadResult  41: 8388608 op, 1021674334.00 ns, 121.7931 ns/op
WorkloadResult  42: 8388608 op, 987135904.00 ns, 117.6758 ns/op
WorkloadResult  43: 8388608 op, 1010893425.00 ns, 120.5079 ns/op
WorkloadResult  44: 8388608 op, 1006707385.00 ns, 120.0089 ns/op
WorkloadResult  45: 8388608 op, 1036278675.00 ns, 123.5340 ns/op
WorkloadResult  46: 8388608 op, 1040145835.00 ns, 123.9950 ns/op
WorkloadResult  47: 8388608 op, 1036398165.00 ns, 123.5483 ns/op
WorkloadResult  48: 8388608 op, 982812857.00 ns, 117.1604 ns/op
WorkloadResult  49: 8388608 op, 955083042.00 ns, 113.8548 ns/op
WorkloadResult  50: 8388608 op, 986425394.00 ns, 117.5911 ns/op
WorkloadResult  51: 8388608 op, 957227782.00 ns, 114.1104 ns/op
WorkloadResult  52: 8388608 op, 940614091.00 ns, 112.1299 ns/op
WorkloadResult  53: 8388608 op, 967502546.00 ns, 115.3353 ns/op
WorkloadResult  54: 8388608 op, 929594556.00 ns, 110.8163 ns/op
WorkloadResult  55: 8388608 op, 975563777.00 ns, 116.2963 ns/op
WorkloadResult  56: 8388608 op, 971244706.00 ns, 115.7814 ns/op
WorkloadResult  57: 8388608 op, 969024150.00 ns, 115.5167 ns/op
WorkloadResult  58: 8388608 op, 932648598.00 ns, 111.1804 ns/op
WorkloadResult  59: 8388608 op, 926723845.00 ns, 110.4741 ns/op
WorkloadResult  60: 8388608 op, 933640250.00 ns, 111.2986 ns/op
WorkloadResult  61: 8388608 op, 934947958.00 ns, 111.4545 ns/op
WorkloadResult  62: 8388608 op, 926593888.00 ns, 110.4586 ns/op
WorkloadResult  63: 8388608 op, 931021030.00 ns, 110.9864 ns/op
WorkloadResult  64: 8388608 op, 949905599.00 ns, 113.2376 ns/op
WorkloadResult  65: 8388608 op, 922788126.00 ns, 110.0049 ns/op
WorkloadResult  66: 8388608 op, 918125215.00 ns, 109.4491 ns/op
WorkloadResult  67: 8388608 op, 990839414.00 ns, 118.1173 ns/op
WorkloadResult  68: 8388608 op, 956940828.00 ns, 114.0762 ns/op
WorkloadResult  69: 8388608 op, 937838594.00 ns, 111.7991 ns/op
WorkloadResult  70: 8388608 op, 951839240.00 ns, 113.4681 ns/op
WorkloadResult  71: 8388608 op, 968683828.00 ns, 115.4761 ns/op
WorkloadResult  72: 8388608 op, 950009757.00 ns, 113.2500 ns/op
WorkloadResult  73: 8388608 op, 969237968.00 ns, 115.5422 ns/op
WorkloadResult  74: 8388608 op, 1006114749.00 ns, 119.9382 ns/op
WorkloadResult  75: 8388608 op, 974882282.00 ns, 116.2150 ns/op
WorkloadResult  76: 8388608 op, 929631239.00 ns, 110.8207 ns/op
WorkloadResult  77: 8388608 op, 978952126.00 ns, 116.7002 ns/op
WorkloadResult  78: 8388608 op, 927052702.00 ns, 110.5133 ns/op
WorkloadResult  79: 8388608 op, 962314814.00 ns, 114.7169 ns/op
WorkloadResult  80: 8388608 op, 954939593.00 ns, 113.8377 ns/op
WorkloadResult  81: 8388608 op, 986456274.00 ns, 117.5948 ns/op
WorkloadResult  82: 8388608 op, 978582231.00 ns, 116.6561 ns/op
WorkloadResult  83: 8388608 op, 937873751.00 ns, 111.8033 ns/op
WorkloadResult  84: 8388608 op, 942936627.00 ns, 112.4068 ns/op
WorkloadResult  85: 8388608 op, 981436068.00 ns, 116.9963 ns/op
WorkloadResult  86: 8388608 op, 945391200.00 ns, 112.6994 ns/op
// GC:  299 0 0 1879048928 8388608
// Threading:  0 0 8388608

// AfterAll
// Benchmark Process 32258 has exited with code 0.

Mean = 118.541 ns, StdErr = 0.713 ns (0.60%), N = 86, StdDev = 6.617 ns
Min = 109.449 ns, Q1 = 113.256 ns, Median = 116.848 ns, Q3 = 123.054 ns, Max = 136.234 ns
IQR = 9.797 ns, LowerFence = 98.561 ns, UpperFence = 137.749 ns
ConfidenceInterval = [116.109 ns; 120.973 ns] (CI 99.9%), Margin = 2.432 ns (2.05% of Mean)
Skewness = 0.74, Kurtosis = 2.76, MValue = 2

// ** Remained 5 (83.3%) benchmark(s) to run. Estimated finish 2024-10-10 17:01 (0h 8m from now) **
// **************************
// Benchmark: MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=100]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 412cc443-8cc3-49ea-bdc5-a302fa51b339.dll --anonymousPipes 110 111 --benchmarkName "BenchmarkExample.MiddleQuarterBenchmark.OptimizedMethod(ArraySize: 100)" --job Default --benchmarkId 1 in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: DefaultJob

OverheadJitting  1: 1 op, 280915.00 ns, 280.9150 us/op
WorkloadJitting  1: 1 op, 285723.00 ns, 285.7230 us/op

OverheadJitting  2: 16 op, 208690.00 ns, 13.0431 us/op
WorkloadJitting  2: 16 op, 236681.00 ns, 14.7926 us/op

WorkloadPilot    1: 16 op, 2263.00 ns, 141.4375 ns/op
WorkloadPilot    2: 32 op, 1363.00 ns, 42.5938 ns/op
WorkloadPilot    3: 64 op, 1333.00 ns, 20.8281 ns/op
WorkloadPilot    4: 128 op, 1601.00 ns, 12.5078 ns/op
WorkloadPilot    5: 256 op, 2743.00 ns, 10.7148 ns/op
WorkloadPilot    6: 512 op, 5126.00 ns, 10.0117 ns/op
WorkloadPilot    7: 1024 op, 9763.00 ns, 9.5342 ns/op
WorkloadPilot    8: 2048 op, 18972.00 ns, 9.2637 ns/op
WorkloadPilot    9: 4096 op, 37180.00 ns, 9.0771 ns/op
WorkloadPilot   10: 8192 op, 74176.00 ns, 9.0547 ns/op
WorkloadPilot   11: 16384 op, 152263.00 ns, 9.2934 ns/op
WorkloadPilot   12: 32768 op, 294688.00 ns, 8.9932 ns/op
WorkloadPilot   13: 65536 op, 587718.00 ns, 8.9679 ns/op
WorkloadPilot   14: 131072 op, 1175017.00 ns, 8.9647 ns/op
WorkloadPilot   15: 262144 op, 2362115.00 ns, 9.0108 ns/op
WorkloadPilot   16: 524288 op, 5130219.00 ns, 9.7851 ns/op
WorkloadPilot   17: 1048576 op, 10069454.00 ns, 9.6030 ns/op
WorkloadPilot   18: 2097152 op, 19198259.00 ns, 9.1544 ns/op
WorkloadPilot   19: 4194304 op, 38432858.00 ns, 9.1631 ns/op
WorkloadPilot   20: 8388608 op, 75599981.00 ns, 9.0122 ns/op
WorkloadPilot   21: 16777216 op, 36606936.00 ns, 2.1819 ns/op
WorkloadPilot   22: 33554432 op, 57051711.00 ns, 1.7003 ns/op
WorkloadPilot   23: 67108864 op, 112527581.00 ns, 1.6768 ns/op
WorkloadPilot   24: 134217728 op, 226346655.00 ns, 1.6864 ns/op
WorkloadPilot   25: 268435456 op, 447665993.00 ns, 1.6677 ns/op
WorkloadPilot   26: 536870912 op, 896822169.00 ns, 1.6705 ns/op

OverheadWarmup   1: 536870912 op, 837405705.00 ns, 1.5598 ns/op
OverheadWarmup   2: 536870912 op, 789422707.00 ns, 1.4704 ns/op
OverheadWarmup   3: 536870912 op, 793527731.00 ns, 1.4781 ns/op
OverheadWarmup   4: 536870912 op, 789960843.00 ns, 1.4714 ns/op
OverheadWarmup   5: 536870912 op, 784964805.00 ns, 1.4621 ns/op
OverheadWarmup   6: 536870912 op, 787009091.00 ns, 1.4659 ns/op
OverheadWarmup   7: 536870912 op, 785935540.00 ns, 1.4639 ns/op

OverheadActual   1: 536870912 op, 786235910.00 ns, 1.4645 ns/op
OverheadActual   2: 536870912 op, 796489257.00 ns, 1.4836 ns/op
OverheadActual   3: 536870912 op, 787002730.00 ns, 1.4659 ns/op
OverheadActual   4: 536870912 op, 787385797.00 ns, 1.4666 ns/op
OverheadActual   5: 536870912 op, 792095152.00 ns, 1.4754 ns/op
OverheadActual   6: 536870912 op, 807101314.00 ns, 1.5033 ns/op
OverheadActual   7: 536870912 op, 790218089.00 ns, 1.4719 ns/op
OverheadActual   8: 536870912 op, 790113715.00 ns, 1.4717 ns/op
OverheadActual   9: 536870912 op, 790310494.00 ns, 1.4721 ns/op
OverheadActual  10: 536870912 op, 788015815.00 ns, 1.4678 ns/op
OverheadActual  11: 536870912 op, 786408228.00 ns, 1.4648 ns/op
OverheadActual  12: 536870912 op, 783143474.00 ns, 1.4587 ns/op
OverheadActual  13: 536870912 op, 785581346.00 ns, 1.4633 ns/op
OverheadActual  14: 536870912 op, 788982626.00 ns, 1.4696 ns/op
OverheadActual  15: 536870912 op, 787243409.00 ns, 1.4664 ns/op

WorkloadWarmup   1: 536870912 op, 899032602.00 ns, 1.6746 ns/op
WorkloadWarmup   2: 536870912 op, 895469803.00 ns, 1.6679 ns/op
WorkloadWarmup   3: 536870912 op, 903113129.00 ns, 1.6822 ns/op
WorkloadWarmup   4: 536870912 op, 891241622.00 ns, 1.6601 ns/op
WorkloadWarmup   5: 536870912 op, 900123447.00 ns, 1.6766 ns/op
WorkloadWarmup   6: 536870912 op, 897200708.00 ns, 1.6712 ns/op

// BeforeActualRun
WorkloadActual   1: 536870912 op, 900463033.00 ns, 1.6772 ns/op
WorkloadActual   2: 536870912 op, 936762428.00 ns, 1.7449 ns/op
WorkloadActual   3: 536870912 op, 1001672836.00 ns, 1.8658 ns/op
WorkloadActual   4: 536870912 op, 892195883.00 ns, 1.6618 ns/op
WorkloadActual   5: 536870912 op, 897771890.00 ns, 1.6722 ns/op
WorkloadActual   6: 536870912 op, 897885657.00 ns, 1.6724 ns/op
WorkloadActual   7: 536870912 op, 883780461.00 ns, 1.6462 ns/op
WorkloadActual   8: 536870912 op, 886607660.00 ns, 1.6514 ns/op
WorkloadActual   9: 536870912 op, 882740241.00 ns, 1.6442 ns/op
WorkloadActual  10: 536870912 op, 879719526.00 ns, 1.6386 ns/op
WorkloadActual  11: 536870912 op, 886648334.00 ns, 1.6515 ns/op
WorkloadActual  12: 536870912 op, 879803702.00 ns, 1.6388 ns/op
WorkloadActual  13: 536870912 op, 890590371.00 ns, 1.6589 ns/op
WorkloadActual  14: 536870912 op, 901609126.00 ns, 1.6794 ns/op
WorkloadActual  15: 536870912 op, 906735532.00 ns, 1.6889 ns/op

// AfterActualRun
WorkloadResult   1: 536870912 op, 112447218.00 ns, 0.2094 ns/op
WorkloadResult   2: 536870912 op, 104180068.00 ns, 0.1941 ns/op
WorkloadResult   3: 536870912 op, 109756075.00 ns, 0.2044 ns/op
WorkloadResult   4: 536870912 op, 109869842.00 ns, 0.2046 ns/op
WorkloadResult   5: 536870912 op, 95764646.00 ns, 0.1784 ns/op
WorkloadResult   6: 536870912 op, 98591845.00 ns, 0.1836 ns/op
WorkloadResult   7: 536870912 op, 94724426.00 ns, 0.1764 ns/op
WorkloadResult   8: 536870912 op, 91703711.00 ns, 0.1708 ns/op
WorkloadResult   9: 536870912 op, 98632519.00 ns, 0.1837 ns/op
WorkloadResult  10: 536870912 op, 91787887.00 ns, 0.1710 ns/op
WorkloadResult  11: 536870912 op, 102574556.00 ns, 0.1911 ns/op
WorkloadResult  12: 536870912 op, 113593311.00 ns, 0.2116 ns/op
WorkloadResult  13: 536870912 op, 118719717.00 ns, 0.2211 ns/op
// GC:  0 0 0 736 536870912
// Threading:  0 0 536870912

// AfterAll
// Benchmark Process 32325 has exited with code 0.

Mean = 0.192 ns, StdErr = 0.005 ns (2.40%), N = 13, StdDev = 0.017 ns
Min = 0.171 ns, Q1 = 0.178 ns, Median = 0.191 ns, Q3 = 0.205 ns, Max = 0.221 ns
IQR = 0.026 ns, LowerFence = 0.139 ns, UpperFence = 0.244 ns
ConfidenceInterval = [0.172 ns; 0.212 ns] (CI 99.9%), Margin = 0.020 ns (10.35% of Mean)
Skewness = 0.2, Kurtosis = 1.51, MValue = 3

// ** Remained 4 (66.7%) benchmark(s) to run. Estimated finish 2024-10-10 16:58 (0h 4m from now) **
// **************************
// Benchmark: MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=1000]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 412cc443-8cc3-49ea-bdc5-a302fa51b339.dll --anonymousPipes 110 111 --benchmarkName "BenchmarkExample.MiddleQuarterBenchmark.OriginalMethod(ArraySize: 1000)" --job Default --benchmarkId 2 in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: DefaultJob

OverheadJitting  1: 1 op, 296553.00 ns, 296.5530 us/op
WorkloadJitting  1: 1 op, 1759326.00 ns, 1.7593 ms/op

OverheadJitting  2: 16 op, 512501.00 ns, 32.0313 us/op
WorkloadJitting  2: 16 op, 696407.00 ns, 43.5254 us/op

WorkloadPilot    1: 16 op, 66148.00 ns, 4.1343 us/op
WorkloadPilot    2: 32 op, 172483.00 ns, 5.3901 us/op
WorkloadPilot    3: 64 op, 340395.00 ns, 5.3187 us/op
WorkloadPilot    4: 128 op, 580515.00 ns, 4.5353 us/op
WorkloadPilot    5: 256 op, 1089961.00 ns, 4.2577 us/op
WorkloadPilot    6: 512 op, 2136026.00 ns, 4.1719 us/op
WorkloadPilot    7: 1024 op, 4702248.00 ns, 4.5920 us/op
WorkloadPilot    8: 2048 op, 9011485.00 ns, 4.4001 us/op
WorkloadPilot    9: 4096 op, 18495646.00 ns, 4.5155 us/op
WorkloadPilot   10: 8192 op, 34965559.00 ns, 4.2683 us/op
WorkloadPilot   11: 16384 op, 64501775.00 ns, 3.9369 us/op
WorkloadPilot   12: 32768 op, 110958553.00 ns, 3.3862 us/op
WorkloadPilot   13: 65536 op, 50387901.00 ns, 768.8584 ns/op
WorkloadPilot   14: 131072 op, 93699109.00 ns, 714.8675 ns/op
WorkloadPilot   15: 262144 op, 190171016.00 ns, 725.4449 ns/op
WorkloadPilot   16: 524288 op, 384490986.00 ns, 733.3584 ns/op
WorkloadPilot   17: 1048576 op, 803259605.00 ns, 766.0481 ns/op

OverheadWarmup   1: 1048576 op, 3443055.00 ns, 3.2836 ns/op
OverheadWarmup   2: 1048576 op, 3429903.00 ns, 3.2710 ns/op
OverheadWarmup   3: 1048576 op, 3411035.00 ns, 3.2530 ns/op
OverheadWarmup   4: 1048576 op, 3427330.00 ns, 3.2686 ns/op
OverheadWarmup   5: 1048576 op, 3473242.00 ns, 3.3123 ns/op
OverheadWarmup   6: 1048576 op, 3405164.00 ns, 3.2474 ns/op
OverheadWarmup   7: 1048576 op, 3476322.00 ns, 3.3153 ns/op
OverheadWarmup   8: 1048576 op, 3397831.00 ns, 3.2404 ns/op

OverheadActual   1: 1048576 op, 3397906.00 ns, 3.2405 ns/op
OverheadActual   2: 1048576 op, 3430442.00 ns, 3.2715 ns/op
OverheadActual   3: 1048576 op, 3402332.00 ns, 3.2447 ns/op
OverheadActual   4: 1048576 op, 3397180.00 ns, 3.2398 ns/op
OverheadActual   5: 1048576 op, 3652380.00 ns, 3.4832 ns/op
OverheadActual   6: 1048576 op, 3469268.00 ns, 3.3086 ns/op
OverheadActual   7: 1048576 op, 3437421.00 ns, 3.2782 ns/op
OverheadActual   8: 1048576 op, 3399039.00 ns, 3.2416 ns/op
OverheadActual   9: 1048576 op, 3427735.00 ns, 3.2689 ns/op
OverheadActual  10: 1048576 op, 3516416.00 ns, 3.3535 ns/op
OverheadActual  11: 1048576 op, 3583819.00 ns, 3.4178 ns/op
OverheadActual  12: 1048576 op, 3621183.00 ns, 3.4534 ns/op
OverheadActual  13: 1048576 op, 3652566.00 ns, 3.4834 ns/op
OverheadActual  14: 1048576 op, 3473421.00 ns, 3.3125 ns/op
OverheadActual  15: 1048576 op, 4419093.00 ns, 4.2144 ns/op

WorkloadWarmup   1: 1048576 op, 759553258.00 ns, 724.3664 ns/op
WorkloadWarmup   2: 1048576 op, 762955356.00 ns, 727.6109 ns/op
WorkloadWarmup   3: 1048576 op, 798820111.00 ns, 761.8142 ns/op
WorkloadWarmup   4: 1048576 op, 752043594.00 ns, 717.2047 ns/op
WorkloadWarmup   5: 1048576 op, 831088188.00 ns, 792.5875 ns/op
WorkloadWarmup   6: 1048576 op, 747510103.00 ns, 712.8812 ns/op

// BeforeActualRun
WorkloadActual   1: 1048576 op, 812095451.00 ns, 774.4746 ns/op
WorkloadActual   2: 1048576 op, 759285755.00 ns, 724.1113 ns/op
WorkloadActual   3: 1048576 op, 847777072.00 ns, 808.5032 ns/op
WorkloadActual   4: 1048576 op, 879659540.00 ns, 838.9087 ns/op
WorkloadActual   5: 1048576 op, 910298780.00 ns, 868.1286 ns/op
WorkloadActual   6: 1048576 op, 899762674.00 ns, 858.0806 ns/op
WorkloadActual   7: 1048576 op, 897200026.00 ns, 855.6366 ns/op
WorkloadActual   8: 1048576 op, 888633404.00 ns, 847.4669 ns/op
WorkloadActual   9: 1048576 op, 847571708.00 ns, 808.3074 ns/op
WorkloadActual  10: 1048576 op, 831841753.00 ns, 793.3061 ns/op
WorkloadActual  11: 1048576 op, 808934333.00 ns, 771.4599 ns/op
WorkloadActual  12: 1048576 op, 828561604.00 ns, 790.1779 ns/op
WorkloadActual  13: 1048576 op, 805660408.00 ns, 768.3376 ns/op
WorkloadActual  14: 1048576 op, 840488411.00 ns, 801.5522 ns/op
WorkloadActual  15: 1048576 op, 849711400.00 ns, 810.3479 ns/op
WorkloadActual  16: 1048576 op, 877949824.00 ns, 837.2782 ns/op
WorkloadActual  17: 1048576 op, 868023317.00 ns, 827.8115 ns/op
WorkloadActual  18: 1048576 op, 847276114.00 ns, 808.0255 ns/op
WorkloadActual  19: 1048576 op, 822506326.00 ns, 784.4032 ns/op
WorkloadActual  20: 1048576 op, 853881716.00 ns, 814.3251 ns/op
WorkloadActual  21: 1048576 op, 845201205.00 ns, 806.0467 ns/op
WorkloadActual  22: 1048576 op, 858252365.00 ns, 818.4932 ns/op
WorkloadActual  23: 1048576 op, 924193263.00 ns, 881.3794 ns/op
WorkloadActual  24: 1048576 op, 901366244.00 ns, 859.6098 ns/op
WorkloadActual  25: 1048576 op, 968170906.00 ns, 923.3197 ns/op
WorkloadActual  26: 1048576 op, 830103593.00 ns, 791.6485 ns/op
WorkloadActual  27: 1048576 op, 769694686.00 ns, 734.0381 ns/op
WorkloadActual  28: 1048576 op, 792307782.00 ns, 755.6036 ns/op
WorkloadActual  29: 1048576 op, 761701083.00 ns, 726.4148 ns/op
WorkloadActual  30: 1048576 op, 733192037.00 ns, 699.2264 ns/op
WorkloadActual  31: 1048576 op, 801525570.00 ns, 764.3944 ns/op
WorkloadActual  32: 1048576 op, 738978310.00 ns, 704.7446 ns/op
WorkloadActual  33: 1048576 op, 741648662.00 ns, 707.2913 ns/op
WorkloadActual  34: 1048576 op, 744802902.00 ns, 710.2994 ns/op
WorkloadActual  35: 1048576 op, 745469366.00 ns, 710.9350 ns/op
WorkloadActual  36: 1048576 op, 735050263.00 ns, 700.9986 ns/op
WorkloadActual  37: 1048576 op, 762921174.00 ns, 727.5783 ns/op
WorkloadActual  38: 1048576 op, 728037661.00 ns, 694.3108 ns/op
WorkloadActual  39: 1048576 op, 779450863.00 ns, 743.3423 ns/op
WorkloadActual  40: 1048576 op, 766605068.00 ns, 731.0916 ns/op
WorkloadActual  41: 1048576 op, 855428262.00 ns, 815.8000 ns/op
WorkloadActual  42: 1048576 op, 809553782.00 ns, 772.0506 ns/op
WorkloadActual  43: 1048576 op, 832411963.00 ns, 793.8499 ns/op
WorkloadActual  44: 1048576 op, 862954685.00 ns, 822.9777 ns/op
WorkloadActual  45: 1048576 op, 805712731.00 ns, 768.3875 ns/op
WorkloadActual  46: 1048576 op, 787502009.00 ns, 751.0204 ns/op
WorkloadActual  47: 1048576 op, 829485505.00 ns, 791.0590 ns/op
WorkloadActual  48: 1048576 op, 822830891.00 ns, 784.7127 ns/op
WorkloadActual  49: 1048576 op, 853676708.00 ns, 814.1296 ns/op
WorkloadActual  50: 1048576 op, 792103151.00 ns, 755.4084 ns/op
WorkloadActual  51: 1048576 op, 807905572.00 ns, 770.4788 ns/op
WorkloadActual  52: 1048576 op, 808436260.00 ns, 770.9849 ns/op
WorkloadActual  53: 1048576 op, 807201848.00 ns, 769.8077 ns/op
WorkloadActual  54: 1048576 op, 778445336.00 ns, 742.3833 ns/op
WorkloadActual  55: 1048576 op, 865376908.00 ns, 825.2877 ns/op
WorkloadActual  56: 1048576 op, 814753798.00 ns, 777.0098 ns/op
WorkloadActual  57: 1048576 op, 795614507.00 ns, 758.7571 ns/op
WorkloadActual  58: 1048576 op, 776951370.00 ns, 740.9586 ns/op
WorkloadActual  59: 1048576 op, 770471810.00 ns, 734.7792 ns/op
WorkloadActual  60: 1048576 op, 769055534.00 ns, 733.4285 ns/op
WorkloadActual  61: 1048576 op, 783745712.00 ns, 747.4382 ns/op
WorkloadActual  62: 1048576 op, 787998981.00 ns, 751.4944 ns/op
WorkloadActual  63: 1048576 op, 782825109.00 ns, 746.5602 ns/op
WorkloadActual  64: 1048576 op, 783590142.00 ns, 747.2898 ns/op
WorkloadActual  65: 1048576 op, 800763932.00 ns, 763.6680 ns/op
WorkloadActual  66: 1048576 op, 804496137.00 ns, 767.2273 ns/op
WorkloadActual  67: 1048576 op, 812545406.00 ns, 774.9037 ns/op
WorkloadActual  68: 1048576 op, 815344103.00 ns, 777.5727 ns/op
WorkloadActual  69: 1048576 op, 808954037.00 ns, 771.4787 ns/op
WorkloadActual  70: 1048576 op, 910474562.00 ns, 868.2962 ns/op
WorkloadActual  71: 1048576 op, 829326239.00 ns, 790.9071 ns/op
WorkloadActual  72: 1048576 op, 824885580.00 ns, 786.6722 ns/op
WorkloadActual  73: 1048576 op, 798681079.00 ns, 761.6816 ns/op
WorkloadActual  74: 1048576 op, 822672125.00 ns, 784.5613 ns/op
WorkloadActual  75: 1048576 op, 807119165.00 ns, 769.7288 ns/op
WorkloadActual  76: 1048576 op, 829059374.00 ns, 790.6526 ns/op
WorkloadActual  77: 1048576 op, 901900105.00 ns, 860.1190 ns/op
WorkloadActual  78: 1048576 op, 808266841.00 ns, 770.8233 ns/op
WorkloadActual  79: 1048576 op, 798652067.00 ns, 761.6540 ns/op
WorkloadActual  80: 1048576 op, 809935183.00 ns, 772.4144 ns/op
WorkloadActual  81: 1048576 op, 786509897.00 ns, 750.0743 ns/op
WorkloadActual  82: 1048576 op, 792563128.00 ns, 755.8471 ns/op
WorkloadActual  83: 1048576 op, 851569484.00 ns, 812.1199 ns/op
WorkloadActual  84: 1048576 op, 831516357.00 ns, 792.9958 ns/op
WorkloadActual  85: 1048576 op, 817300664.00 ns, 779.4387 ns/op
WorkloadActual  86: 1048576 op, 825625301.00 ns, 787.3776 ns/op
WorkloadActual  87: 1048576 op, 819171350.00 ns, 781.2227 ns/op

// AfterActualRun
WorkloadResult   1: 1048576 op, 808626183.00 ns, 771.1660 ns/op
WorkloadResult   2: 1048576 op, 755816487.00 ns, 720.8028 ns/op
WorkloadResult   3: 1048576 op, 844307804.00 ns, 805.1947 ns/op
WorkloadResult   4: 1048576 op, 876190272.00 ns, 835.6002 ns/op
WorkloadResult   5: 1048576 op, 906829512.00 ns, 864.8200 ns/op
WorkloadResult   6: 1048576 op, 896293406.00 ns, 854.7720 ns/op
WorkloadResult   7: 1048576 op, 893730758.00 ns, 852.3281 ns/op
WorkloadResult   8: 1048576 op, 885164136.00 ns, 844.1583 ns/op
WorkloadResult   9: 1048576 op, 844102440.00 ns, 804.9988 ns/op
WorkloadResult  10: 1048576 op, 828372485.00 ns, 789.9976 ns/op
WorkloadResult  11: 1048576 op, 805465065.00 ns, 768.1513 ns/op
WorkloadResult  12: 1048576 op, 825092336.00 ns, 786.8694 ns/op
WorkloadResult  13: 1048576 op, 802191140.00 ns, 765.0291 ns/op
WorkloadResult  14: 1048576 op, 837019143.00 ns, 798.2437 ns/op
WorkloadResult  15: 1048576 op, 846242132.00 ns, 807.0394 ns/op
WorkloadResult  16: 1048576 op, 874480556.00 ns, 833.9696 ns/op
WorkloadResult  17: 1048576 op, 864554049.00 ns, 824.5030 ns/op
WorkloadResult  18: 1048576 op, 843806846.00 ns, 804.7169 ns/op
WorkloadResult  19: 1048576 op, 819037058.00 ns, 781.0946 ns/op
WorkloadResult  20: 1048576 op, 850412448.00 ns, 811.0165 ns/op
WorkloadResult  21: 1048576 op, 841731937.00 ns, 802.7381 ns/op
WorkloadResult  22: 1048576 op, 854783097.00 ns, 815.1847 ns/op
WorkloadResult  23: 1048576 op, 920723995.00 ns, 878.0708 ns/op
WorkloadResult  24: 1048576 op, 897896976.00 ns, 856.3013 ns/op
WorkloadResult  25: 1048576 op, 826634325.00 ns, 788.3399 ns/op
WorkloadResult  26: 1048576 op, 766225418.00 ns, 730.7295 ns/op
WorkloadResult  27: 1048576 op, 788838514.00 ns, 752.2950 ns/op
WorkloadResult  28: 1048576 op, 758231815.00 ns, 723.1062 ns/op
WorkloadResult  29: 1048576 op, 729722769.00 ns, 695.9179 ns/op
WorkloadResult  30: 1048576 op, 798056302.00 ns, 761.0858 ns/op
WorkloadResult  31: 1048576 op, 735509042.00 ns, 701.4361 ns/op
WorkloadResult  32: 1048576 op, 738179394.00 ns, 703.9827 ns/op
WorkloadResult  33: 1048576 op, 741333634.00 ns, 706.9908 ns/op
WorkloadResult  34: 1048576 op, 742000098.00 ns, 707.6264 ns/op
WorkloadResult  35: 1048576 op, 731580995.00 ns, 697.6900 ns/op
WorkloadResult  36: 1048576 op, 759451906.00 ns, 724.2698 ns/op
WorkloadResult  37: 1048576 op, 724568393.00 ns, 691.0023 ns/op
WorkloadResult  38: 1048576 op, 775981595.00 ns, 740.0337 ns/op
WorkloadResult  39: 1048576 op, 763135800.00 ns, 727.7830 ns/op
WorkloadResult  40: 1048576 op, 851958994.00 ns, 812.4914 ns/op
WorkloadResult  41: 1048576 op, 806084514.00 ns, 768.7421 ns/op
WorkloadResult  42: 1048576 op, 828942695.00 ns, 790.5414 ns/op
WorkloadResult  43: 1048576 op, 859485417.00 ns, 819.6692 ns/op
WorkloadResult  44: 1048576 op, 802243463.00 ns, 765.0790 ns/op
WorkloadResult  45: 1048576 op, 784032741.00 ns, 747.7119 ns/op
WorkloadResult  46: 1048576 op, 826016237.00 ns, 787.7505 ns/op
WorkloadResult  47: 1048576 op, 819361623.00 ns, 781.4041 ns/op
WorkloadResult  48: 1048576 op, 850207440.00 ns, 810.8210 ns/op
WorkloadResult  49: 1048576 op, 788633883.00 ns, 752.0999 ns/op
WorkloadResult  50: 1048576 op, 804436304.00 ns, 767.1702 ns/op
WorkloadResult  51: 1048576 op, 804966992.00 ns, 767.6763 ns/op
WorkloadResult  52: 1048576 op, 803732580.00 ns, 766.4991 ns/op
WorkloadResult  53: 1048576 op, 774976068.00 ns, 739.0748 ns/op
WorkloadResult  54: 1048576 op, 861907640.00 ns, 821.9792 ns/op
WorkloadResult  55: 1048576 op, 811284530.00 ns, 773.7012 ns/op
WorkloadResult  56: 1048576 op, 792145239.00 ns, 755.4486 ns/op
WorkloadResult  57: 1048576 op, 773482102.00 ns, 737.6500 ns/op
WorkloadResult  58: 1048576 op, 767002542.00 ns, 731.4706 ns/op
WorkloadResult  59: 1048576 op, 765586266.00 ns, 730.1200 ns/op
WorkloadResult  60: 1048576 op, 780276444.00 ns, 744.1296 ns/op
WorkloadResult  61: 1048576 op, 784529713.00 ns, 748.1858 ns/op
WorkloadResult  62: 1048576 op, 779355841.00 ns, 743.2516 ns/op
WorkloadResult  63: 1048576 op, 780120874.00 ns, 743.9812 ns/op
WorkloadResult  64: 1048576 op, 797294664.00 ns, 760.3594 ns/op
WorkloadResult  65: 1048576 op, 801026869.00 ns, 763.9188 ns/op
WorkloadResult  66: 1048576 op, 809076138.00 ns, 771.5951 ns/op
WorkloadResult  67: 1048576 op, 811874835.00 ns, 774.2642 ns/op
WorkloadResult  68: 1048576 op, 805484769.00 ns, 768.1701 ns/op
WorkloadResult  69: 1048576 op, 907005294.00 ns, 864.9877 ns/op
WorkloadResult  70: 1048576 op, 825856971.00 ns, 787.5986 ns/op
WorkloadResult  71: 1048576 op, 821416312.00 ns, 783.3636 ns/op
WorkloadResult  72: 1048576 op, 795211811.00 ns, 758.3731 ns/op
WorkloadResult  73: 1048576 op, 819202857.00 ns, 781.2527 ns/op
WorkloadResult  74: 1048576 op, 803649897.00 ns, 766.4203 ns/op
WorkloadResult  75: 1048576 op, 825590106.00 ns, 787.3441 ns/op
WorkloadResult  76: 1048576 op, 898430837.00 ns, 856.8104 ns/op
WorkloadResult  77: 1048576 op, 804797573.00 ns, 767.5148 ns/op
WorkloadResult  78: 1048576 op, 795182799.00 ns, 758.3454 ns/op
WorkloadResult  79: 1048576 op, 806465915.00 ns, 769.1058 ns/op
WorkloadResult  80: 1048576 op, 783040629.00 ns, 746.7657 ns/op
WorkloadResult  81: 1048576 op, 789093860.00 ns, 752.5385 ns/op
WorkloadResult  82: 1048576 op, 848100216.00 ns, 808.8114 ns/op
WorkloadResult  83: 1048576 op, 828047089.00 ns, 789.6872 ns/op
WorkloadResult  84: 1048576 op, 813831396.00 ns, 776.1301 ns/op
WorkloadResult  85: 1048576 op, 822156033.00 ns, 784.0691 ns/op
WorkloadResult  86: 1048576 op, 815702082.00 ns, 777.9141 ns/op
// GC:  187 0 0 1174405856 1048576
// Threading:  0 0 1048576

// AfterAll
// Benchmark Process 32350 has exited with code 0.

Mean = 775.477 ns, StdErr = 4.568 ns (0.59%), N = 86, StdDev = 42.364 ns
Min = 691.002 ns, Q1 = 747.830 ns, Median = 770.136 ns, Q3 = 804.222 ns, Max = 878.071 ns
IQR = 56.392 ns, LowerFence = 663.243 ns, UpperFence = 888.810 ns
ConfidenceInterval = [759.905 ns; 791.049 ns] (CI 99.9%), Margin = 15.572 ns (2.01% of Mean)
Skewness = 0.27, Kurtosis = 2.72, MValue = 2

// ** Remained 3 (50.0%) benchmark(s) to run. Estimated finish 2024-10-10 16:58 (0h 3m from now) **
// **************************
// Benchmark: MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=1000]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 412cc443-8cc3-49ea-bdc5-a302fa51b339.dll --anonymousPipes 110 111 --benchmarkName "BenchmarkExample.MiddleQuarterBenchmark.OptimizedMethod(ArraySize: 1000)" --job Default --benchmarkId 3 in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: DefaultJob

OverheadJitting  1: 1 op, 338972.00 ns, 338.9720 us/op
WorkloadJitting  1: 1 op, 307574.00 ns, 307.5740 us/op

OverheadJitting  2: 16 op, 240620.00 ns, 15.0388 us/op
WorkloadJitting  2: 16 op, 274140.00 ns, 17.1338 us/op

WorkloadPilot    1: 16 op, 1652.00 ns, 103.2500 ns/op
WorkloadPilot    2: 32 op, 1047.00 ns, 32.7188 ns/op
WorkloadPilot    3: 64 op, 1332.00 ns, 20.8125 ns/op
WorkloadPilot    4: 128 op, 1665.00 ns, 13.0078 ns/op
WorkloadPilot    5: 256 op, 2733.00 ns, 10.6758 ns/op
WorkloadPilot    6: 512 op, 5098.00 ns, 9.9570 ns/op
WorkloadPilot    7: 1024 op, 9775.00 ns, 9.5459 ns/op
WorkloadPilot    8: 2048 op, 19215.00 ns, 9.3823 ns/op
WorkloadPilot    9: 4096 op, 37790.00 ns, 9.2261 ns/op
WorkloadPilot   10: 8192 op, 86390.00 ns, 10.5457 ns/op
WorkloadPilot   11: 16384 op, 176831.00 ns, 10.7929 ns/op
WorkloadPilot   12: 32768 op, 350909.00 ns, 10.7089 ns/op
WorkloadPilot   13: 65536 op, 704296.00 ns, 10.7467 ns/op
WorkloadPilot   14: 131072 op, 1226777.00 ns, 9.3596 ns/op
WorkloadPilot   15: 262144 op, 2542134.00 ns, 9.6975 ns/op
WorkloadPilot   16: 524288 op, 5088125.00 ns, 9.7048 ns/op
WorkloadPilot   17: 1048576 op, 10330043.00 ns, 9.8515 ns/op
WorkloadPilot   18: 2097152 op, 20487930.00 ns, 9.7694 ns/op
WorkloadPilot   19: 4194304 op, 39600121.00 ns, 9.4414 ns/op
WorkloadPilot   20: 8388608 op, 79599940.00 ns, 9.4891 ns/op
WorkloadPilot   21: 16777216 op, 113249947.00 ns, 6.7502 ns/op
WorkloadPilot   22: 33554432 op, 64971377.00 ns, 1.9363 ns/op
WorkloadPilot   23: 67108864 op, 132835162.00 ns, 1.9794 ns/op
WorkloadPilot   24: 134217728 op, 243769640.00 ns, 1.8162 ns/op
WorkloadPilot   25: 268435456 op, 456323368.00 ns, 1.6999 ns/op
WorkloadPilot   26: 536870912 op, 898293225.00 ns, 1.6732 ns/op

OverheadWarmup   1: 536870912 op, 915319006.00 ns, 1.7049 ns/op
OverheadWarmup   2: 536870912 op, 806583273.00 ns, 1.5024 ns/op
OverheadWarmup   3: 536870912 op, 788400190.00 ns, 1.4685 ns/op
OverheadWarmup   4: 536870912 op, 814117220.00 ns, 1.5164 ns/op
OverheadWarmup   5: 536870912 op, 787658341.00 ns, 1.4671 ns/op
OverheadWarmup   6: 536870912 op, 797989417.00 ns, 1.4864 ns/op
OverheadWarmup   7: 536870912 op, 788810764.00 ns, 1.4693 ns/op

OverheadActual   1: 536870912 op, 795190156.00 ns, 1.4812 ns/op
OverheadActual   2: 536870912 op, 784266478.00 ns, 1.4608 ns/op
OverheadActual   3: 536870912 op, 784139661.00 ns, 1.4606 ns/op
OverheadActual   4: 536870912 op, 786729287.00 ns, 1.4654 ns/op
OverheadActual   5: 536870912 op, 813749195.00 ns, 1.5157 ns/op
OverheadActual   6: 536870912 op, 815765782.00 ns, 1.5195 ns/op
OverheadActual   7: 536870912 op, 802024721.00 ns, 1.4939 ns/op
OverheadActual   8: 536870912 op, 838989508.00 ns, 1.5627 ns/op
OverheadActual   9: 536870912 op, 789552603.00 ns, 1.4707 ns/op
OverheadActual  10: 536870912 op, 797499818.00 ns, 1.4855 ns/op
OverheadActual  11: 536870912 op, 796270920.00 ns, 1.4832 ns/op
OverheadActual  12: 536870912 op, 791603839.00 ns, 1.4745 ns/op
OverheadActual  13: 536870912 op, 800121218.00 ns, 1.4903 ns/op
OverheadActual  14: 536870912 op, 790409509.00 ns, 1.4723 ns/op
OverheadActual  15: 536870912 op, 790164903.00 ns, 1.4718 ns/op

WorkloadWarmup   1: 536870912 op, 899839148.00 ns, 1.6761 ns/op
WorkloadWarmup   2: 536870912 op, 898888212.00 ns, 1.6743 ns/op
WorkloadWarmup   3: 536870912 op, 901989542.00 ns, 1.6801 ns/op
WorkloadWarmup   4: 536870912 op, 896452922.00 ns, 1.6698 ns/op
WorkloadWarmup   5: 536870912 op, 896526038.00 ns, 1.6699 ns/op
WorkloadWarmup   6: 536870912 op, 898458138.00 ns, 1.6735 ns/op
WorkloadWarmup   7: 536870912 op, 916789124.00 ns, 1.7077 ns/op
WorkloadWarmup   8: 536870912 op, 900911423.00 ns, 1.6781 ns/op

// BeforeActualRun
WorkloadActual   1: 536870912 op, 903271063.00 ns, 1.6825 ns/op
WorkloadActual   2: 536870912 op, 899324002.00 ns, 1.6751 ns/op
WorkloadActual   3: 536870912 op, 975376642.00 ns, 1.8168 ns/op
WorkloadActual   4: 536870912 op, 1006050943.00 ns, 1.8739 ns/op
WorkloadActual   5: 536870912 op, 920284735.00 ns, 1.7142 ns/op
WorkloadActual   6: 536870912 op, 901647752.00 ns, 1.6794 ns/op
WorkloadActual   7: 536870912 op, 898469556.00 ns, 1.6735 ns/op
WorkloadActual   8: 536870912 op, 901250625.00 ns, 1.6787 ns/op
WorkloadActual   9: 536870912 op, 914110100.00 ns, 1.7027 ns/op
WorkloadActual  10: 536870912 op, 904295551.00 ns, 1.6844 ns/op
WorkloadActual  11: 536870912 op, 959932979.00 ns, 1.7880 ns/op
WorkloadActual  12: 536870912 op, 902925054.00 ns, 1.6818 ns/op
WorkloadActual  13: 536870912 op, 911286106.00 ns, 1.6974 ns/op
WorkloadActual  14: 536870912 op, 923495879.00 ns, 1.7201 ns/op
WorkloadActual  15: 536870912 op, 926771100.00 ns, 1.7262 ns/op
WorkloadActual  16: 536870912 op, 911812243.00 ns, 1.6984 ns/op

// AfterActualRun
WorkloadResult   1: 536870912 op, 108080907.00 ns, 0.2013 ns/op
WorkloadResult   2: 536870912 op, 104133846.00 ns, 0.1940 ns/op
WorkloadResult   3: 536870912 op, 125094579.00 ns, 0.2330 ns/op
WorkloadResult   4: 536870912 op, 106457596.00 ns, 0.1983 ns/op
WorkloadResult   5: 536870912 op, 103279400.00 ns, 0.1924 ns/op
WorkloadResult   6: 536870912 op, 106060469.00 ns, 0.1976 ns/op
WorkloadResult   7: 536870912 op, 118919944.00 ns, 0.2215 ns/op
WorkloadResult   8: 536870912 op, 109105395.00 ns, 0.2032 ns/op
WorkloadResult   9: 536870912 op, 107734898.00 ns, 0.2007 ns/op
WorkloadResult  10: 536870912 op, 116095950.00 ns, 0.2162 ns/op
WorkloadResult  11: 536870912 op, 128305723.00 ns, 0.2390 ns/op
WorkloadResult  12: 536870912 op, 131580944.00 ns, 0.2451 ns/op
WorkloadResult  13: 536870912 op, 116622087.00 ns, 0.2172 ns/op
// GC:  0 0 0 736 536870912
// Threading:  0 0 536870912

// AfterAll
// Benchmark Process 32404 has exited with code 0.

Mean = 0.212 ns, StdErr = 0.005 ns (2.33%), N = 13, StdDev = 0.018 ns
Min = 0.192 ns, Q1 = 0.198 ns, Median = 0.203 ns, Q3 = 0.222 ns, Max = 0.245 ns
IQR = 0.023 ns, LowerFence = 0.163 ns, UpperFence = 0.256 ns
ConfidenceInterval = [0.191 ns; 0.234 ns] (CI 99.9%), Margin = 0.021 ns (10.07% of Mean)
Skewness = 0.55, Kurtosis = 1.71, MValue = 3.14

// ** Remained 2 (33.3%) benchmark(s) to run. Estimated finish 2024-10-10 16:57 (0h 2m from now) **
// **************************
// Benchmark: MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=10000]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 412cc443-8cc3-49ea-bdc5-a302fa51b339.dll --anonymousPipes 110 111 --benchmarkName "BenchmarkExample.MiddleQuarterBenchmark.OriginalMethod(ArraySize: 10000)" --job Default --benchmarkId 4 in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: DefaultJob

OverheadJitting  1: 1 op, 324465.00 ns, 324.4650 us/op
WorkloadJitting  1: 1 op, 1953828.00 ns, 1.9538 ms/op

OverheadJitting  2: 16 op, 550821.00 ns, 34.4263 us/op
WorkloadJitting  2: 16 op, 1627882.00 ns, 101.7426 us/op

WorkloadPilot    1: 16 op, 509774.00 ns, 31.8609 us/op
WorkloadPilot    2: 32 op, 881782.00 ns, 27.5557 us/op
WorkloadPilot    3: 64 op, 1679230.00 ns, 26.2380 us/op
WorkloadPilot    4: 128 op, 3730794.00 ns, 29.1468 us/op
WorkloadPilot    5: 256 op, 7178141.00 ns, 28.0396 us/op
WorkloadPilot    6: 512 op, 13975135.00 ns, 27.2952 us/op
WorkloadPilot    7: 1024 op, 26685753.00 ns, 26.0603 us/op
WorkloadPilot    8: 2048 op, 48847217.00 ns, 23.8512 us/op
WorkloadPilot    9: 4096 op, 88215147.00 ns, 21.5369 us/op
WorkloadPilot   10: 8192 op, 52780123.00 ns, 6.4429 us/op
WorkloadPilot   11: 16384 op, 103275518.00 ns, 6.3034 us/op
WorkloadPilot   12: 32768 op, 208515803.00 ns, 6.3634 us/op
WorkloadPilot   13: 65536 op, 412000963.00 ns, 6.2866 us/op
WorkloadPilot   14: 131072 op, 826583861.00 ns, 6.3063 us/op

OverheadWarmup   1: 131072 op, 450046.00 ns, 3.4336 ns/op
OverheadWarmup   2: 131072 op, 415853.00 ns, 3.1727 ns/op
OverheadWarmup   3: 131072 op, 425324.00 ns, 3.2450 ns/op
OverheadWarmup   4: 131072 op, 416585.00 ns, 3.1783 ns/op
OverheadWarmup   5: 131072 op, 416577.00 ns, 3.1782 ns/op
OverheadWarmup   6: 131072 op, 507837.00 ns, 3.8745 ns/op
OverheadWarmup   7: 131072 op, 425968.00 ns, 3.2499 ns/op

OverheadActual   1: 131072 op, 527187.00 ns, 4.0221 ns/op
OverheadActual   2: 131072 op, 416189.00 ns, 3.1753 ns/op
OverheadActual   3: 131072 op, 417142.00 ns, 3.1825 ns/op
OverheadActual   4: 131072 op, 462623.00 ns, 3.5295 ns/op
OverheadActual   5: 131072 op, 463584.00 ns, 3.5369 ns/op
OverheadActual   6: 131072 op, 426228.00 ns, 3.2519 ns/op
OverheadActual   7: 131072 op, 505213.00 ns, 3.8545 ns/op
OverheadActual   8: 131072 op, 437390.00 ns, 3.3370 ns/op
OverheadActual   9: 131072 op, 416096.00 ns, 3.1746 ns/op
OverheadActual  10: 131072 op, 452334.00 ns, 3.4510 ns/op
OverheadActual  11: 131072 op, 415935.00 ns, 3.1733 ns/op
OverheadActual  12: 131072 op, 416014.00 ns, 3.1739 ns/op
OverheadActual  13: 131072 op, 425863.00 ns, 3.2491 ns/op
OverheadActual  14: 131072 op, 426325.00 ns, 3.2526 ns/op
OverheadActual  15: 131072 op, 426035.00 ns, 3.2504 ns/op
OverheadActual  16: 131072 op, 560637.00 ns, 4.2773 ns/op
OverheadActual  17: 131072 op, 523158.00 ns, 3.9914 ns/op
OverheadActual  18: 131072 op, 542838.00 ns, 4.1415 ns/op
OverheadActual  19: 131072 op, 415855.00 ns, 3.1727 ns/op
OverheadActual  20: 131072 op, 415808.00 ns, 3.1724 ns/op

WorkloadWarmup   1: 131072 op, 827724690.00 ns, 6.3150 us/op
WorkloadWarmup   2: 131072 op, 829706388.00 ns, 6.3302 us/op
WorkloadWarmup   3: 131072 op, 829714681.00 ns, 6.3302 us/op
WorkloadWarmup   4: 131072 op, 822668670.00 ns, 6.2765 us/op
WorkloadWarmup   5: 131072 op, 780790160.00 ns, 5.9570 us/op
WorkloadWarmup   6: 131072 op, 798926876.00 ns, 6.0953 us/op
WorkloadWarmup   7: 131072 op, 835742054.00 ns, 6.3762 us/op
WorkloadWarmup   8: 131072 op, 804290164.00 ns, 6.1362 us/op

// BeforeActualRun
WorkloadActual   1: 131072 op, 908281533.00 ns, 6.9296 us/op
WorkloadActual   2: 131072 op, 883718417.00 ns, 6.7422 us/op
WorkloadActual   3: 131072 op, 891863565.00 ns, 6.8044 us/op
WorkloadActual   4: 131072 op, 850363130.00 ns, 6.4878 us/op
WorkloadActual   5: 131072 op, 919033801.00 ns, 7.0117 us/op
WorkloadActual   6: 131072 op, 888663821.00 ns, 6.7800 us/op
WorkloadActual   7: 131072 op, 841306346.00 ns, 6.4187 us/op
WorkloadActual   8: 131072 op, 858034220.00 ns, 6.5463 us/op
WorkloadActual   9: 131072 op, 925477574.00 ns, 7.0608 us/op
WorkloadActual  10: 131072 op, 855857715.00 ns, 6.5297 us/op
WorkloadActual  11: 131072 op, 856367827.00 ns, 6.5336 us/op
WorkloadActual  12: 131072 op, 854154449.00 ns, 6.5167 us/op
WorkloadActual  13: 131072 op, 889926695.00 ns, 6.7896 us/op
WorkloadActual  14: 131072 op, 830578598.00 ns, 6.3368 us/op
WorkloadActual  15: 131072 op, 886187449.00 ns, 6.7611 us/op
WorkloadActual  16: 131072 op, 878971761.00 ns, 6.7060 us/op
WorkloadActual  17: 131072 op, 870043408.00 ns, 6.6379 us/op
WorkloadActual  18: 131072 op, 889807401.00 ns, 6.7887 us/op
WorkloadActual  19: 131072 op, 918456114.00 ns, 7.0073 us/op
WorkloadActual  20: 131072 op, 853275458.00 ns, 6.5100 us/op
WorkloadActual  21: 131072 op, 840411785.00 ns, 6.4118 us/op
WorkloadActual  22: 131072 op, 877186346.00 ns, 6.6924 us/op
WorkloadActual  23: 131072 op, 846426659.00 ns, 6.4577 us/op
WorkloadActual  24: 131072 op, 815501426.00 ns, 6.2218 us/op
WorkloadActual  25: 131072 op, 796336475.00 ns, 6.0756 us/op
WorkloadActual  26: 131072 op, 784549351.00 ns, 5.9856 us/op
WorkloadActual  27: 131072 op, 774171017.00 ns, 5.9065 us/op
WorkloadActual  28: 131072 op, 784612877.00 ns, 5.9861 us/op
WorkloadActual  29: 131072 op, 818756406.00 ns, 6.2466 us/op
WorkloadActual  30: 131072 op, 841644340.00 ns, 6.4212 us/op
WorkloadActual  31: 131072 op, 833382859.00 ns, 6.3582 us/op
WorkloadActual  32: 131072 op, 925784639.00 ns, 7.0632 us/op
WorkloadActual  33: 131072 op, 1016917724.00 ns, 7.7585 us/op
WorkloadActual  34: 131072 op, 836461457.00 ns, 6.3817 us/op
WorkloadActual  35: 131072 op, 803753558.00 ns, 6.1322 us/op
WorkloadActual  36: 131072 op, 853240333.00 ns, 6.5097 us/op
WorkloadActual  37: 131072 op, 812217405.00 ns, 6.1967 us/op
WorkloadActual  38: 131072 op, 791479415.00 ns, 6.0385 us/op
WorkloadActual  39: 131072 op, 801875571.00 ns, 6.1178 us/op
WorkloadActual  40: 131072 op, 832974421.00 ns, 6.3551 us/op
WorkloadActual  41: 131072 op, 800633758.00 ns, 6.1084 us/op
WorkloadActual  42: 131072 op, 783283889.00 ns, 5.9760 us/op
WorkloadActual  43: 131072 op, 796813691.00 ns, 6.0792 us/op
WorkloadActual  44: 131072 op, 813230504.00 ns, 6.2045 us/op
WorkloadActual  45: 131072 op, 789292769.00 ns, 6.0218 us/op
WorkloadActual  46: 131072 op, 814118678.00 ns, 6.2112 us/op
WorkloadActual  47: 131072 op, 804848013.00 ns, 6.1405 us/op
WorkloadActual  48: 131072 op, 823802215.00 ns, 6.2851 us/op
WorkloadActual  49: 131072 op, 823418105.00 ns, 6.2822 us/op
WorkloadActual  50: 131072 op, 801276955.00 ns, 6.1133 us/op
WorkloadActual  51: 131072 op, 835907506.00 ns, 6.3775 us/op
WorkloadActual  52: 131072 op, 781625606.00 ns, 5.9633 us/op
WorkloadActual  53: 131072 op, 777036763.00 ns, 5.9283 us/op
WorkloadActual  54: 131072 op, 789037147.00 ns, 6.0199 us/op
WorkloadActual  55: 131072 op, 799298592.00 ns, 6.0982 us/op
WorkloadActual  56: 131072 op, 770820132.00 ns, 5.8809 us/op
WorkloadActual  57: 131072 op, 777518075.00 ns, 5.9320 us/op
WorkloadActual  58: 131072 op, 788739613.00 ns, 6.0176 us/op
WorkloadActual  59: 131072 op, 774677372.00 ns, 5.9103 us/op
WorkloadActual  60: 131072 op, 790014171.00 ns, 6.0273 us/op
WorkloadActual  61: 131072 op, 788177063.00 ns, 6.0133 us/op
WorkloadActual  62: 131072 op, 778630733.00 ns, 5.9405 us/op
WorkloadActual  63: 131072 op, 784858365.00 ns, 5.9880 us/op
WorkloadActual  64: 131072 op, 784875626.00 ns, 5.9881 us/op
WorkloadActual  65: 131072 op, 890866280.00 ns, 6.7968 us/op
WorkloadActual  66: 131072 op, 781365534.00 ns, 5.9613 us/op
WorkloadActual  67: 131072 op, 795382791.00 ns, 6.0683 us/op
WorkloadActual  68: 131072 op, 788808135.00 ns, 6.0181 us/op
WorkloadActual  69: 131072 op, 779620883.00 ns, 5.9480 us/op
WorkloadActual  70: 131072 op, 785160325.00 ns, 5.9903 us/op
WorkloadActual  71: 131072 op, 851843272.00 ns, 6.4990 us/op
WorkloadActual  72: 131072 op, 851499404.00 ns, 6.4964 us/op
WorkloadActual  73: 131072 op, 879832012.00 ns, 6.7126 us/op
WorkloadActual  74: 131072 op, 856624707.00 ns, 6.5355 us/op
WorkloadActual  75: 131072 op, 907851970.00 ns, 6.9264 us/op
WorkloadActual  76: 131072 op, 844420722.00 ns, 6.4424 us/op
WorkloadActual  77: 131072 op, 870675703.00 ns, 6.6427 us/op
WorkloadActual  78: 131072 op, 894318450.00 ns, 6.8231 us/op
WorkloadActual  79: 131072 op, 876181557.00 ns, 6.6847 us/op
WorkloadActual  80: 131072 op, 849679721.00 ns, 6.4825 us/op
WorkloadActual  81: 131072 op, 898552460.00 ns, 6.8554 us/op
WorkloadActual  82: 131072 op, 861444566.00 ns, 6.5723 us/op
WorkloadActual  83: 131072 op, 875204688.00 ns, 6.6773 us/op
WorkloadActual  84: 131072 op, 851926706.00 ns, 6.4997 us/op

// AfterActualRun
WorkloadResult   1: 131072 op, 907855256.50 ns, 6.9264 us/op
WorkloadResult   2: 131072 op, 883292140.50 ns, 6.7390 us/op
WorkloadResult   3: 131072 op, 891437288.50 ns, 6.8011 us/op
WorkloadResult   4: 131072 op, 849936853.50 ns, 6.4845 us/op
WorkloadResult   5: 131072 op, 918607524.50 ns, 7.0084 us/op
WorkloadResult   6: 131072 op, 888237544.50 ns, 6.7767 us/op
WorkloadResult   7: 131072 op, 840880069.50 ns, 6.4154 us/op
WorkloadResult   8: 131072 op, 857607943.50 ns, 6.5430 us/op
WorkloadResult   9: 131072 op, 925051297.50 ns, 7.0576 us/op
WorkloadResult  10: 131072 op, 855431438.50 ns, 6.5264 us/op
WorkloadResult  11: 131072 op, 855941550.50 ns, 6.5303 us/op
WorkloadResult  12: 131072 op, 853728172.50 ns, 6.5134 us/op
WorkloadResult  13: 131072 op, 889500418.50 ns, 6.7863 us/op
WorkloadResult  14: 131072 op, 830152321.50 ns, 6.3336 us/op
WorkloadResult  15: 131072 op, 885761172.50 ns, 6.7578 us/op
WorkloadResult  16: 131072 op, 878545484.50 ns, 6.7028 us/op
WorkloadResult  17: 131072 op, 869617131.50 ns, 6.6347 us/op
WorkloadResult  18: 131072 op, 889381124.50 ns, 6.7854 us/op
WorkloadResult  19: 131072 op, 918029837.50 ns, 7.0040 us/op
WorkloadResult  20: 131072 op, 852849181.50 ns, 6.5067 us/op
WorkloadResult  21: 131072 op, 839985508.50 ns, 6.4086 us/op
WorkloadResult  22: 131072 op, 876760069.50 ns, 6.6891 us/op
WorkloadResult  23: 131072 op, 846000382.50 ns, 6.4545 us/op
WorkloadResult  24: 131072 op, 815075149.50 ns, 6.2185 us/op
WorkloadResult  25: 131072 op, 795910198.50 ns, 6.0723 us/op
WorkloadResult  26: 131072 op, 784123074.50 ns, 5.9824 us/op
WorkloadResult  27: 131072 op, 773744740.50 ns, 5.9032 us/op
WorkloadResult  28: 131072 op, 784186600.50 ns, 5.9829 us/op
WorkloadResult  29: 131072 op, 818330129.50 ns, 6.2434 us/op
WorkloadResult  30: 131072 op, 841218063.50 ns, 6.4180 us/op
WorkloadResult  31: 131072 op, 832956582.50 ns, 6.3550 us/op
WorkloadResult  32: 131072 op, 925358362.50 ns, 7.0599 us/op
WorkloadResult  33: 131072 op, 836035180.50 ns, 6.3784 us/op
WorkloadResult  34: 131072 op, 803327281.50 ns, 6.1289 us/op
WorkloadResult  35: 131072 op, 852814056.50 ns, 6.5065 us/op
WorkloadResult  36: 131072 op, 811791128.50 ns, 6.1935 us/op
WorkloadResult  37: 131072 op, 791053138.50 ns, 6.0353 us/op
WorkloadResult  38: 131072 op, 801449294.50 ns, 6.1146 us/op
WorkloadResult  39: 131072 op, 832548144.50 ns, 6.3518 us/op
WorkloadResult  40: 131072 op, 800207481.50 ns, 6.1051 us/op
WorkloadResult  41: 131072 op, 782857612.50 ns, 5.9727 us/op
WorkloadResult  42: 131072 op, 796387414.50 ns, 6.0760 us/op
WorkloadResult  43: 131072 op, 812804227.50 ns, 6.2012 us/op
WorkloadResult  44: 131072 op, 788866492.50 ns, 6.0186 us/op
WorkloadResult  45: 131072 op, 813692401.50 ns, 6.2080 us/op
WorkloadResult  46: 131072 op, 804421736.50 ns, 6.1373 us/op
WorkloadResult  47: 131072 op, 823375938.50 ns, 6.2819 us/op
WorkloadResult  48: 131072 op, 822991828.50 ns, 6.2789 us/op
WorkloadResult  49: 131072 op, 800850678.50 ns, 6.1100 us/op
WorkloadResult  50: 131072 op, 835481229.50 ns, 6.3742 us/op
WorkloadResult  51: 131072 op, 781199329.50 ns, 5.9601 us/op
WorkloadResult  52: 131072 op, 776610486.50 ns, 5.9251 us/op
WorkloadResult  53: 131072 op, 788610870.50 ns, 6.0166 us/op
WorkloadResult  54: 131072 op, 798872315.50 ns, 6.0949 us/op
WorkloadResult  55: 131072 op, 770393855.50 ns, 5.8776 us/op
WorkloadResult  56: 131072 op, 777091798.50 ns, 5.9287 us/op
WorkloadResult  57: 131072 op, 788313336.50 ns, 6.0144 us/op
WorkloadResult  58: 131072 op, 774251095.50 ns, 5.9071 us/op
WorkloadResult  59: 131072 op, 789587894.50 ns, 6.0241 us/op
WorkloadResult  60: 131072 op, 787750786.50 ns, 6.0101 us/op
WorkloadResult  61: 131072 op, 778204456.50 ns, 5.9372 us/op
WorkloadResult  62: 131072 op, 784432088.50 ns, 5.9847 us/op
WorkloadResult  63: 131072 op, 784449349.50 ns, 5.9849 us/op
WorkloadResult  64: 131072 op, 890440003.50 ns, 6.7935 us/op
WorkloadResult  65: 131072 op, 780939257.50 ns, 5.9581 us/op
WorkloadResult  66: 131072 op, 794956514.50 ns, 6.0650 us/op
WorkloadResult  67: 131072 op, 788381858.50 ns, 6.0149 us/op
WorkloadResult  68: 131072 op, 779194606.50 ns, 5.9448 us/op
WorkloadResult  69: 131072 op, 784734048.50 ns, 5.9870 us/op
WorkloadResult  70: 131072 op, 851416995.50 ns, 6.4958 us/op
WorkloadResult  71: 131072 op, 851073127.50 ns, 6.4932 us/op
WorkloadResult  72: 131072 op, 879405735.50 ns, 6.7093 us/op
WorkloadResult  73: 131072 op, 856198430.50 ns, 6.5323 us/op
WorkloadResult  74: 131072 op, 907425693.50 ns, 6.9231 us/op
WorkloadResult  75: 131072 op, 843994445.50 ns, 6.4392 us/op
WorkloadResult  76: 131072 op, 870249426.50 ns, 6.6395 us/op
WorkloadResult  77: 131072 op, 893892173.50 ns, 6.8199 us/op
WorkloadResult  78: 131072 op, 875755280.50 ns, 6.6815 us/op
WorkloadResult  79: 131072 op, 849253444.50 ns, 6.4793 us/op
WorkloadResult  80: 131072 op, 898126183.50 ns, 6.8522 us/op
WorkloadResult  81: 131072 op, 861018289.50 ns, 6.5690 us/op
WorkloadResult  82: 131072 op, 874778411.50 ns, 6.6740 us/op
WorkloadResult  83: 131072 op, 851500429.50 ns, 6.4964 us/op
// GC:  211 0 0 1326449376 131072
// Threading:  0 0 131072

// AfterAll
// Benchmark Process 32428 has exited with code 0.

Mean = 6.365 us, StdErr = 0.037 us (0.58%), N = 83, StdDev = 0.336 us
Min = 5.878 us, Q1 = 6.030 us, Median = 6.374 us, Q3 = 6.637 us, Max = 7.060 us
IQR = 0.607 us, LowerFence = 5.119 us, UpperFence = 7.548 us
ConfidenceInterval = [6.239 us; 6.491 us] (CI 99.9%), Margin = 0.126 us (1.98% of Mean)
Skewness = 0.29, Kurtosis = 1.88, MValue = 2.85

// ** Remained 1 (16.7%) benchmark(s) to run. Estimated finish 2024-10-10 16:57 (0h 1m from now) **
// **************************
// Benchmark: MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=10000]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 412cc443-8cc3-49ea-bdc5-a302fa51b339.dll --anonymousPipes 110 111 --benchmarkName "BenchmarkExample.MiddleQuarterBenchmark.OptimizedMethod(ArraySize: 10000)" --job Default --benchmarkId 5 in /private/var/folders/gl/10x44fsn6bn1799lkf_g096r0000gn/T/ob-csharp-project-7DdeVG/bin/Release/net8.0/412cc443-8cc3-49ea-bdc5-a302fa51b339/bin/Release/net8.0
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX2,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: DefaultJob

OverheadJitting  1: 1 op, 326727.00 ns, 326.7270 us/op
WorkloadJitting  1: 1 op, 285695.00 ns, 285.6950 us/op

OverheadJitting  2: 16 op, 220273.00 ns, 13.7671 us/op
WorkloadJitting  2: 16 op, 229893.00 ns, 14.3683 us/op

WorkloadPilot    1: 16 op, 1546.00 ns, 96.6250 ns/op
WorkloadPilot    2: 32 op, 1101.00 ns, 34.4063 ns/op
WorkloadPilot    3: 64 op, 964.00 ns, 15.0625 ns/op
WorkloadPilot    4: 128 op, 1684.00 ns, 13.1563 ns/op
WorkloadPilot    5: 256 op, 2785.00 ns, 10.8789 ns/op
WorkloadPilot    6: 512 op, 5276.00 ns, 10.3047 ns/op
WorkloadPilot    7: 1024 op, 9688.00 ns, 9.4609 ns/op
WorkloadPilot    8: 2048 op, 18977.00 ns, 9.2661 ns/op
WorkloadPilot    9: 4096 op, 37289.00 ns, 9.1038 ns/op
WorkloadPilot   10: 8192 op, 74437.00 ns, 9.0865 ns/op
WorkloadPilot   11: 16384 op, 147200.00 ns, 8.9844 ns/op
WorkloadPilot   12: 32768 op, 293845.00 ns, 8.9674 ns/op
WorkloadPilot   13: 65536 op, 587388.00 ns, 8.9628 ns/op
WorkloadPilot   14: 131072 op, 1175486.00 ns, 8.9682 ns/op
WorkloadPilot   15: 262144 op, 2375316.00 ns, 9.0611 ns/op
WorkloadPilot   16: 524288 op, 4961307.00 ns, 9.4629 ns/op
WorkloadPilot   17: 1048576 op, 9698635.00 ns, 9.2493 ns/op
WorkloadPilot   18: 2097152 op, 19191489.00 ns, 9.1512 ns/op
WorkloadPilot   19: 4194304 op, 40268543.00 ns, 9.6008 ns/op
WorkloadPilot   20: 8388608 op, 50765302.00 ns, 6.0517 ns/op
WorkloadPilot   21: 16777216 op, 32157415.00 ns, 1.9167 ns/op
WorkloadPilot   22: 33554432 op, 62377463.00 ns, 1.8590 ns/op
WorkloadPilot   23: 67108864 op, 127778490.00 ns, 1.9040 ns/op
WorkloadPilot   24: 134217728 op, 233624876.00 ns, 1.7406 ns/op
WorkloadPilot   25: 268435456 op, 472544829.00 ns, 1.7604 ns/op
WorkloadPilot   26: 536870912 op, 913006014.00 ns, 1.7006 ns/op

OverheadWarmup   1: 536870912 op, 862340843.00 ns, 1.6062 ns/op
OverheadWarmup   2: 536870912 op, 813324369.00 ns, 1.5149 ns/op
OverheadWarmup   3: 536870912 op, 782261868.00 ns, 1.4571 ns/op
OverheadWarmup   4: 536870912 op, 786397965.00 ns, 1.4648 ns/op
OverheadWarmup   5: 536870912 op, 774872238.00 ns, 1.4433 ns/op
OverheadWarmup   6: 536870912 op, 776919449.00 ns, 1.4471 ns/op
OverheadWarmup   7: 536870912 op, 792691886.00 ns, 1.4765 ns/op
OverheadWarmup   8: 536870912 op, 796269629.00 ns, 1.4832 ns/op
OverheadWarmup   9: 536870912 op, 803721877.00 ns, 1.4970 ns/op
OverheadWarmup  10: 536870912 op, 840053963.00 ns, 1.5647 ns/op

OverheadActual   1: 536870912 op, 846157436.00 ns, 1.5761 ns/op
OverheadActual   2: 536870912 op, 792112782.00 ns, 1.4754 ns/op
OverheadActual   3: 536870912 op, 792450909.00 ns, 1.4761 ns/op
OverheadActual   4: 536870912 op, 789790923.00 ns, 1.4711 ns/op
OverheadActual   5: 536870912 op, 794339556.00 ns, 1.4796 ns/op
OverheadActual   6: 536870912 op, 794546379.00 ns, 1.4800 ns/op
OverheadActual   7: 536870912 op, 787452660.00 ns, 1.4667 ns/op
OverheadActual   8: 536870912 op, 789719050.00 ns, 1.4710 ns/op
OverheadActual   9: 536870912 op, 794722585.00 ns, 1.4803 ns/op
OverheadActual  10: 536870912 op, 789434979.00 ns, 1.4704 ns/op
OverheadActual  11: 536870912 op, 800546258.00 ns, 1.4911 ns/op
OverheadActual  12: 536870912 op, 791691605.00 ns, 1.4746 ns/op
OverheadActual  13: 536870912 op, 786974640.00 ns, 1.4659 ns/op
OverheadActual  14: 536870912 op, 788589307.00 ns, 1.4689 ns/op
OverheadActual  15: 536870912 op, 788363021.00 ns, 1.4684 ns/op

WorkloadWarmup   1: 536870912 op, 901163196.00 ns, 1.6785 ns/op
WorkloadWarmup   2: 536870912 op, 905583204.00 ns, 1.6868 ns/op
WorkloadWarmup   3: 536870912 op, 908590785.00 ns, 1.6924 ns/op
WorkloadWarmup   4: 536870912 op, 902516507.00 ns, 1.6811 ns/op
WorkloadWarmup   5: 536870912 op, 935085164.00 ns, 1.7417 ns/op
WorkloadWarmup   6: 536870912 op, 906816747.00 ns, 1.6891 ns/op

// BeforeActualRun
WorkloadActual   1: 536870912 op, 904344077.00 ns, 1.6845 ns/op
WorkloadActual   2: 536870912 op, 904858876.00 ns, 1.6854 ns/op
WorkloadActual   3: 536870912 op, 901080600.00 ns, 1.6784 ns/op
WorkloadActual   4: 536870912 op, 901844336.00 ns, 1.6798 ns/op
WorkloadActual   5: 536870912 op, 940924890.00 ns, 1.7526 ns/op
WorkloadActual   6: 536870912 op, 980627729.00 ns, 1.8266 ns/op
WorkloadActual   7: 536870912 op, 919756488.00 ns, 1.7132 ns/op
WorkloadActual   8: 536870912 op, 901864833.00 ns, 1.6799 ns/op
WorkloadActual   9: 536870912 op, 921396370.00 ns, 1.7162 ns/op
WorkloadActual  10: 536870912 op, 903290386.00 ns, 1.6825 ns/op
WorkloadActual  11: 536870912 op, 911930112.00 ns, 1.6986 ns/op
WorkloadActual  12: 536870912 op, 943739973.00 ns, 1.7579 ns/op
WorkloadActual  13: 536870912 op, 938967406.00 ns, 1.7490 ns/op
WorkloadActual  14: 536870912 op, 1062162712.00 ns, 1.9784 ns/op
WorkloadActual  15: 536870912 op, 926441220.00 ns, 1.7256 ns/op
WorkloadActual  16: 536870912 op, 915403568.00 ns, 1.7051 ns/op
WorkloadActual  17: 536870912 op, 936037843.00 ns, 1.7435 ns/op
WorkloadActual  18: 536870912 op, 918185245.00 ns, 1.7103 ns/op
WorkloadActual  19: 536870912 op, 915139504.00 ns, 1.7046 ns/op
WorkloadActual  20: 536870912 op, 910603196.00 ns, 1.6961 ns/op

// AfterActualRun
WorkloadResult   1: 536870912 op, 112652472.00 ns, 0.2098 ns/op
WorkloadResult   2: 536870912 op, 113167271.00 ns, 0.2108 ns/op
WorkloadResult   3: 536870912 op, 109388995.00 ns, 0.2038 ns/op
WorkloadResult   4: 536870912 op, 110152731.00 ns, 0.2052 ns/op
WorkloadResult   5: 536870912 op, 149233285.00 ns, 0.2780 ns/op
WorkloadResult   6: 536870912 op, 188936124.00 ns, 0.3519 ns/op
WorkloadResult   7: 536870912 op, 128064883.00 ns, 0.2385 ns/op
WorkloadResult   8: 536870912 op, 110173228.00 ns, 0.2052 ns/op
WorkloadResult   9: 536870912 op, 129704765.00 ns, 0.2416 ns/op
WorkloadResult  10: 536870912 op, 111598781.00 ns, 0.2079 ns/op
WorkloadResult  11: 536870912 op, 120238507.00 ns, 0.2240 ns/op
WorkloadResult  12: 536870912 op, 152048368.00 ns, 0.2832 ns/op
WorkloadResult  13: 536870912 op, 147275801.00 ns, 0.2743 ns/op
WorkloadResult  14: 536870912 op, 134749615.00 ns, 0.2510 ns/op
WorkloadResult  15: 536870912 op, 123711963.00 ns, 0.2304 ns/op
WorkloadResult  16: 536870912 op, 144346238.00 ns, 0.2689 ns/op
WorkloadResult  17: 536870912 op, 126493640.00 ns, 0.2356 ns/op
WorkloadResult  18: 536870912 op, 123447899.00 ns, 0.2299 ns/op
WorkloadResult  19: 536870912 op, 118911591.00 ns, 0.2215 ns/op
// GC:  0 0 0 736 536870912
// Threading:  0 0 536870912

// AfterAll
// Benchmark Process 32480 has exited with code 0.

Mean = 0.241 ns, StdErr = 0.009 ns (3.58%), N = 19, StdDev = 0.037 ns
Min = 0.204 ns, Q1 = 0.210 ns, Median = 0.230 ns, Q3 = 0.260 ns, Max = 0.352 ns
IQR = 0.050 ns, LowerFence = 0.136 ns, UpperFence = 0.334 ns
ConfidenceInterval = [0.207 ns; 0.274 ns] (CI 99.9%), Margin = 0.034 ns (14.02% of Mean)
Skewness = 1.33, Kurtosis = 4.49, MValue = 2

// ** Remained 0 (0.0%) benchmark(s) to run. Estimated finish 2024-10-10 16:57 (0h 0m from now) **
// ***** BenchmarkRunner: Finish  *****

// * Export *
  BenchmarkDotNet.Artifacts/results/BenchmarkExample.MiddleQuarterBenchmark-report.csv
  BenchmarkDotNet.Artifacts/results/BenchmarkExample.MiddleQuarterBenchmark-report-github.md
  BenchmarkDotNet.Artifacts/results/BenchmarkExample.MiddleQuarterBenchmark-report.html

// * Detailed results *
MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=100]
Runtime = .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 118.541 ns, StdErr = 0.713 ns (0.60%), N = 86, StdDev = 6.617 ns
Min = 109.449 ns, Q1 = 113.256 ns, Median = 116.848 ns, Q3 = 123.054 ns, Max = 136.234 ns
IQR = 9.797 ns, LowerFence = 98.561 ns, UpperFence = 137.749 ns
ConfidenceInterval = [116.109 ns; 120.973 ns] (CI 99.9%), Margin = 2.432 ns (2.05% of Mean)
Skewness = 0.74, Kurtosis = 2.76, MValue = 2
-------------------- Histogram --------------------
[107.482 ns ; 110.317 ns) | @@
[110.317 ns ; 114.252 ns) | @@@@@@@@@@@@@@@@@@@@@@@@@@@
[114.252 ns ; 118.385 ns) | @@@@@@@@@@@@@@@@@@@@@@
[118.385 ns ; 123.711 ns) | @@@@@@@@@@@@@@@@@
[123.711 ns ; 127.919 ns) | @@@@@@@@@@
[127.919 ns ; 132.156 ns) | @@@@@
[132.156 ns ; 137.083 ns) | @@@
---------------------------------------------------

MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=100]
Runtime = .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 0.192 ns, StdErr = 0.005 ns (2.40%), N = 13, StdDev = 0.017 ns
Min = 0.171 ns, Q1 = 0.178 ns, Median = 0.191 ns, Q3 = 0.205 ns, Max = 0.221 ns
IQR = 0.026 ns, LowerFence = 0.139 ns, UpperFence = 0.244 ns
ConfidenceInterval = [0.172 ns; 0.212 ns] (CI 99.9%), Margin = 0.020 ns (10.35% of Mean)
Skewness = 0.2, Kurtosis = 1.51, MValue = 3
-------------------- Histogram --------------------
[0.168 ns ; 0.186 ns) | @@@@@@
[0.186 ns ; 0.203 ns) | @@
[0.203 ns ; 0.222 ns) | @@@@@
---------------------------------------------------

MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=1000]
Runtime = .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 775.477 ns, StdErr = 4.568 ns (0.59%), N = 86, StdDev = 42.364 ns
Min = 691.002 ns, Q1 = 747.830 ns, Median = 770.136 ns, Q3 = 804.222 ns, Max = 878.071 ns
IQR = 56.392 ns, LowerFence = 663.243 ns, UpperFence = 888.810 ns
ConfidenceInterval = [759.905 ns; 791.049 ns] (CI 99.9%), Margin = 15.572 ns (2.01% of Mean)
Skewness = 0.27, Kurtosis = 2.72, MValue = 2
-------------------- Histogram --------------------
[678.405 ns ; 706.634 ns) | @@@@@
[706.634 ns ; 736.771 ns) | @@@@@@@@@
[736.771 ns ; 764.916 ns) | @@@@@@@@@@@@@@@@@@
[764.916 ns ; 801.024 ns) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[801.024 ns ; 826.218 ns) | @@@@@@@@@@@@@
[826.218 ns ; 867.170 ns) | @@@@@@@@@
[867.170 ns ; 890.668 ns) | @
---------------------------------------------------

MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=1000]
Runtime = .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 0.212 ns, StdErr = 0.005 ns (2.33%), N = 13, StdDev = 0.018 ns
Min = 0.192 ns, Q1 = 0.198 ns, Median = 0.203 ns, Q3 = 0.222 ns, Max = 0.245 ns
IQR = 0.023 ns, LowerFence = 0.163 ns, UpperFence = 0.256 ns
ConfidenceInterval = [0.191 ns; 0.234 ns] (CI 99.9%), Margin = 0.021 ns (10.07% of Mean)
Skewness = 0.55, Kurtosis = 1.71, MValue = 3.14
-------------------- Histogram --------------------
[0.188 ns ; 0.208 ns) | @@@@@@@
[0.208 ns ; 0.215 ns) |
[0.215 ns ; 0.235 ns) | @@@@
[0.235 ns ; 0.255 ns) | @@
---------------------------------------------------

MiddleQuarterBenchmark.OriginalMethod: DefaultJob [ArraySize=10000]
Runtime = .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 6.365 us, StdErr = 0.037 us (0.58%), N = 83, StdDev = 0.336 us
Min = 5.878 us, Q1 = 6.030 us, Median = 6.374 us, Q3 = 6.637 us, Max = 7.060 us
IQR = 0.607 us, LowerFence = 5.119 us, UpperFence = 7.548 us
ConfidenceInterval = [6.239 us; 6.491 us] (CI 99.9%), Margin = 0.126 us (1.98% of Mean)
Skewness = 0.29, Kurtosis = 1.88, MValue = 2.85
-------------------- Histogram --------------------
[5.777 us ; 5.919 us) | @@@
[5.919 us ; 6.121 us) | @@@@@@@@@@@@@@@@@@@@@@@@@@
[6.121 us ; 6.346 us) | @@@@@@@@@@
[6.346 us ; 6.549 us) | @@@@@@@@@@@@@@@@@@@@@
[6.549 us ; 6.828 us) | @@@@@@@@@@@@@@@@
[6.828 us ; 7.093 us) | @@@@@@@
---------------------------------------------------

MiddleQuarterBenchmark.OptimizedMethod: DefaultJob [ArraySize=10000]
Runtime = .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 0.241 ns, StdErr = 0.009 ns (3.58%), N = 19, StdDev = 0.037 ns
Min = 0.204 ns, Q1 = 0.210 ns, Median = 0.230 ns, Q3 = 0.260 ns, Max = 0.352 ns
IQR = 0.050 ns, LowerFence = 0.136 ns, UpperFence = 0.334 ns
ConfidenceInterval = [0.207 ns; 0.274 ns] (CI 99.9%), Margin = 0.034 ns (14.02% of Mean)
Skewness = 1.33, Kurtosis = 4.49, MValue = 2
-------------------- Histogram --------------------
[0.203 ns ; 0.240 ns) | @@@@@@@@@@@@
[0.240 ns ; 0.286 ns) | @@@@@@
[0.286 ns ; 0.333 ns) |
[0.333 ns ; 0.370 ns) | @
---------------------------------------------------

// * Summary *

BenchmarkDotNet v0.14.0, macOS Sequoia 15.0 (24A335) [Darwin 24.0.0]
Intel Core i5-10600 CPU 3.30GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK 8.0.300
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2


| Method          | ArraySize | Mean          | Error       | StdDev      | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|---------------- |---------- |--------------:|------------:|------------:|------:|--------:|-------:|----------:|------------:|
| OriginalMethod  | 100       |   118.5412 ns |   2.4320 ns |   6.6165 ns | 1.003 |    0.08 | 0.0356 |     224 B |        1.00 |
| OptimizedMethod | 100       |     0.1923 ns |   0.0199 ns |   0.0166 ns | 0.002 |    0.00 |      - |         - |        0.00 |
|                 |           |               |             |             |       |         |        |           |             |
| OriginalMethod  | 1000      |   775.4773 ns |  15.5718 ns |  42.3643 ns | 1.003 |    0.08 | 0.1783 |    1120 B |        1.00 |
| OptimizedMethod | 1000      |     0.2123 ns |   0.0214 ns |   0.0179 ns | 0.000 |    0.00 |      - |         - |        0.00 |
|                 |           |               |             |             |       |         |        |           |             |
| OriginalMethod  | 10000     | 6,365.3923 ns | 125.9133 ns | 336.0881 ns | 1.003 |    0.07 | 1.6098 |   10120 B |        1.00 |
| OptimizedMethod | 10000     |     0.2406 ns |   0.0337 ns |   0.0375 ns | 0.000 |    0.00 |      - |         - |        0.00 |

// * Warnings *
MultimodalDistribution
  MiddleQuarterBenchmark.OriginalMethod: Default -> It seems that the distribution can have several modes (mValue = 2.85)

// * Hints *
Outliers
  MiddleQuarterBenchmark.OptimizedMethod: Default -> 2 outliers were removed (1.74 ns, 1.87 ns)
  MiddleQuarterBenchmark.OriginalMethod: Default  -> 1 outlier  was  removed (923.32 ns)
  MiddleQuarterBenchmark.OptimizedMethod: Default -> 3 outliers were removed (1.79 ns..1.87 ns)
  MiddleQuarterBenchmark.OriginalMethod: Default  -> 1 outlier  was  removed (7.76 us)
  MiddleQuarterBenchmark.OptimizedMethod: Default -> 1 outlier  was  removed (1.98 ns)

// * Legends *
  ArraySize   : Value of the 'ArraySize' parameter
  Mean        : Arithmetic mean of all measurements
  Error       : Half of 99.9% confidence interval
  StdDev      : Standard deviation of all measurements
  Ratio       : Mean of the ratio distribution ([Current]/[Baseline])
  RatioSD     : Standard deviation of the ratio distribution ([Current]/[Baseline])
  Gen0        : GC Generation 0 collects per 1000 operations
  Allocated   : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
  1 ns        : 1 Nanosecond (0.000000001 sec)

// * Diagnostic Output - MemoryDiagnoser *


// ***** BenchmarkRunner: End *****
Run time: 00:06:28 (388.42 sec), executed benchmarks: 6

Global total time: 00:06:36 (396.01 sec), executed benchmarks: 6
// * Artifacts cleanup *
Artifacts cleanup is finished

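Performance comparison:

| Method          | ArraySize | Mean          | Error       | StdDev      | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|---------------- |---------- |--------------:|------------:|------------:|------:|--------:|-------:|----------:|------------:|
| OriginalMethod  | 100       |   118.5412 ns |   2.4320 ns |   6.6165 ns | 1.003 |    0.08 | 0.0356 |     224 B |        1.00 |
| OptimizedMethod | 100       |     0.1923 ns |   0.0199 ns |   0.0166 ns | 0.002 |    0.00 |      - |         - |        0.00 |
| OriginalMethod  | 1000      |   775.4773 ns |  15.5718 ns |  42.3643 ns | 1.003 |    0.08 | 0.1783 |    1120 B |        1.00 |
| OptimizedMethod | 1000      |     0.2123 ns |   0.0214 ns |   0.0179 ns | 0.000 |    0.00 |      - |         - |        0.00 |
| OriginalMethod  | 10000     | 6,365.3923 ns | 125.9133 ns | 336.0881 ns | 1.003 |    0.07 | 1.6098 |   10120 B |        1.00 |
| OptimizedMethod | 10000     |     0.2406 ns |   0.0337 ns |   0.0375 ns | 0.000 |    0.00 |      - |         - |        0.00 |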

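With Span<T>, both execution time and memory allocation drop dramatically: the optimized method stays at a fraction of a nanosecond and allocates nothing at every array size, while the LINQ version grows with the array and allocates a new array on each call.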
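
The difference comes from copying versus viewing. As a minimal sketch (the class and method names here are made up for illustration and are not part of the benchmark), the LINQ version produces an independent array, whereas a Span<T> slice is a window onto the original elements, so a write through the slice is visible in the source array:

```csharp
using System;
using System.Linq;

public static class SliceDemo
{
    // Copies the middle quarter into a new array (allocates and copies).
    public static int[] MiddleQuarterCopy(int[] source) =>
        source.Skip(source.Length / 2).Take(source.Length / 4).ToArray();

    // Returns a view over the same elements (no allocation, no copy).
    public static Span<int> MiddleQuarterView(int[] source) =>
        source.AsSpan().Slice(source.Length / 2, source.Length / 4);

    // Writes through the view and returns what the original array now holds
    // at the first sliced index. Assumes source has at least 4 elements.
    public static int WriteThroughView(int[] source, int value)
    {
        Span<int> view = MiddleQuarterView(source);
        view[0] = value;
        return source[source.Length / 2];
    }
}
```

Calling WriteThroughView on an array changes the array itself, while an array obtained earlier from MiddleQuarterCopy keeps its old values.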

Limitations of Span

using System;
using System.Threading.Tasks;
public async Task ProcessDataAsync(Span<int> data)
{
    await Task.Delay(1000); // simulate an asynchronous operation
    // process the data
    for (int i = 0; i < data.Length; i++)
    {
        data[i] *= 2;
    }
}

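The compiler rejects this with error CS4012: Parameters or locals of type 'Span<int>' cannot be declared in async methods or async lambda expressions.

Because a Span<T> can only live on the stack, it can only be used as an automatic (local) variable or parameter of a synchronous method. As a consequence, a Span<T>:

  • cannot be boxed
  • cannot be a field of a class
  • cannot be a parameter or local variable of an async method
  • cannot appear in the return type of an async method (Task<Span<T>> is not a valid type)

Workaround:

Use Memory<T>. It is an ordinary struct that may be stored on the heap, so it can be used in async methods; its Span property yields a Span<T> over the same memory whenever synchronous code needs one.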

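public async Task ProcessDataAsync(Memory<byte> memory)
{
    await Task.Delay(1000); // simulate an asynchronous operation
    // Memory<T> can cross an await. C# 12 and earlier reject Span<T> locals in
    // async methods, so hand memory.Span to a synchronous helper instead.
    Process(memory.Span);
}

private static void Process(Span<byte> span)
{
    // process the data
}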
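
To make the workaround concrete, here is a small self-contained sketch. The class DoublingWorker and its members are hypothetical names chosen for this example. It shows the two things Span<T> cannot do: the Memory<int> is kept in a class field, and it survives an await. The Span<T> itself only appears inside a synchronous helper.

```csharp
using System;
using System.Threading.Tasks;

public sealed class DoublingWorker
{
    // Memory<T> is an ordinary struct, so it can be stored in a class field.
    private readonly Memory<int> _data;

    public DoublingWorker(Memory<int> data) => _data = data;

    public async Task RunAsync()
    {
        await Task.Delay(10);   // stand-in for real asynchronous work
        DoubleAll(_data.Span);  // hand the Span to synchronous code only
    }

    // Synchronous helper: Span<T> parameters and locals are fine here.
    private static void DoubleAll(Span<int> data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] *= 2;
        }
    }
}
```

Constructing the worker with values.AsMemory(1, 2) and awaiting RunAsync doubles only the two elements covered by that slice; the rest of the array is untouched.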

Memory

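Span<T> is awkward to program with because of these restrictions. Memory<T> removes them at the cost of a little performance.

| Feature       | Span<T>                                              | Memory<T>                                              |
|-------------- |----------------------------------------------------- |------------------------------------------------------- |
| Type          | ref struct                                           | ordinary struct                                        |
| Storage       | stack only                                           | may be stored on the heap                              |
| Purpose       | high-performance, short-lived memory operations      | async and long-lived storage scenarios                 |
| Async support | no                                                   | yes                                                    |
| Conversion    | cannot be converted to Memory<T>                     | can be converted to Span<T> (the Span property)        |
| Typical use   | processing data temporarily without heap allocation  | async method parameters, sharing memory across methods |

The benchmark below adds a Memory<T> variant to the slicing comparison:
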
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;

[MemoryDiagnoser]
public class MiddleQuarterBenchmark
{
    private int[] array;

    // Parameterized array size
    [Params(100, 1000, 10000)]
    public int ArraySize { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        // Fill the array with the values 0 .. ArraySize - 1
        array = Enumerable.Range(0, ArraySize).ToArray();
    }

    /// <summary>
    /// Original method: LINQ Skip and Take.
    /// Returns a new array.
    /// </summary>
    [Benchmark(Baseline = true)]
    public int[] OriginalMethod()
    {
        return array.Skip(array.Length / 2).Take(array.Length / 4).ToArray();
    }

    /// <summary>
    /// Optimized method: slice with Span<T>.
    /// Returns a view over the existing array; nothing is copied.
    /// </summary>
    [Benchmark]
    public Span<int> SpanMethod()
    {
        int halfLength = array.Length / 2;
        int quarterLength = array.Length / 4;
        Span<int> span = array.AsSpan();
        return span.Slice(halfLength, quarterLength);
    }
    /// <summary>
    /// Optimized method: slice with Memory<T>.
    /// Returns a view over the existing array; nothing is copied.
    /// </summary>
    [Benchmark]
    public Memory<int> MemoryMethod()
    {
        int halfLength = array.Length / 2;
        int quarterLength = array.Length / 4;
        Memory<int> mem = array.AsMemory();
        return mem.Slice(halfLength, quarterLength);
    }
}
class Program
{
    static void Main(string[] args)
    {
        // Run the benchmarks
        var summary = BenchmarkRunner.Run<MiddleQuarterBenchmark>();
    }
}

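The takeaway: at these data sizes the gap between Span<T> and Memory<T> hardly matters. Slicing a Memory<T> costs a few nanoseconds against a fraction of a nanosecond for Span<T>, both stay flat as the array grows, and neither allocates, while the LINQ version scales with the array size.

| Method         | ArraySize | Mean          | Error      | StdDev     | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|--------------- |---------- |--------------:|-----------:|-----------:|------:|--------:|-------:|----------:|------------:|
| OriginalMethod | 100       |   120.2539 ns |  2.4660 ns |  3.9822 ns | 1.001 |    0.05 | 0.0356 |     224 B |        1.00 |
| SpanMethod     | 100       |     0.1988 ns |  0.0056 ns |  0.0043 ns | 0.002 |    0.00 |      - |         - |        0.00 |
| MemoryMethod   | 100       |     3.8368 ns |  0.0329 ns |  0.0308 ns | 0.032 |    0.00 |      - |         - |        0.00 |
| OriginalMethod | 1000      |   683.0629 ns |  7.2691 ns |  6.4438 ns | 1.000 |    0.01 | 0.1783 |    1120 B |        1.00 |
| SpanMethod     | 1000      |     0.1933 ns |  0.0093 ns |  0.0087 ns | 0.000 |    0.00 |      - |         - |        0.00 |
| MemoryMethod   | 1000      |     3.7898 ns |  0.0278 ns |  0.0247 ns | 0.006 |    0.00 |      - |         - |        0.00 |
| OriginalMethod | 10000     | 5,791.0869 ns | 39.4943 ns | 30.8346 ns | 1.000 |    0.01 | 1.6098 |   10120 B |        1.00 |
| SpanMethod     | 10000     |     0.2130 ns |  0.0166 ns |  0.0156 ns | 0.000 |    0.00 |      - |         - |        0.00 |
| MemoryMethod   | 10000     |     3.8532 ns |  0.0628 ns |  0.0525 ns | 0.001 |    0.00 |      - |         - |        0.00 |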

ArrayPool

ArrayPool<T> provides a pool of reusable arrays, which cuts memory allocations and garbage-collection pressure.

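using System.Buffers;

public void ProcessData()
{
    // Rent returns an array of at least 1024 bytes; it may be longer.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);

    try
    {
        // use buffer
    }
    finally
    {
        // Always give the array back, even if processing throws.
        ArrayPool<byte>.Shared.Return(buffer);
    }
}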

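Notes:

  • The rented array may be larger than the requested size.
  • Returning an array does not clear it by default; clear it yourself (for example with Return(buffer, clearArray: true)) or make sure no sensitive data is left in it.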
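
Both notes can be seen in a short sketch. The helper below is a hypothetical example, not code from the benchmarks: it encodes a string into a rented buffer, touches only the bytes it actually wrote (the array may be longer than requested), and asks the pool to zero the array on return.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class PooledText
{
    // Counts ASCII upper-case letters in the UTF-8 encoding of text,
    // using a pooled buffer instead of allocating a new byte[].
    public static int CountUpperCase(string text)
    {
        int byteCount = Encoding.UTF8.GetByteCount(text);
        byte[] buffer = ArrayPool<byte>.Shared.Rent(byteCount);
        try
        {
            int written = Encoding.UTF8.GetBytes(text, 0, text.Length, buffer, 0);

            // buffer.Length may exceed written, so restrict access to [0, written).
            int count = 0;
            foreach (byte b in buffer.AsSpan(0, written))
            {
                if (b >= (byte)'A' && b <= (byte)'Z')
                {
                    count++;
                }
            }
            return count;
        }
        finally
        {
            // clearArray: true zeroes the array before it goes back to the pool.
            ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
        }
    }
}
```

Clearing costs time proportional to the array length, so it is a trade-off: worth it for sensitive data, unnecessary when the next renter overwrites the buffer anyway.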

System.IO.Pipelines

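System.IO.Pipelines is a high-performance, modern I/O library from .NET, designed to simplify and optimize reading and writing data. It is particularly suited to high-throughput, low-latency scenarios such as network servers, real-time data processing, and streaming media. It introduces the concept of a pipeline (Pipe), an efficient way to process data streams built on optimized memory management.

Problems solved by System.IO.Pipelines

Traditional .NET I/O code is usually built on the Stream classes (FileStream, NetworkStream, and so on) for synchronous or asynchronous reads and writes. That approach has performance bottlenecks and memory-management problems, especially with large volumes of data or high concurrency. System.IO.Pipelines addresses them in the following ways:

  1. Efficient memory management
    • Fewer allocations and copies: System.IO.Pipelines manages memory through Span<T> and Memory<T>, which cuts unnecessary allocation and data copying.
    • Buffer pooling: built-in buffer pooling (for example ArrayPool<T>) reuses memory buffers, further reducing garbage-collection (GC) pressure.
  2. Asynchronous and concurrent processing
    • Decoupled producer and consumer: the pipe model separates the producer of data (for example, the code that reads it) from the consumer (the code that processes it), so each can run asynchronously and independently.
    • High throughput: because the producer and consumer can run concurrently, a pipeline makes better use of multi-core processors and raises overall throughput.
  3. Flexible data processing
    • Streaming: data can be processed incrementally as it arrives, without waiting for a whole block to be read or written.
    • Simplified API: a concise but powerful API hides much of the complexity of asynchronous I/O, letting developers focus on business logic.
  4. Performance optimization
    • Fewer context switches: efficient buffering and memory management reduce the number of thread context switches, and with it latency.
    • Optimized read and write strategies: built-in strategies tune how data is read and written so that it is transferred and processed efficiently.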

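How System.IO.Pipelines works

System.IO.Pipelines is based on the producer-consumer model and consists mainly of PipeReader and PipeWriter. The two communicate and transfer data through a built-in buffer (the Pipe).

1. PipeWriter

  • Writing data: the producer uses PipeWriter to write data into the pipe.
  • Buffer management: PipeWriter manages the buffers, so data is written efficiently and without unnecessary allocation.

2. PipeReader

  • Reading data: the consumer uses PipeReader to read data from the pipe.
  • Processing data: the consumer can process data incrementally instead of fetching everything at once.

3. Pipe connection

  • Data flow: data flows from PipeWriter to PipeReader, carried and stored by the pipe's buffers.
  • Concurrent processing: the producer and consumer can write and read concurrently, improving overall performance.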
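
The writer-to-reader flow can be tried without any network code. The following is a minimal in-memory sketch; PipeRoundTrip and its methods are hypothetical names, and it assumes the project can resolve the System.IO.Pipelines namespace (through the shared framework or the NuGet package). The producer asks the writer for memory, fills it, and flushes; the consumer reads until the writer signals completion.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

public static class PipeRoundTrip
{
    // Producer: asks the writer for memory, fills it, and commits it.
    private static async Task ProduceAsync(PipeWriter writer, string[] chunks)
    {
        foreach (string chunk in chunks)
        {
            byte[] bytes = Encoding.UTF8.GetBytes(chunk);
            Memory<byte> memory = writer.GetMemory(bytes.Length);
            bytes.AsMemory().CopyTo(memory);
            writer.Advance(bytes.Length);  // mark the bytes as written
            await writer.FlushAsync();     // make them visible to the reader
        }
        await writer.CompleteAsync();      // no more data will follow
    }

    // Consumer: reads until the writer completes, then decodes everything.
    private static async Task<string> ConsumeAsync(PipeReader reader)
    {
        var received = new List<byte>();
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            // A sequence may consist of several segments; collect each one.
            foreach (ReadOnlyMemory<byte> segment in buffer)
            {
                received.AddRange(segment.ToArray());
            }
            reader.AdvanceTo(buffer.End);  // everything was consumed

            if (result.IsCompleted)
            {
                break;
            }
        }
        await reader.CompleteAsync();
        return Encoding.UTF8.GetString(received.ToArray());
    }

    public static async Task<string> RunAsync(string[] chunks)
    {
        var pipe = new Pipe();
        Task producing = ProduceAsync(pipe.Writer, chunks);
        Task<string> consuming = ConsumeAsync(pipe.Reader);
        await producing;
        return await consuming;
    }
}
```

Collecting bytes into a List<byte> keeps the sketch short but copies the data; real consumers usually parse the ReadOnlySequence<byte> in place.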

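Example: a simple TCP server built on System.IO.Pipelines

The following example builds a simple TCP server with System.IO.Pipelines and shows how to read network data efficiently and split it into messages.

  1. Add the required reference. Make sure the project references System.IO.Pipelines. ASP.NET Core projects (.NET Core 3.0 and later) get it from the shared framework; for other project types, add the System.IO.Pipelines NuGet package if the namespace does not resolve.

  2. TCP server code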

    using System;
    using System.Buffers;
    using System.IO.Pipelines;
    using System.Net;
    using System.Net.Sockets;
    using System.Text;
    using System.Threading.Tasks;
    
    public class TcpPipelineServer
    {
        private readonly IPAddress _ipAddress;
        private readonly int _port;
    
        public TcpPipelineServer(IPAddress ipAddress, int port)
        {
            _ipAddress = ipAddress;
            _port = port;
        }
    
        public async Task StartAsync()
        {
            TcpListener listener = new TcpListener(_ipAddress, _port);
            listener.Start();
            Console.WriteLine($"Server listening on {_ipAddress}:{_port}");
    
            while (true)
            {
                TcpClient client = await listener.AcceptTcpClientAsync();
                Console.WriteLine("Client connected.");
                _ = HandleClientAsync(client);
            }
        }
    
        private async Task HandleClientAsync(TcpClient client)
        {
            using (client)
            {
                NetworkStream networkStream = client.GetStream();
                Pipe pipe = new Pipe();
    
                Task writing = FillPipeAsync(networkStream, pipe.Writer);
                Task reading = ReadPipeAsync(networkStream, pipe.Reader);
    
                await Task.WhenAll(reading, writing);
            }
        }
    
        private async Task FillPipeAsync(NetworkStream networkStream, PipeWriter writer)
        {
            const int minimumBufferSize = 512;
    
            while (true)
            {
                // Request a block of memory from the writer
                Memory<byte> memory = writer.GetMemory(minimumBufferSize);
                try
                {
                    // Read data from the network stream
                    int bytesRead = await networkStream.ReadAsync(memory);
                    if (bytesRead == 0)
                    {
                        break; // the client closed the connection
                    }
    
                    // Tell the writer how many bytes were written
                    writer.Advance(bytesRead);
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Error reading from network: {ex.Message}");
                    break;
                }
    
                // Make the written data available to the reader
                FlushResult result = await writer.FlushAsync();
    
                if (result.IsCompleted || result.IsCanceled)
                {
                    break;
                }
            }
    
            // Signal that no more data will be written
            await writer.CompleteAsync();
        }
    
        private async Task ReadPipeAsync(NetworkStream networkStream, PipeReader reader)
        {
            while (true)
            {
                ReadResult result = await reader.ReadAsync();
                ReadOnlySequence<byte> buffer = result.Buffer;
    
                SequencePosition? position = null;
    
                // Look for a newline as the message delimiter
                position = buffer.PositionOf((byte)'\n');
    
                while (position != null)
                {
                    ReadOnlySequence<byte> message = buffer.Slice(0, position.Value);
                    ProcessMessage(message);
                    // Skip past the newline
                    buffer = buffer.Slice(buffer.GetPosition(1, position.Value));
                    position = buffer.PositionOf((byte)'\n');
                }
    
                // Report what was consumed (up to buffer.Start) and examined (up to buffer.End)
                reader.AdvanceTo(buffer.Start, buffer.End);
    
                if (result.IsCompleted)
                {
                    break;
                }
            }
    
            // Signal that reading is finished
            await reader.CompleteAsync();
        }
    
        private void ProcessMessage(ReadOnlySequence<byte> message)
        {
            // Convert the byte sequence to a string
            string received = Encoding.UTF8.GetString(message.ToArray());
            Console.WriteLine($"Received: {received}");
    
            // This sample only prints the message to the server console.
            // A real application would process the message here.
        }
    }
    
    class Program
    {
        static async Task Main(string[] args)
        {
            IPAddress ipAddress = IPAddress.Loopback;
            int port = 9000;
    
            TcpPipelineServer server = new TcpPipelineServer(ipAddress, port);
            await server.StartAsync();
        }
    }
    
  3. Code walkthrough

    1. Initialize the server

      TcpListener listener = new TcpListener(_ipAddress, _port);
      listener.Start();
      

      Create and start a TCP listener on the given IP address and port.

    2. Accept client connections

      TcpClient client = await listener.AcceptTcpClientAsync();
      

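      Asynchronously wait for a client to connect, then start an independent handler task for each connection.

    3. Handle a client connection

      private async Task HandleClientAsync(TcpClient client)
      {
          // Use a Pipe to decouple reading from the network and processing the data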
          Pipe pipe = new Pipe();
      
          Task writing = FillPipeAsync(networkStream, pipe.Writer);
          Task reading = ReadPipeAsync(networkStream, pipe.Reader);
      
          await Task.WhenAll(reading, writing);
      }
      

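      Create one Pipe per client connection and start two independent tasks: one reads from the network stream and writes into the pipe, the other reads from the pipe and processes the data.

    4. Fill the pipe

      private async Task FillPipeAsync(NetworkStream networkStream, PipeWriter writer)
      {
          // Read data from the network and write it into the pipe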
      }
      

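      Read from the network stream asynchronously, directly into memory provided by the PipeWriter, and commit the data to the pipe.

    5. Read from the pipe

      private async Task ReadPipeAsync(NetworkStream networkStream, PipeReader reader)
      {
          // Read data from the pipe asynchronously and process it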
      }
      

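      Use PipeReader to read from the pipe asynchronously and process the data as needed. Here that means finding a newline as the message delimiter and handling each complete message.

    6. Process a message

      private void ProcessMessage(ReadOnlySequence<byte> message)
      {
          // Handle the received message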
      }
      

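      Convert the received byte sequence to a string and process it as needed. This example simply prints the message to the console.

  4. Run and test

    1. Start the server: running the code above starts a TCP server listening on port 9000 of the loopback address (127.0.0.1).
    2. Connect a client: connect with Telnet, Netcat, or a custom client and send newline-terminated messages. The server prints each message it receives to its console.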
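
The newline framing inside ReadPipeAsync is easier to reason about, and to unit-test, when it is pulled out into a helper that works on a ReadOnlySequence<byte> alone. The sketch below is one possible way to do that; LineFraming, TryReadLine, and SplitLines are illustrative names, not part of the server above. It needs only System.Buffers, so it runs without a socket or a Pipe.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

public static class LineFraming
{
    // Tries to cut one '\n'-terminated line off the front of the buffer.
    // On success, line excludes the '\n' and buffer starts right after it.
    public static bool TryReadLine(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> line)
    {
        SequencePosition? newline = buffer.PositionOf((byte)'\n');
        if (newline == null)
        {
            line = default;
            return false; // incomplete message: wait for more data
        }

        line = buffer.Slice(0, newline.Value);
        buffer = buffer.Slice(buffer.GetPosition(1, newline.Value));
        return true;
    }

    // Splits a received block into complete lines and reports how many
    // trailing bytes belong to a message that has not fully arrived yet.
    public static List<string> SplitLines(byte[] received, out long remaining)
    {
        var buffer = new ReadOnlySequence<byte>(received);
        var lines = new List<string>();
        while (TryReadLine(ref buffer, out ReadOnlySequence<byte> line))
        {
            lines.Add(Encoding.UTF8.GetString(line.ToArray()));
        }
        remaining = buffer.Length;
        return lines;
    }
}
```

In a pipe reader loop, the bytes left in buffer after the last complete line are exactly what AdvanceTo(buffer.Start, buffer.End) keeps for the next read.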

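Comparison with traditional Stream

| Feature                         | System.IO.Pipelines                                                        | Stream                                                                  |
|-------------------------------- |--------------------------------------------------------------------------- |------------------------------------------------------------------------ |
| Memory management               | Efficient; based on Span<T> and Memory<T>, fewer allocations and copies    | Traditional; can cause more allocations and copies                      |
| Async performance               | Higher; supports a decoupled producer-consumer model                       | Lower; based on direct calls to synchronous and asynchronous methods    |
| Read/write flexibility          | Higher; supports streaming and segmented processing                        | Lower; usually reads or writes large blocks at once                     |
| API simplicity                  | Simpler; designed for high performance                                     | More complex; aimed at a wide range of general scenarios                |
| Concurrency and multi-threading | Optimized; supports concurrent reading and writing                         | Supported, but synchronization of concurrent reads and writes is manual |
| Extensibility                   | High; easy to extend and integrate with modern asynchronous programming    | Average; fits most traditional I/O operations                           |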

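When to use it

System.IO.Pipelines is particularly suited to the following scenarios:

  1. High-performance network servers: web servers and real-time communication servers that need high throughput and low latency.
  2. Real-time data processing: log processing and data-stream processing that must handle and forward data efficiently.
  3. Streaming media: audio and video streaming that needs continuous, efficient stream processing.
  4. File I/O: applications that read or write large files efficiently or touch files frequently.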

System.Text.Json

using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO.Pipelines;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;


public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class DataGenerator
{
    public static byte[] GenerateJsonData(int count)
    {
        var people = new List<Person>(count);
        for (int i = 0; i < count; i++)
        {
            people.Add(new Person
            {
                Id = i,
                Name = $"Person {i}",
                Age = i % 120 // ages range from 0 to 119
            });
        }

        return JsonSerializer.SerializeToUtf8Bytes(people);
    }
}


[MemoryDiagnoser]
[ShortRunJob]
public class JsonBenchmarks
{
    private static byte[] jsonData;
    private static int recordCount;

    [GlobalSetup]
    public void Setup()
    {
        // Generate JSON data with 100,000 records
        recordCount = 100000;
        jsonData = DataGenerator.GenerateJsonData(recordCount);
    }

    [Benchmark]
    public List<Person> Deserialize_WithJsonSerializer()
    {
        return JsonSerializer.Deserialize<List<Person>>(jsonData);
    }

    [Benchmark]
    public void Parse_WithJsonDocument()
    {
        using (JsonDocument doc = JsonDocument.Parse(jsonData))
        {
            int count = 0;
            foreach (var element in doc.RootElement.EnumerateArray())
            {
                int id = element.GetProperty("Id").GetInt32();
                string name = element.GetProperty("Name").GetString();
                int age = element.GetProperty("Age").GetInt32();
                count++;
            }
        }
    }

    [Benchmark]
    public void Parse_WithUtf8JsonReader()
    {
        var reader = new Utf8JsonReader(jsonData);
        int count = 0;

        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.StartObject)
            {
                int id = 0;
                string name = null;
                int age = 0;

                while (reader.Read() && reader.TokenType != JsonTokenType.EndObject)
                {
                    if (reader.TokenType == JsonTokenType.PropertyName)
                    {
                        string propertyName = reader.GetString();
                        reader.Read();
                        switch (propertyName)
                        {
                            case "Id":
                                id = reader.GetInt32();
                                break;
                            case "Name":
                                name = reader.GetString();
                                break;
                            case "Age":
                                age = reader.GetInt32();
                                break;
                        }
                    }
                }
                count++;
            }
        }
    }

    [Benchmark]
    public List<Person> Deserialize_WithJsonSerializer_Span()
    {
        return JsonSerializer.Deserialize<List<Person>>(jsonData.AsSpan());
    }

    [Benchmark]
    public void Parse_WithUtf8JsonReader_Optimized()
    {
        ArrayPool<byte> pool = ArrayPool<byte>.Shared;
        byte[] buffer = pool.Rent(jsonData.Length);
        try
        {
            Buffer.BlockCopy(jsonData, 0, buffer, 0, jsonData.Length);
            ReadOnlySpan<byte> span = buffer.AsSpan(0, jsonData.Length);
            var reader = new Utf8JsonReader(span);
            int count = 0;

            while (reader.Read())
            {
                if (reader.TokenType == JsonTokenType.StartObject)
                {
                    int id = 0;
                    string name = null;
                    int age = 0;

                    while (reader.Read() && reader.TokenType != JsonTokenType.EndObject)
                    {
                        if (reader.TokenType == JsonTokenType.PropertyName)
                        {
                            string propertyName = reader.GetString();
                            reader.Read();
                            switch (propertyName)
                            {
                                case "Id":
                                    id = reader.GetInt32();
                                    break;
                                case "Name":
                                    name = reader.GetString();
                                    break;
                                case "Age":
                                    age = reader.GetInt32();
                                    break;
                            }
                        }
                    }
                    count++;
                }
            }
        }
        finally
        {
            pool.Return(buffer);
        }
    }

    [Benchmark]
    public async Task<List<Person>> Deserialize_WithJsonSerializer_Pipelines()
    {
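        // Writer back-pressure is disabled because the whole payload is written
        // before anything reads it; with the default pause threshold, WriteAsync
        // would wait forever for a reader once the payload exceeds that threshold.
        var pipe = new Pipe(new PipeOptions(pauseWriterThreshold: 0, resumeWriterThreshold: 0));
        await pipe.Writer.WriteAsync(jsonData);
        pipe.Writer.Complete();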

        var reader = pipe.Reader;
        var people = new List<Person>();

        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            if (buffer.IsSingleSegment)
            {
                people.AddRange(JsonSerializer.Deserialize<List<Person>>(buffer.FirstSpan));
            }
            else
            {
                // The data spans several segments: copy it into one contiguous array
                byte[] combined = new byte[buffer.Length];
                buffer.CopyTo(combined);
                people.AddRange(JsonSerializer.Deserialize<List<Person>>(combined));
            }

            reader.AdvanceTo(buffer.End);

            if (result.IsCompleted)
            {
                break;
            }
        }

        await reader.CompleteAsync();
        return people;
    }
}


class Program
{
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<JsonBenchmarks>();
    }
}
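
One further variation on the Utf8JsonReader benchmarks above, offered as a sketch rather than a measured result: the reader can compare property names in place with ValueTextEquals instead of calling GetString, which avoids allocating a string per property. The class AgeScanner and its methods are hypothetical and are not part of the benchmark set; the example assumes a JSON array of flat objects like the Person records.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class AgeScanner
{
    // Sums every "Age" value in a UTF-8 JSON payload without
    // materializing Person instances or property-name strings.
    public static long SumAges(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        long total = 0;

        while (reader.Read())
        {
            // ValueTextEquals compares against the raw token in place,
            // so no string is allocated for the property name.
            if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals("Age"))
            {
                reader.Read(); // move to the property value
                total += reader.GetInt32();
            }
        }
        return total;
    }

    // Convenience wrapper for callers that hold the JSON as a string.
    public static long SumAgesFromString(string json) =>
        SumAges(Encoding.UTF8.GetBytes(json));
}
```

Whether this is measurably faster for a given payload is something to confirm with BenchmarkDotNet, for example by adding it as another [Benchmark] method next to the ones above.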

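Summary

In this article we discussed what performance is, and how to measure and optimize it in .NET applications. We focused on the following high-performance APIs:

  • Span<T>: works on memory efficiently, without copying data.
  • ArrayPool<T>: reuses arrays to reduce memory allocation and garbage collection.
  • System.IO.Pipelines: high-performance I/O with optimized memory and buffer management.
  • System.Text.Json: a high-performance JSON serialization and deserialization library.

When optimizing for performance, keep in mind:

  • Measure: always measure before and after an optimization to verify its effect.
  • Hot paths: focus on the application's hot paths and optimize the parts with the most impact first.
  • Trade-offs: balance performance against readability, and avoid over-optimizing code into something hard to maintain.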

References

Tags: c# .NET performance