How async tasks work in .NET

When I was writing my previous post, Should I Task.Wait() or await Task?, I slowly came to realize that it builds on a layer of abstraction some of us are not familiar with. It is great to dive deep into thread pool behavior, but a more basic question remains open – how does C# implement tasks? Most developers don’t ask this question until they reach a certain level of mastery or until performance tanks for no obvious reason. In this post, we will look at the trickery that the C# compiler pulls off in coordination with the core .NET library to achieve a convenient coding paradigm.

Write a block of text into the file

We will start our journey by looking at the best application of asynchronous programming – storage I/O. Our application has a few methods that record managed thread identifiers on entry and exit. Deep within the call stack, we will be making a call to write a string into the file. Our code is written in a completely synchronous fashion, the way it used to be written before .NET had tasks. Our entire application runs on a single thread starting from Main().

using System;
using System.IO;
using System.Text;
using System.Threading;

namespace TaskTest
{
    class Program
    {
        private static void ExecuteOuterTask()
        {
            Console.WriteLine($"ExecuteOuterTask Entry Point thread ID is {Thread.CurrentThread.ManagedThreadId}");

            ExecuteNestedTask();

            Console.WriteLine($"ExecuteOuterTask Exit Point thread ID is {Thread.CurrentThread.ManagedThreadId}");
        }

        private static void ExecuteNestedTask()
        {
            Console.WriteLine($"ExecuteNestedTask Entry Point thread ID is {Thread.CurrentThread.ManagedThreadId}");

            // Build a buffer of text to be written into the file.
            // Pay attention to the number "273" - it is not random.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 273; i++)
            {
                sb.Append("Awesome Content");
            }

            string content = sb.ToString();

            Console.WriteLine($"Content Length is {content.Length}");

            File.WriteAllText("AwesomeFile.txt", content);

            Console.WriteLine($"ExecuteNestedTask Exit Point thread ID is {Thread.CurrentThread.ManagedThreadId}");
        }

        static void Main(string[] args)
        {
            Console.WriteLine($"Main Entry Point thread ID is {Thread.CurrentThread.ManagedThreadId}");

            ExecuteOuterTask();

            Console.WriteLine($"Main Exit Point thread ID is {Thread.CurrentThread.ManagedThreadId}");
        }
    }
}

When we run this application, we get the same output every time. The Main() method calls ExecuteOuterTask(), which then calls ExecuteNestedTask(). All methods run on the same thread, as indicated by “1” in the output below. The call stack right after File.WriteAllText() is straightforward and predictable.

Main Entry Point thread ID is 1
ExecuteOuterTask Entry Point thread ID is 1
ExecuteNestedTask Entry Point thread ID is 1
Content Length is 4095
ExecuteNestedTask Exit Point thread ID is 1
ExecuteOuterTask Exit Point thread ID is 1
Main Exit Point thread ID is 1
>TaskTest.dll!TaskTest.Program.ExecuteNestedTask() Line 37	C#
 TaskTest.dll!TaskTest.Program.ExecuteOuterTask() Line 14	C#
 TaskTest.dll!TaskTest.Program.Main(string[] args) Line 44	C#

Just to make sure there is nothing special going on here, we will decompile our assembly and see what it looks like. My favorite .NET decompiler is ILSpy. The source code we see below is exactly what we wrote in our application. At this point the compiler has not altered our application in any meaningful way.

Decompiled synchronous code

Go async

Let’s take our application and convert every method except Main() to async. ExecuteOuterTask() will await ExecuteNestedTask() since they are both async. The functionality of the application will remain the same. We only change the method signatures and the invocation syntax slightly.

using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace TaskTest
{
    class Program
    {
        private static async Task ExecuteOuterTask()
        {
            Console.WriteLine($"ExecuteOuterTask Entry Point thread ID is {Thread.CurrentThread.ManagedThreadId}");

            await ExecuteNestedTask();

            Console.WriteLine($"ExecuteOuterTask Exit Point thread ID is {Thread.CurrentThread.ManagedThreadId}");
        }

        private static async Task ExecuteNestedTask()
        {
            Console.WriteLine($"ExecuteNestedTask Entry Point thread ID is {Thread.CurrentThread.ManagedThreadId}");

            // Build a buffer of text to be written into the file.
            // Pay attention to the number "273" - it is not random.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 273; i++)
            {
                sb.Append("Awesome Content");
            }

            string content = sb.ToString();

            Console.WriteLine($"Content Length is {content.Length}");

            await File.WriteAllTextAsync("AwesomeFile.txt", content);

            Console.WriteLine($"ExecuteNestedTask Exit Point thread ID is {Thread.CurrentThread.ManagedThreadId}");
        }

        static void Main(string[] args)
        {
            Console.WriteLine($"Main Entry Point thread ID is {Thread.CurrentThread.ManagedThreadId}");

            ExecuteOuterTask().Wait();

            Console.WriteLine($"Main Exit Point thread ID is {Thread.CurrentThread.ManagedThreadId}");
        }
    }
}

When we run this application, our output will match that of the equivalent synchronous code. The stack trace right after File.WriteAllTextAsync() will also be the same as before. We intentionally haven’t changed our magic number “273”, so that both apps behave identically. However, if we decompile this application we will see a world of difference.

To see what’s really happening, we need to disable the “humanization” of async methods that ILSpy does by default. We want to see what the compiler really generates for every “async” method. In the menu, click View > Options…, scroll down to the “C# 5.0 / VS 2012” section, and uncheck “Decompile async methods”.

Decompile async methods disabled
Disable “Decompile async methods”

Let’s take a minute to discuss the “humanization” of asynchronous methods. You may notice that Visual Studio goes out of its way to make sure async methods look just like regular synchronous methods. It shows the stack trace and local variables as if those methods were synchronous. An unsuspecting developer would never know that their application has been rewritten, or that its thread scheduling and memory allocation patterns have changed, unless they know which settings to tweak.

Decompiled asynchronous code

Now we have two nested sealed classes – <ExecuteOuterTask>d__0 and <ExecuteNestedTask>d__1. Also, each of our asynchronous methods now contains very similar code that has little to do with the code we wrote. Each method creates an instance of its state machine, starts it, and returns a task. Even though our application writes exactly the same output to the console, it looks very different under the hood.

Why such complexity?

It is worth taking a trip down memory lane to the days of IAsyncResult and looking at the BeginInvoke/EndInvoke pattern. This was the way to do asynchronous operations in .NET before tasks. Technically it still exists and there is an interop layer between the two, but it is slowly fading away. Even though the specific interface is no longer mainstream, it captures the essence of what it takes to do interruptable asynchronous programming.

Uninterruptable code

Assume we have a method DoSomething() running on Thread 1 and it needs to make a call that completes asynchronously. For simplicity, let’s also assume that our DoSomething() is written to block the thread while it waits for the operation to complete. The operation we start with BeginInvoke() runs on a different thread – Thread 2. It signals to Thread 1 when it has produced the result. Thread 1 wakes up and completes the operation by calling EndInvoke() to obtain the result.

BeginInvoke()/EndInvoke() pattern

This is a gross oversimplification of actual thread scheduling and communication pattern. Thread 1 never calls anything on Thread 2 directly. It schedules something to run on Thread 2 by modifying the shared state accessible from Thread 1. Thread 2 listens to that state modification and is woken up to do the work. It places the result in the shared location where Thread 1 can reach it, then signals back to Thread 1 that the job is done. Thread 1 picks up the result by calling EndInvoke().

In this flow of execution Thread 1 is tied up for the whole time Thread 2 is working. Whether it spins checking for a flag or blocks on a kernel object until the flag is set, it cannot run anything else. The developer hasn’t implemented DoSomething() in a way that can be interrupted on Thread 1 and resumed, say, on Thread 3. The entire runtime context of DoSomething(), with all its local variables, shared data locks, and references to memory locations, is stored on the thread stack.
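The blocking flow described above can be sketched with a raw thread and an event. This is only an illustration under stated assumptions: BlockingDemo, the trivial "6 * 7" operation, and the use of ManualResetEventSlim as the signal are hypothetical stand-ins, not the actual BeginInvoke()/EndInvoke() machinery.

```csharp
using System;
using System.Threading;

public static class BlockingDemo
{
    // Hypothetical stand-in for DoSomething(): starts work on a second thread,
    // then blocks the calling thread until that work signals completion.
    public static int DoSomething()
    {
        int result = 0;                                    // shared state written by Thread 2
        using var done = new ManualResetEventSlim(false);  // the "signal" from Thread 2

        // "BeginInvoke": schedule the operation on another thread.
        var worker = new Thread(() =>
        {
            result = 6 * 7;   // pretend this is the slow operation
            done.Set();       // tell Thread 1 that the result is ready
        });
        worker.Start();

        // Thread 1 is stuck here. Its stack holds all of DoSomething()'s state,
        // so the thread cannot be reused for anything else while it waits.
        done.Wait();

        // "EndInvoke": pick up the result from the shared location.
        worker.Join();
        return result;
    }
}
```

Calling BlockingDemo.DoSomething() from Main() returns only after the worker thread has finished, which is exactly the limitation the next section removes.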

Interruptable code

What do we need to change to make our DoSomething() yield execution on the thread to some other piece of code and be resumed elsewhere when Thread 2 is complete? We need to break it into two:

  • DoSomething_1() will contain all the code from the start of the method until BeginInvoke(). It will need to return a reference to a class or structure in memory that reflects whether the entire DoSomething() method has executed in its entirety or whether some part is still pending.
  • DoSomething_2() will have the code that runs starting from EndInvoke() until the end of the original DoSomething() method. This code can run on any thread and therefore cannot return the result directly to the caller. We need to change it to store the result somewhere on the heap where the original DoSomething() caller can find it.
Interruptable BeginInvoke()/EndInvoke() pattern

A few more implications of this change worth pointing out:

  • Thread 2 needs to be able to schedule DoSomething_2(). In the previous example, it only needed to signal that the execution is complete. However, now we don’t have a thread waiting for that signal, so we need to make Thread 2 schedule the “continuation” of the DoSomething() method that we moved into DoSomething_2().
  • Any state that was stored on the stack in the original DoSomething() method must be moved to the heap. We no longer have a single thread to maintain all those local variables so they need to be stored in the shared location where both Thread 1 and Thread 3 can access them.
  • We can no longer rely on thread locks for synchronization of access to shared data structures. If we were to acquire a lock in Thread 1 during DoSomething_1() it would no longer be valid for consumption during DoSomething_2() running on Thread 3.
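The split described above can be sketched by hand. The names below (InterruptableDemo, DoSomethingState) are hypothetical, the "slow operation" is a trivial multiplication, and a TaskCompletionSource is used purely for convenience as the heap object that tells the caller whether the whole method has finished. Treat it as a sketch of the idea rather than what the runtime does.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterruptableDemo
{
    // Heap-allocated replacement for what used to live on Thread 1's stack.
    private sealed class DoSomethingState
    {
        public int Input;             // a former local variable
        public int OperationResult;   // filled in by the background operation
        public readonly TaskCompletionSource<int> Completion =
            new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
    }

    // First half: everything up to the asynchronous call. It returns right away
    // with an object the caller can use to learn when the whole method is done.
    public static Task<int> DoSomething_1(int input)
    {
        var state = new DoSomethingState { Input = input };

        // Stand-in for BeginInvoke(): run the operation on a pool thread and let
        // that thread schedule the continuation instead of signaling a waiter.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            state.OperationResult = state.Input * 2;                  // the "slow" operation
            ThreadPool.QueueUserWorkItem(__ => DoSomething_2(state)); // schedule the second half
        });

        return state.Completion.Task;
    }

    // Second half: everything after EndInvoke(). It may run on any thread, so it
    // publishes the final result through the shared state, not a return value.
    private static void DoSomething_2(DoSomethingState state)
    {
        state.Completion.SetResult(state.OperationResult + 1);
    }
}
```

The thread that calls DoSomething_1() is free as soon as the method returns. Nothing in this sketch ever waits on a signal, and every value that crosses the split lives in DoSomethingState on the heap.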

Do these limitations and implications sound familiar? Behold the state machine.

IAsyncStateMachine

Asynchronous tasks in .NET do pretty much the same things we discussed in the previous section, wrapped in a nice little package that has a rich ecosystem of interoperable classes. The C# compiler converts every “async” method into a type implementing IAsyncStateMachine. All local variables become members of that type, and each chunk of code between two consecutive “await” instructions gets placed into an if-else or switch-case block identified by a number from 0 and up. This number becomes the “state” of the state machine. Each time a block ends at an await whose operation has not completed yet, the state machine obtains an awaiter (a TaskAwaiter for tasks), registers itself as the continuation, and returns to the caller; the awaiter determines when the next block of code (the next state) can be invoked. All exceptions are caught and stored in the task returned by the method so they can be observed at a later point in time, rather than propagating on the thread where they actually occurred.
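The exception handling is easy to observe from the outside. In the small sketch below (ExceptionCaptureDemo and its methods are hypothetical names), the throw happens while FailAsync() is still running synchronously on the caller's thread, yet the call itself does not throw. The exception only surfaces when the returned task is observed.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionCaptureDemo
{
    public static async Task FailAsync()
    {
        // An already-completed await, so the method keeps running synchronously.
        await Task.CompletedTask;
        throw new InvalidOperationException("boom");
    }

    public static string Observe()
    {
        Task task = FailAsync();         // does not throw: the state machine caught it
        bool faulted = task.IsFaulted;   // the exception is already stored on the task
        try
        {
            task.Wait();                 // Wait() rethrows it wrapped in AggregateException
            return "no exception";
        }
        catch (AggregateException ex)
        {
            return $"{faulted}:{ex.InnerException?.GetType().Name}";
        }
    }
}
```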

Async State Machine
AsyncStateMachine representation of ExecuteNestedTask() method

It is easy to recognize the familiar code in the picture above. The content of the ExecuteNestedTask() method was broken into two pieces – the block under “if (num != 0)” and the code outside of the “else” block. We can clearly see how we are building a StringBuilder iteratively in a while loop. The local variable “i” has become a class member “<i>5__3”. We can also see that the task returned by File.WriteAllTextAsync() gives us an awaiter, which is stored as a class member and scheduled via the AwaitUnsafeOnCompleted() call on the AsyncTaskMethodBuilder structure instance. Notice that this call takes both the awaiter instance and the state machine. Under the hood, it passes a delegate for the state machine’s MoveNext() method to the awaiter, which will call it when the operating system signals completion of the underlying file write operation.

When we put it all together we start to see how the compiler makes async tasks interruptable – by making MoveNext() run from the current to the next point of interruption. Calling MoveNext() multiple times will allow the state machine to execute each synchronous block between each “await” instruction sequentially, one transition per invocation.
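To make the shape of the generated code more tangible, here is a hand-written state machine for a much smaller method. It is a simplified sketch: the method AddAfterDelay() and every name in it are hypothetical, and the real compiler output uses mangled names and differs in details. The builder calls (Create, Start, AwaitUnsafeOnCompleted, SetResult, SetException) are the same public AsyncTaskMethodBuilder API that the decompiled code uses.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class HandWrittenStateMachineDemo
{
    // Roughly what "async Task<int> AddAfterDelay(int a, int b)
    // { await Task.Delay(10); return a + b; }" could be lowered to.
    private sealed class AddAfterDelayStateMachine : IAsyncStateMachine
    {
        public int State = -1;                       // -1 = running, 0 = suspended at the await
        public AsyncTaskMethodBuilder<int> Builder;  // creates and completes the Task<int>
        public int A, B;                             // former parameters, now fields
        private TaskAwaiter _awaiter;                // awaiter kept across the suspension

        public void MoveNext()
        {
            try
            {
                TaskAwaiter awaiter;
                if (State != 0)
                {
                    // Block 0: the code before the await.
                    awaiter = Task.Delay(10).GetAwaiter();
                    if (!awaiter.IsCompleted)
                    {
                        State = 0;
                        _awaiter = awaiter;
                        var self = this;
                        // Ask the builder to call MoveNext() again once the awaiter completes.
                        Builder.AwaitUnsafeOnCompleted(ref awaiter, ref self);
                        return; // give the thread back to the caller
                    }
                }
                else
                {
                    // Resumed: restore the awaiter saved before suspending.
                    awaiter = _awaiter;
                    _awaiter = default;
                    State = -1;
                }

                // Block 1: the code after the await.
                awaiter.GetResult();
                State = -2;
                Builder.SetResult(A + B);
            }
            catch (Exception ex)
            {
                State = -2;
                Builder.SetException(ex); // store the exception on the task
            }
        }

        public void SetStateMachine(IAsyncStateMachine stateMachine) { }
    }

    // The stub that replaces the original method: create, start, return the task.
    public static Task<int> AddAfterDelay(int a, int b)
    {
        var sm = new AddAfterDelayStateMachine
        {
            Builder = AsyncTaskMethodBuilder<int>.Create(),
            A = a,
            B = b,
        };
        sm.Builder.Start(ref sm);
        return sm.Builder.Task;
    }
}
```

The first MoveNext() call runs on the caller's thread and returns at the suspension point. The second one is invoked by whichever thread completes the delay, which is the same hand-off we observe in the next section.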

Go async, for real this time

To see the power of this transformation we only need to change our magic number to “274” on line 27 and re-run our application. Notice that the exit points for ExecuteNestedTask() and ExecuteOuterTask() are suddenly running on thread “4”, which is different from the entry points running on thread “1”. We haven’t changed the logic; we only increased the length of the string we build with the StringBuilder.

Main Entry Point thread ID is 1
ExecuteOuterTask Entry Point thread ID is 1
ExecuteNestedTask Entry Point thread ID is 1
Content Length is 4110
ExecuteNestedTask Exit Point thread ID is 4
ExecuteOuterTask Exit Point thread ID is 4
Main Exit Point thread ID is 1

Now that the code ran truly asynchronously, our stack trace looks very different. PerformWaitCallback(), the lowest layer of the thread pool, is at the bottom of the stack. We also see the FlushAsyncInternal state machine’s MoveNext() near the bottom of the stack, while ExecuteNestedTask’s MoveNext() is almost at the top. We see an inversion of the stack relative to the synchronous call experiment.

>TaskTest.dll!TaskTest.Program.ExecuteNestedTask() Line 38	C#
 [Resuming Async Method]	
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.__Canon>.ExecutionContextCallback(object s) Line 580	C#
 System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 172	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<TaskTest.Program.<ExecuteNestedTask>d__1>.MoveNext(System.Threading.Thread threadPoolThread) Line 617	 
 System.Private.CoreLib.dll!System.Threading.Tasks.Task.RunContinuations(object continuationObject) Line 3326	C#
...

 System.Private.CoreLib.dll!System.Threading.Tasks.Task.FinishContinuations() Line 3291	C#
 System.Private.CoreLib.dll!System.Threading.Tasks.Task<System.Threading.Tasks.VoidTaskResult>.TrySetResult(System.Threading.Tasks.VoidTaskResult result) Line 419	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.SetResult() Line 273	C#
 [Completed] System.IO.FileSystem.dll!System.IO.File.InternalWriteAllTextAsync(System.IO.StreamWriter sw, string contents, System.Threading.CancellationToken cancellationToken) Line 999	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.IO.File.<InternalWriteAllTextAsync>d__82>.ExecutionContextCallback(object s) Line 580	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.IO.File.<InternalWriteAllTextAsync>d__82>.MoveNext(System.Threading.Thread threadPoolThread) Line 617	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.IO.File.<InternalWriteAllTextAsync>d__82>.MoveNext() Line 595	C#
...
 System.Private.CoreLib.dll!System.Threading.Tasks.Task.FinishContinuations() Line 3291	C#
 System.Private.CoreLib.dll!System.Threading.Tasks.Task<System.Threading.Tasks.VoidTaskResult>.TrySetResult(System.Threading.Tasks.VoidTaskResult result) Line 419	C#
 System.Private.CoreLib.dll!System.IO.StreamWriter.WriteAsyncInternal(System.IO.StreamWriter _this, System.ReadOnlyMemory<char> source, char[] charBuffer, int charPos, int charLen, char[] coreNewLine, bool autoFlush, bool appendNewLine, System.Threading.CancellationToken cancellationToken) Line 868	C#
 [Resuming Async Method]	
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.IO.StreamWriter.<WriteAsyncInternal>d__66>.ExecutionContextCallback(object s) Line 580	C#
...
[Resuming Async Method]	
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.IO.StreamWriter.<FlushAsyncInternal>d__74>.ExecutionContextCallback(object s) Line 580	C#
 System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 172	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.IO.StreamWriter.<FlushAsyncInternal>d__74>.MoveNext(System.Threading.Thread threadPoolThread) Line 617	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.IO.StreamWriter.<FlushAsyncInternal>d__74>.MoveNext() Line 595	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.TaskAwaiter.OutputWaitEtwEvents.AnonymousMethod__12_0(System.Action innerContinuation, System.Threading.Tasks.Task innerTask) Line 304	C#
 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke() Line 1131	C#
 System.Private.CoreLib.dll!System.Threading.Tasks.AwaitTaskContinuation.System.Threading.IThreadPoolWorkItem.Execute() Line 646	C#
 System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Line 677	C#
 System.Private.CoreLib.dll!System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() Line 29	C#
 [Async Call Stack]	
 [Async] TaskTest.dll!TaskTest.Program.ExecuteOuterTask() Line 17	C#

Why does changing one number make the code go truly async?

This question is not strictly relevant to the functionality of Tasks but we should touch on it because it explains the mechanism of task continuation more holistically. The key is in the output of our application next to “Content Length”. To make code go asynchronous we had to cross the threshold of 4096 bytes when writing the file content.

Storage devices are broken down into sectors – the smallest unit of data that can be read or written. The concept of a sector dates back to the days of spinning disks (HDD). Most of our drives today are solid-state or NVMe, but the concept of the sector is so important that it still applies to modern devices. You can’t write less than a sector to the drive, and you can’t read less than a sector from it.

To make IO more performant and avoid unnecessary round-trips to the device, the operating system driver has a buffer that keeps accumulating data until it reaches the size of the sector. Multiple consecutive writes into the file can all be buffered. Once sector size is reached, the driver flushes that buffer to the storage device and allocates another buffer for subsequent writes. You can read more about buffering on MSDN. All writes into the buffer complete synchronously (because they are quick memory copy operations). However, the I/O operation goes asynchronous when the buffer is flushed to the storage device.

Buffered writes into the storage device
Storage driver buffer

The size 4096 is not random either, nor is it guaranteed to be the same across all devices or operating systems. On Windows, there is a STORAGE_ACCESS_ALIGNMENT_DESCRIPTOR structure that developers can request from the OS to learn the precise size programmatically. This structure is useful when developers want to bypass buffering and write directly to the storage device. A quicker way to find the sector size of a particular drive is to use command-line utilities.

C:\WINDOWS\system32>fsutil fsinfo ntfsinfo c:
NTFS Volume Serial Number :        <redacted>
NTFS Version      :                3.1
LFS Version       :                2.0
Total Sectors     :                1,995,683,917  (951.6 GB)
Total Clusters    :                  249,460,489  (951.6 GB)
Free Clusters     :                   56,101,781  (214.0 GB)
Total Reserved Clusters :                 20,796  ( 81.2 MB)
Reserved For Storage Reserve :                 0  (  0.0 KB)
Bytes Per Sector  :                512
Bytes Per Physical Sector :        512
Bytes Per Cluster :                4096
Bytes Per FileRecord Segment    :  1024
Clusters Per FileRecord Segment :  0
Mft Valid Data Length :            986.00 MB
Mft Start Lcn  :                   0x00000000000c0000
Mft2 Start Lcn :                   0x0000000000000002
Mft Zone Start :                   0x000000000934b2c0
Mft Zone End   :                   0x0000000009357660
MFT Zone Size  :                   195.63 MB
Max Device Trim Extent Count :     1
Max Device Trim Byte Count :       0x1fffe00
Max Volume Trim Extent Count :     1
Max Volume Trim Byte Count :       0x1fff000
Resource Manager Identifier :      <redacted>

For my particular machine, the minimum writable size of the buffer is the “Bytes Per Cluster” value. In the last experiment, the size of the content reached 4110 bytes, which resulted in 1 asynchronous write of 4096 bytes and 1 synchronous write of 4110 - 4096 = 14 bytes into the new buffer.
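The arithmetic behind the two magic numbers can be checked with a few lines of code. The sketch assumes a 4096-byte buffer and one byte per character (our content is plain ASCII). BufferMath and Describe() are hypothetical names, and the case where the content is exactly 4096 bytes is not something the experiments above cover.

```csharp
using System;

public static class BufferMath
{
    public const int BufferSize = 4096;             // assumed buffer size in bytes
    public const string Chunk = "Awesome Content";  // 15 characters per repetition

    // Returns the content length, whether it exceeds the buffer, and how many
    // bytes spill over into the next buffer.
    public static (int length, bool crossesBuffer, int overflow) Describe(int repetitions)
    {
        int length = repetitions * Chunk.Length;
        bool crossesBuffer = length > BufferSize;
        int overflow = Math.Max(0, length - BufferSize);
        return (length, crossesBuffer, overflow);
    }
}
```

Describe(273) yields (4095, false, 0), one byte short of the buffer, while Describe(274) yields (4110, true, 14), which matches the “Content Length” values printed by the two runs.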

To confirm that our writer is configured to go asynchronous we only need to look at File.AsyncStreamWriter() implementation that WriteAllTextAsync() calls.

FileStream stream = new FileStream(
                path, append ? FileMode.Append : FileMode.Create, FileAccess.Write, FileShare.Read, DefaultBufferSize,
                FileOptions.Asynchronous | FileOptions.SequentialScan);

It passes the FileOptions.Asynchronous flag, so FileStream uses overlapped I/O when calling the native Win32 API.
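We can open a stream the same way in our own code. The sketch below is an approximation, not the library's implementation: AsyncFileStreamDemo is a hypothetical name, and the buffer size of 4096 is an assumption standing in for DefaultBufferSize. The method returns the stream's IsAsync property so the caller can check how the stream was opened.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncFileStreamDemo
{
    public static async Task<bool> WriteAsync(string path, string content)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(content);

        // Same flags as in the snippet above; bufferSize is an assumed value.
        using (var stream = new FileStream(
            path, FileMode.Create, FileAccess.Write, FileShare.Read,
            bufferSize: 4096,
            options: FileOptions.Asynchronous | FileOptions.SequentialScan))
        {
            bool isAsync = stream.IsAsync;   // reports whether the stream was opened for async I/O
            await stream.WriteAsync(bytes, 0, bytes.Length);
            await stream.FlushAsync();       // push any buffered bytes to the file
            return isAsync;
        }
    }
}
```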

All-up execution flow

The last piece left is to put it all together in a single diagram that shows how execution flows between different methods and threads. We will pivot and show methods on the horizontal axis, threads on the vertical axis. Transitions between threads will happen when the call returns all the way to the thread pool.

Thread scheduling and state machine continuation

Main() always runs on a single thread, and it synchronously waits for the task returned by ExecuteOuterTask() to complete. The wait is the standard Task combination of a brief spin followed by a wait on an event, as shown by the stack trace below, captured at the time of waiting. This wait is efficient because it doesn’t burn CPU cycles; it just sits and waits for a kernel-mode object to be signaled.

>System.Private.CoreLib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) Line 626	C#
 System.Private.CoreLib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) Line 2922	C#
 System.Private.CoreLib.dll!System.Threading.Tasks.Task.InternalWaitCore(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) Line 2861	C#
 System.Private.CoreLib.dll!System.Threading.Tasks.Task.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) Line 2772	C#
 System.Private.CoreLib.dll!System.Threading.Tasks.Task.Wait() Line 2660	C#
 [Waiting on Async Operation, double-click or press enter to view Async Call Stacks]	
 TaskTest.dll!TaskTest.Program.Main(string[] args) Line 45	C#

The task started from Main() runs synchronously until the call to WriteFileNative(), which is just a PInvoke wrapper for the WriteFile() Win32 API. FileStreamCompletionSource registers an overlapped handler with the .NET thread pool, meaning it tells Windows to wake up a thread from the thread pool and give it a specific block of data to process. This block of data contains an identifier of the asynchronous operation that has completed. WriteFile() returns ERROR_IO_PENDING because our content is larger than the buffer size. The call returns all the way to the <ExecuteOuterTask> state machine. Each of these MoveNext() calls has performed exactly 1 transition and cannot advance any further because its awaiter hasn’t signaled.

At some point the I/O operation completes and Windows does what it was asked to do – wakes up a thread in the thread pool and hands it the context of the I/O operation that was completed. Now, this thread is driving all state machines forward. The inner-most state machine performs a MoveNext() in response to the completed I/O operation. This, in turn, completes the task of that state machine that was handed to the caller. The caller’s awaiter detects that the task was completed and invokes MoveNext() synchronously. Consequently, the next inner-most state machine completes, which triggers the next, and so on. This chain goes all the way up to the <ExecuteOuterTask> state machine, which also completes. Main() is not an asynchronous method, so this thread can’t resume it. As such, this chain only completes the outer-most task and returns the call back to the I/O completion thread in the thread pool. Completion of the outer-most task is sufficient to wake up the main thread, which was waiting on the event. The main method continues and completes its course, and the application exits.

We looked at a storage I/O example; however, the same flow applies to network I/O or any other operation that Windows recognizes and can act on via the Win32 API. Developers can also create their own TaskCompletionSource and drive it with a timer (another Windows primitive) or with a plain and simple loop on a scheduler thread.
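As a minimal sketch of the timer-driven variant, the helper below completes a task from a System.Threading.Timer callback. TimerTaskDemo and CompleteAfter() are hypothetical names made up for this illustration. Anything awaiting the returned task resumes through the same continuation mechanism we traced for file I/O.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimerTaskDemo
{
    // Returns a task that completes with 'value' once 'delay' has elapsed.
    public static Task<T> CompleteAfter<T>(TimeSpan delay, T value)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Create the timer disabled so that 'timer' is assigned before the callback can run.
        Timer timer = null;
        timer = new Timer(_ =>
        {
            timer.Dispose();           // one-shot: release the timer
            tcs.TrySetResult(value);   // completing the task runs the awaiting continuations
        }, null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);

        timer.Change(delay, Timeout.InfiniteTimeSpan);   // arm the timer to fire once
        return tcs.Task;
    }
}
```

A caller would write something like "int answer = await TimerTaskDemo.CompleteAfter(TimeSpan.FromSeconds(1), 42);" and would not be able to tell this task apart from one backed by real I/O.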

Credits

Thanks to dotnetCoreLogoPack for .NET Core logo.