Creating an analyzer to detect infinite loops caused by ThreadAbortExceptions

In this post I describe an infinite-loop scenario that can occur on .NET Framework when a ThreadAbortException is raised. I describe when you might run into this scenario, why it happens (it's a bug in the runtime), and how you can avoid it. Finally I show a Roslyn Analyzer that you can use to automatically flag problematic code.

Throwing a `ThreadAbortException` with `Thread.Abort()`

When you're doing parallel/concurrent programming in .NET, and you want to do two things at once, you typically use the Task Parallel Library, Task, Task<T>, async/await, and all that modern goodness. However, you can also manage threads yourself "manually", by calling Thread.Start() etc.

These days, in practice, you should almost never be working directly with threads. Use Task et al wherever possible so that you're using the ThreadPool to schedule jobs and async/await to handle continuations.

If you have a running thread, and you want to stop it running, you would typically try to use cooperative cancellation, using CancellationTokens or something similar. However, in some cases that's not possible; maybe the thread is running third party code out of your control, for example. In .NET Framework you have a "kill it with fire" option: Thread.Abort().

Note that Thread.Abort() only applies to .NET Framework. The Abort() method is not supported on .NET Core and throws a PlatformNotSupportedException to the caller instead.
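
For comparison, here's a minimal sketch of the cooperative approach, which works on both .NET Framework and .NET Core. The CooperativeCancellationDemo class and its method names are made up for illustration; the idea is simply that the worker checks a CancellationToken on every iteration and exits on its own, so nothing needs to be injected into the thread.

```csharp
using System;
using System.Threading;

// Hypothetical demo type, not part of the program shown later in this post
public static class CooperativeCancellationDemo
{
    // Loops until cancellation is requested.
    // Returns the number of iterations that were completed.
    public static int DoWork(CancellationToken token)
    {
        var i = 0;
        while (!token.IsCancellationRequested)
        {
            Console.WriteLine($"Thread - working {i}");
            i++;

            // Wait up to 100ms, waking early if cancellation is requested
            token.WaitHandle.WaitOne(100);
        }

        Console.WriteLine("Thread - cancellation requested, exiting cleanly");
        return i;
    }

    public static void Run()
    {
        using var cts = new CancellationTokenSource();
        var myThread = new Thread(() => DoWork(cts.Token));
        myThread.Start();

        Thread.Sleep(300);
        Console.WriteLine("Main - requesting cancellation");

        cts.Cancel();    // Ask the thread to stop, rather than aborting it
        myThread.Join(); // Wait for the thread to exit

        Console.WriteLine("Main ending");
    }
}
```

This only works when the code running on the thread is yours to change, which is exactly the constraint that pushes people towards Thread.Abort() in the first place.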

Calling Abort() on a thread causes the runtime to throw a ThreadAbortException in the thread's code. ThreadAbortException is special, in that you can catch it in application code (unlike some other exceptions such as StackOverflowException which can't be caught), but the runtime automatically re-throws the ThreadAbortException at the end of the catch block.

It is possible to "cancel" the exception by calling ResetAbort() but I'm not going to go into that in this post.

Just to give a concrete example, the following is a small .NET Framework program that starts a Thread, lets it do some work, and then calls Abort() on it from the main thread.

using System;
using System.Threading;

// Start a new thread, which runs the DoWork method
var myThread = new Thread(new ThreadStart(DoWork));
myThread.Start();

Thread.Sleep(300);
Console.WriteLine("Main - aborting thread");

myThread.Abort(); // Trigger a ThreadAbortException
myThread.Join(); // Wait for the thread to exit

Console.WriteLine("Main ending");

static void DoWork()
{
    try
    {
        for (var i = 0; i < 100; i++)
        {
            Console.WriteLine($"Thread - working {i}");
            Thread.Sleep(100);
        }
    }
    catch (ThreadAbortException e)
    {
        Console.WriteLine($"Thread - caught ThreadAbortException: {e.Message}");
        // Even though we caught the exception, the runtime re-throws it
    }

    // This is never called
    Console.WriteLine("Thread - outside the catch block");
}

When you run the program, the output looks something like this:

Thread - working 0
Thread - working 1
Thread - working 2
Main - aborting thread
Thread - caught ThreadAbortException: Thread was being aborted.
Main ending

As you can see, even though we caught the ThreadAbortException, the thread exited, as the exception was re-thrown. Now we'll look at a scenario where that doesn't quite work as you expect.

Infinite loops and `ThreadAbortException`

The issue I'm going to describe is based on a real issue we ran into in the Datadog .NET Tracer shortly before I joined in January 2021. The issue occurred during IIS AppDomain recycles (among other cases) and would result in the apps not shutting down. As you might expect given the preamble, the problem was related to ThreadAbortException.

We can demonstrate the problem easily if we make a slight tweak to the example above. Instead of using a for loop inside a try-catch, we're going to change to a try-catch inside a while loop. The rest of the program remains the same, so I've only shown the DoWork() method:

static void DoWork()
{
    var i = 0;
    while (true)
    {
        try
        {
            Console.WriteLine($"Thread - working {i}");
            i++;
            Thread.Sleep(100);
        }
        catch (ThreadAbortException e)
        {
            Console.WriteLine($"Thread - caught ThreadAbortException {e.Message}");
            // Even though we caught the exception, the runtime _should_ re-throw it
        }
    }

    // This is never called
    Console.WriteLine("Thread - outside the catch block.");
}

Now, theoretically, there should be no difference here. Abort() is called, the exception is caught in the catch block, and the runtime should re-throw it, exiting the while loop and the thread. However, if we run the app in the Release configuration we have a problem, as the thread gets stuck in an infinite loop in the catch block:

Thread - working 0
Thread - working 1
Thread - working 2
Main - aborting thread
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
Thread - caught ThreadAbortException Thread was being aborted.
...

This is clearly Not Good™, and ultimately comes down to a bug in the JIT. The explanation of the bug is somewhat complex (and is largely due to a workaround for a different bug), but this comment has all the gory details if you want to dig in.

The bug is present in the RyuJIT compiler, but not in the legacy JIT, so you can also work around the bug by setting <useLegacyJit enabled="1" /> in the <runtime> section of your app.config or web.config.

The bug is triggered specifically when you have a "tight" loop with a try-catch directly inside a while loop:

while(true)
{
    try
    {
        // ...
    }
    catch
    {
        // ...
    }
}

Adding a statement such as Console.WriteLine() inside the while loop but outside the try-catch avoids the bug, as does using a for loop instead, so it's this specific pattern you need to watch out for. Adding a finally block also avoids the issue.

Ultimately, Microsoft decided not to fix this bug, so the workaround is to ensure you always "manually" re-throw a ThreadAbortException if you find yourself with the problematic pattern.

Unfortunately, it's not obvious that the pattern is problematic just by looking at it, so it's a great candidate for a Roslyn Analyzer to do the spotting for you.

Creating an analyzer to detect the pattern

In this section I show the Roslyn Analyzer I wrote to make sure we don't accidentally introduce this code into the Datadog library.

If you're building a .NET Core-only application then you don't need to worry about this, because .NET Core doesn't support ThreadAbortExceptions. However, if you're building a library that multi-targets .NET Core and .NET Framework, or uses netstandard2.0 to do so, then you might want to consider using it.

I'm not going into detail about how to create an analyzer in this post (I covered this some time ago in a previous post). Instead I'm just going to focus on the analyzer code itself.

As a reminder, we are trying to detect code that looks something like this:

while(...)
{
    try
    {
        // ...
    }
    catch
    {
        // ...
    }
}

and advise you to update it to manually re-throw the exception. The simplest fix might look like this:

while(...)
{
    try
    {
        // ...
    }
    catch
    {
        // ...
        throw; // Required to avoid the infinite loop
    }
}

We'll add a code fix provider to automatically make that basic fix later.

Creating the analyzer

We'll start by looking at the analyzer itself. This derives from DiagnosticAnalyzer, defines a diagnostic ID, and registers a SyntaxNodeAction that looks for while loops. If the while loop contains a try-catch that has a problematic catch clause, we raise the issue.

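using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]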
public class ThreadAbortAnalyzer : DiagnosticAnalyzer
{
    public const string DiagnosticId = "ABRT0001";

    private static readonly DiagnosticDescriptor Rule = new(
        DiagnosticId,
        title: "Potential infinite loop on ThreadAbortException",
        messageFormat: "Potential infinite loop - you should rethrow Exception in catch block",
        category: "Reliability",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true,
        description: "While blocks are vulnerable to infinite loop on ThreadAbortException due to a bug in the runtime. The catch block should rethrow a ThreadAbortException, or use a finally block");

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics { get; } = ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        // Don't bother checking generated code
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        context.RegisterSyntaxNodeAction(AnalyseSyntax, SyntaxKind.WhileStatement);
    }

    private void AnalyseSyntax(SyntaxNodeAnalysisContext context)
    {
        if (context.Node is WhileStatementSyntax whileStatement
            && ThreadAbortSyntaxHelper.FindProblematicCatchClause( // shown below
                whileStatement, context.SemanticModel) is { } problematicCatch)
        {
            // If we're in a while statement, and there's a problematic catch
            // clause, then create a diagnostic
            var diagnostic = Diagnostic.Create(Rule, problematicCatch.GetLocation());
            context.ReportDiagnostic(diagnostic);
        }
    }
}

The ThreadAbortSyntaxHelper performs the analysis of the while block, looking explicitly for a while block with the following characteristics:

The body of the while is a BlockSyntax
The body contains only one statement, which is a TryStatementSyntax
The TryStatementSyntax contains a CatchClauseSyntax which catches a ThreadAbortException (or its ancestors)
The CatchClauseSyntax does not call throw;

If all of these conditions are matched, the analyzer flags the catch as problematic. The code of the helper is shown below:

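using System;
using System.Linq;
using System.Threading;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

internal static class ThreadAbortSyntaxHelper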
{
    public static CatchClauseSyntax FindProblematicCatchClause(WhileStatementSyntax whileStatement, SemanticModel model)
    {
        if (whileStatement.Statement is not BlockSyntax blockSyntax)
        {
            return null;
        }

        var innerStatements = blockSyntax.Statements;
        if (innerStatements.Count != 1)
        {
            // only applies when try directly nested under while and only child
            return null;
        }

        if (innerStatements[0] is not TryStatementSyntax tryCatchStatement)
        {
            // Not a try catch nested in a while
            return null;
        }

        CatchClauseSyntax catchClause = null;
        var willCatchThreadAbort = false;
        var willRethrowThreadAbort = false;

        foreach (var catchSyntax in tryCatchStatement.Catches)
        {
            catchClause = catchSyntax;
            // A bare `catch { }` has no declaration, and catches everything
            var exceptionTypeSyntax = catchSyntax.Declaration?.Type;
            if (exceptionTypeSyntax is null || CanCatchThreadAbort(exceptionTypeSyntax, model))
            {
                willCatchThreadAbort = true;

                // We're in the catch block that will catch the ThreadAbort
                // Make sure that we re-throw the exception
                // This is a very basic check, in that it doesn't check control flow etc
                // It requires that you have a throw; in the catch block
                willRethrowThreadAbort = catchSyntax.Block.Statements
                    .OfType<ThrowStatementSyntax>()
                    .Any();
                break;
            }
        }

        if (willCatchThreadAbort && !willRethrowThreadAbort)
        {
            return catchClause;
        }

        return null;
    }

    private static bool CanCatchThreadAbort(TypeSyntax syntax, SemanticModel model)
    {
        var exceptionType = model.GetSymbolInfo(syntax).Symbol as INamedTypeSymbol;
        var exceptionTypeName = exceptionType?.ToString();
        return exceptionTypeName == typeof(ThreadAbortException).FullName
            || exceptionTypeName == typeof(SystemException).FullName
            || exceptionTypeName == typeof(Exception).FullName;
    }
}

There are clearly a bunch of limitations to this analysis, but I'll go through those later. When you run the analyzer, you can see that it works, flagging the exception in a problematic scenario:

The analyzer in action

Now that we have the analyzer, let's create a simple code fix provider for it.

Creating the code fix provider

The CodeFixProvider is registered as a fixer for the ThreadAbortAnalyzer we defined above. It takes the diagnostic location provided and registers a code fix which simply adds a throw statement to the end of the first catch block that would catch the ThreadAbortException.

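using System.Collections.Immutable;
using System.Composition;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CodeActions;
using Microsoft.CodeAnalysis.CodeFixes;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

[ExportCodeFixProvider(LanguageNames.CSharp, Name = nameof(ThreadAbortCodeFixProvider))]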
[Shared]
public class ThreadAbortCodeFixProvider : CodeFixProvider
{
    public sealed override ImmutableArray<string> FixableDiagnosticIds => ImmutableArray.Create(ThreadAbortAnalyzer.DiagnosticId);
    public sealed override FixAllProvider GetFixAllProvider() => WellKnownFixAllProviders.BatchFixer;

    public sealed override async Task RegisterCodeFixesAsync(CodeFixContext context)
    {
        var root = await context.Document.GetSyntaxRootAsync(context.CancellationToken).ConfigureAwait(false);

        var diagnostic = context.Diagnostics.First();
        var diagnosticSpan = diagnostic.Location.SourceSpan;

        // Find the catch clause identified by the diagnostic.
        var catchClause = root.FindToken(diagnosticSpan.Start)
            .Parent
            .AncestorsAndSelf()
            .OfType<CatchClauseSyntax>().First();

        // Register a code action that will invoke the fix.
        context.RegisterCodeFix(
            CodeAction.Create(
                title: "Rethrow exception",
                createChangedDocument: c => AddThrowStatement(context.Document, catchClause, c),
                equivalenceKey: nameof(ThreadAbortCodeFixProvider)),
            diagnostic);
    }

    private static async Task<Document> AddThrowStatement(Document document, CatchClauseSyntax catchBlock, CancellationToken cancellationToken)
    {
        // This messes with the whitespace, but meh, it's simple
        var throwStatement = SyntaxFactory.ThrowStatement();
        var statements = catchBlock.Block.Statements.Add(throwStatement);
        var newCatchBlock = catchBlock.Block.WithStatements(statements);

        // replace the syntax and return updated document
        var root = await document.GetSyntaxRootAsync(cancellationToken).ConfigureAwait(false);
        root = root.ReplaceNode(catchBlock.Block, newCatchBlock);
        return document.WithSyntaxRoot(root);
    }
}

Now when the analyzer flags an issue, you get a suggestion of how to fix it with one click:

The code fix suggestion in action

This is clearly a crude fix (as I describe in the next section), but I've not found it to be a big issue in practice. The important thing is that it draws attention to the issue and shows a possible fix.

Limitations of the analyzer and the code fix

The analyzer I show in this post is not particularly sophisticated. It does only very basic analysis of the while and try-catch statements. The limitations include:

Assumes an infinite while loop. For simplicity, the analyzer doesn't check the expression in the while loop, and assumes it will loop infinitely. That's a conservative approach, and will flag some cases that won't trigger the bug, but it's good enough for our purposes.
Exception filters are not considered. For simplicity, I've ignored exception filters on the catch block. That means we might assume an exception is caught when it is not, and in that case we might also incorrectly assume an exception is rethrown when it is not.
Doesn't consider finally blocks. In practice, the presence of a finally block can avoid the bug, so the catch block doesn't need to explicitly rethrow. The analyzer does not consider this, and takes a more conservative approach, requiring the rethrow.
Doesn't check flow control in catch clause. In some cases, a catch clause might be calling throw;, but if it's not a direct child of the catch block, the analyzer will ignore it. Again, this is a conservative approach.
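
Some of these limitations could probably be narrowed with a few extra syntactic checks. The following is a rough sketch rather than code from the real analyzer: ThreadAbortRefinements and its method names are hypothetical, and it assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helpers, not part of the analyzer shown earlier in this post
internal static class ThreadAbortRefinements
{
    // A finally block avoids the bug, so a try statement
    // that has one could be skipped entirely
    public static bool HasFinally(TryStatementSyntax tryStatement)
        => tryStatement.Finally is not null;

    // A catch clause with a `when (...)` filter might not
    // actually catch the ThreadAbortException
    public static bool HasFilter(CatchClauseSyntax catchClause)
        => catchClause.Filter is not null;

    // Looks for a bare `throw;` anywhere in the catch block, including
    // inside nested statements such as an `if`. Nested catch clauses are
    // skipped, because a `throw;` there rethrows a different exception.
    // This still does no control flow analysis.
    public static bool ContainsRethrow(CatchClauseSyntax catchClause)
        => catchClause.Block
            .DescendantNodes(node => node is not CatchClauseSyntax)
            .OfType<ThrowStatementSyntax>()
            .Any(statement => statement.Expression is null);
}
```

With something like this in place, FindProblematicCatchClause could return null early when HasFinally is true, and use ContainsRethrow instead of only looking at the direct children of the catch block. What to do when HasFilter is true is more of a judgment call, because the filter expression would need to be evaluated to know whether the clause applies.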

In terms of the code fix provider, it's unlikely that you would actually want to call throw; inside a catch(Exception) block. A better approach would likely be to introduce an additional catch clause for ThreadAbortException specifically, and only re-throw in that clause.

For example, if you have this:

while(true)
{
    try
    {
        Console.WriteLine("Looping");
        Thread.Sleep(100);
    }
    catch(Exception)
    {
        Console.WriteLine("Exception!");
    }
}

then instead of the code suggested by the analyzer:

while(true)
{
    try
    {
        Console.WriteLine("Looping");
        Thread.Sleep(100);
    }
    catch(Exception)
    {
        Console.WriteLine("Exception!");
        throw; // Added by code fix provider
    }
}

you might want to do something like this instead:

while(true)
{
    try
    {
        Console.WriteLine("Looping");
        Thread.Sleep(100);
    }
    catch(ThreadAbortException) // catch ThreadAbortException explicitly
    {
        Console.WriteLine("ThreadAbortException!");
        throw; // Avoid the bug
    }
    catch(Exception)
    {
        Console.WriteLine("Exception!");
        // No need to throw in this block
    }
}

This avoids the bug by re-throwing when you have a ThreadAbortException specifically, and means you don't rethrow for just any Exception. In practice, I wasn't going to bother writing a code fix provider at all, so I went for the simplest solution at the time. If I wanted to be more robust I would almost certainly try to use this pattern instead.
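
If I were to make the code fix smarter, the core of it might look something like the sketch below. This is not the code fix used in the Datadog library: ThreadAbortCatchFix and AddThreadAbortCatch are hypothetical names, and the sketch assumes the flagged clause catches a base type such as Exception (if it already catches ThreadAbortException itself, adding a second clause for the same type would not compile).

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Formatting;

// Hypothetical helper, not part of the code fix provider shown earlier
internal static class ThreadAbortCatchFix
{
    // Returns a copy of the try statement with an extra
    // `catch (ThreadAbortException) { throw; }` as the first catch clause
    public static TryStatementSyntax AddThreadAbortCatch(TryStatementSyntax tryStatement)
    {
        // catch (System.Threading.ThreadAbortException)
        var declaration = SyntaxFactory.CatchDeclaration(
            SyntaxFactory.ParseTypeName("System.Threading.ThreadAbortException"));

        // { throw; }
        var rethrowBlock = SyntaxFactory.Block(SyntaxFactory.ThrowStatement());

        // The annotation marks the new clause as something that should be
        // formatted when the change is applied as part of a code action
        var abortCatch = SyntaxFactory.CatchClause(declaration, null, rethrowBlock)
            .WithAdditionalAnnotations(Formatter.Annotation);

        // Insert first, so it takes precedence over the broader clauses
        return tryStatement.WithCatches(tryStatement.Catches.Insert(0, abortCatch));
    }
}
```

In the code fix provider you would then replace the TryStatementSyntax that is the parent of the flagged catch clause with the result, instead of replacing the catch block. The fully qualified type name keeps the sketch independent of the using directives in the file, at the cost of being a bit verbose.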

Summary

In this post I described a bug in the .NET Framework runtime that can cause a ThreadAbortException to get stuck in an infinite loop. The bug only occurs when you have a try-catch block tightly nested in a while block. Normally if you catch a ThreadAbortException the runtime automatically re-throws the exception after the catch block has executed. However the bug means that the catch block gets stuck re-executing infinitely.

In the second half of the post I showed a Roslyn Analyzer I created that can detect the problematic pattern and includes a code fix provider that adds a throw; statement to break out of the infinite loop. It's a relatively crude analyzer, but I know it's saved us at least once from introducing the issue!

Andrew Lock | .NET Escapades