WinUI 3.0 and C# Examples

Breaking Away from Clean Code for Performance

Clean code is a set of principles aimed at making code more readable, maintainable, and scalable. It encourages practices like writing small, single-responsibility methods, using abstraction, and leveraging polymorphism. While these principles can make code easier to understand and modify, they can sometimes introduce performance overheads, particularly in CPU-bound applications. This guide will explore these trade-offs and provide insights into when and why you might want to deviate from clean code principles for the sake of performance.

Understanding Clean Code

Clean code principles advocate for code that is easy to read and understand. Small, single-responsibility methods are a cornerstone of this approach. The idea is that each method should do one thing and do it well. This makes the code easier to reason about, test, and reuse.

Polymorphism is another principle often used in clean code. It allows us to write flexible code that can work with objects of different types, as long as they adhere to the same interface or base class. This can make the code more modular and easier to extend.

However, these principles come with a cost.

The Cost of Clean Code

In CPU-bound applications, where the speed of the CPU is the limiting factor, these principles can introduce overheads that affect performance.

Polymorphism, for example, involves a runtime dispatch mechanism to decide which method to call. This introduces a performance overhead compared to static or compile-time method calls.

C#
public interface IShape
{
	double CalculateArea();
}

public class Circle : IShape
{
	public double Radius { get; set; }
	public double CalculateArea() => Math.PI * Math.Pow(Radius, 2);
}

public class Square : IShape
{
	public double SideLength { get; set; }
	public double CalculateArea() => Math.Pow(SideLength, 2);
}

// Using polymorphism
public double CalculateTotalArea(IShape[] shapes)
{
	double totalArea = 0;
	foreach (var shape in shapes)
	{
		totalArea += shape.CalculateArea(); // Dynamic dispatch
	}
	return totalArea;
}

Dynamic dispatch, the mechanism behind polymorphism, involves a lookup operation at runtime to determine the specific method to call. This lookup operation, while typically fast, still takes time and can add up if performed frequently, especially in a tight loop. This overhead is absent in static or compile-time function calls, where the specific method to call is determined when the program is compiled.

Small, single-responsibility methods also have a cost. Method calls themselves carry overhead. When a method is called, arguments must be placed in registers or pushed onto the system stack and a return address recorded; when the method returns, that state must be unwound. If a program makes many calls to small methods in hot paths, this overhead can add up.

C#
public double CalculateCircleArea(double radius)
{
	return Math.PI * Math.Pow(radius, 2); // Method call overhead
}

public double CalculateSquareArea(double sideLength)
{
	return Math.Pow(sideLength, 2); // Method call overhead
}

public double CalculateTotalArea(double[] radii, double[] sideLengths)
{
	double totalArea = 0;
	foreach (var radius in radii)
	{
		totalArea += CalculateCircleArea(radius); // Method call overhead
	}
	foreach (var sideLength in sideLengths)
	{
		totalArea += CalculateSquareArea(sideLength); // Method call overhead
	}
	return totalArea;
}

The overhead from small, single-responsibility functions comes from the function call mechanism itself. When a function is called, the system needs to do some work behind the scenes: it passes the arguments (in registers or on the stack), records the return address, jumps to the function's code, and unwinds all of this when the function returns. These steps take time and can add up if a function is called frequently, especially in a tight loop. Jumping between functions scattered across memory can also reduce the effectiveness of the CPU's instruction cache, further impacting performance. However, modern compilers and JITs are often able to inline small functions, which sometimes eliminates this overhead entirely.
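Whether a call is inlined is ultimately up to the compiler and, in C#, the JIT. You can nudge the JIT with MethodImplAttribute; a minimal sketch (the attribute is a hint, not a guarantee):

```csharp
using System;
using System.Runtime.CompilerServices;

public static class AreaMath
{
	// Hint to the JIT that this small method is a good inlining candidate.
	// The JIT may still decline; the attribute is a request, not a guarantee.
	[MethodImpl(MethodImplOptions.AggressiveInlining)]
	public static double CircleArea(double radius) => Math.PI * radius * radius;
}

public class Program
{
	public static void Main()
	{
		// If inlined, the call below compiles down to the arithmetic itself,
		// with no call/return overhead.
		Console.WriteLine(AreaMath.CircleArea(2.0));
	}
}
```

When inlining applies, the single-responsibility method costs nothing at runtime, so readability and performance stop being in tension for that method.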

Re-Cycling

In the context of CPUs, a "cycle" refers to a single tick of the CPU's clock, which is the basic unit of time that a CPU understands. The CPU performs its operations in sync with this clock. Each tick of the clock is one cycle, and different operations take different numbers of cycles to complete.

The number of cycles that an operation takes is known as its "cycle cost". This cost is determined by the complexity of the operation. For example, a simple operation like adding two numbers might take only a few cycles, while a more complex operation like a floating-point division might take many more cycles.

When we talk about the "cycle cost" of a function call or a dynamic dispatch operation, we're talking about the number of cycles that the CPU needs to spend to perform these operations. This includes the cycles needed to push parameters onto the stack, jump to the function's code, execute the function, and then return from the function. For dynamic dispatch, it also includes the cycles needed to look up the correct function to call at runtime.

The cycle cost of these operations is not fixed and can vary depending on various factors, such as the specific CPU architecture, the compiler, and the specifics of the code. However, in general, these operations are not free and can introduce a noticeable overhead if performed frequently. This is why they can impact the performance of CPU-bound applications.

Exact numbers vary widely for the reasons above, but some general estimates are possible:

Function Call: The cost of a function call can include several components:

  • Parameter Passing: Arguments must be placed in registers or pushed onto the stack. Modern calling conventions pass the first few arguments in registers, which is nearly free; spilling additional arguments to the stack costs roughly a cycle or more each.
  • Jumping to the Function: The CPU must jump to the location in memory where the function's code is stored. This can take several cycles, depending on the CPU's instruction pipeline and whether the function's code is already in the CPU's cache.
  • Returning from the Function: The CPU must jump back to the location in the code where the function was called. This also takes several cycles.

Dynamic Dispatch: The cost of dynamic dispatch can also include several components:

  • Lookup: The CPU must look up the correct function to call in the object's vtable (a table of function pointers used to implement dynamic dispatch). This can take several cycles, depending on the size of the vtable and whether it's already in the CPU's cache.
  • Indirect Function Call: After the correct function has been looked up, the CPU must perform an indirect function call. This is similar to a regular function call, but can be slower because the CPU can't predict the target of the call as easily, which can lead to pipeline stalls.
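In C#, the JIT can sometimes avoid this cost by devirtualizing a call when it can prove the concrete type at the call site, for instance for sealed classes. A hedged sketch, reusing the IShape interface from earlier:

```csharp
using System;

public interface IShape
{
	double CalculateArea();
}

// Sealing the class tells the JIT that no subclass can override anything,
// which can enable devirtualization when the concrete type is known.
public sealed class Circle : IShape
{
	public double Radius { get; set; }
	public double CalculateArea() => Math.PI * Radius * Radius;
}

public class Program
{
	public static void Main()
	{
		// The local variable has the concrete type Circle, so the JIT can
		// bind CalculateArea directly instead of going through a table lookup.
		var circle = new Circle { Radius = 2.0 };
		Console.WriteLine(circle.CalculateArea());
	}
}
```

Whether devirtualization actually happens depends on the runtime version and the call site, so this is a cheap hint worth applying, not a guaranteed optimization.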

When to Break Away from Clean Code

The overheads introduced by clean code principles can be negligible in many cases. However, in CPU-bound applications where every cycle counts, these overheads can become significant. In such cases, it might be beneficial to break away from clean code principles for the sake of performance.

For example, you might decide to avoid polymorphism and instead use a switch statement or a series of if-else statements to handle different types of objects. This can eliminate the overhead of dynamic dispatch.
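A sketch of that approach, reusing the Circle and Square shapes from the earlier example but dropping the interface in favor of a type-pattern switch (the structure here is illustrative, not the only way to write it):

```csharp
using System;

public class Circle { public double Radius { get; set; } }
public class Square { public double SideLength { get; set; } }

public static class Areas
{
	// A type-pattern switch replaces interface dispatch with explicit branches.
	// The trade-off: adding a new shape now means editing this method.
	public static double CalculateArea(object shape) => shape switch
	{
		Circle c => Math.PI * c.Radius * c.Radius,
		Square s => s.SideLength * s.SideLength,
		_ => throw new ArgumentException("Unknown shape", nameof(shape))
	};
}

public class Program
{
	public static void Main()
	{
		Console.WriteLine(Areas.CalculateArea(new Circle { Radius = 1.0 }));
		Console.WriteLine(Areas.CalculateArea(new Square { SideLength = 3.0 }));
	}
}
```

Note that the switch still performs runtime type checks, but the branches are visible to the compiler and branch predictor, and the shape-handling code can be kept together in one place.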

Similarly, you might decide to combine several small methods into a larger one to reduce the method call overhead.

C#
public double CalculateTotalArea(double[] radii, double[] sideLengths)
{
	double totalArea = 0;
	foreach (var radius in radii)
	{
		totalArea += Math.PI * Math.Pow(radius, 2); // Inlined method
	}
	foreach (var sideLength in sideLengths)
	{
		totalArea += Math.Pow(sideLength, 2); // Inlined method
	}
	return totalArea;
}

Balancing Performance and Clean Code

When deciding whether to follow the clean code principles or not, it's important to consider the trade-offs involved.

In many cases, the potential performance gains might not be worth the loss in code readability and maintainability, especially if the code must be maintained by multiple people. It's also important to remember that not all performance problems are due to the CPU. In many cases, the bottleneck might be elsewhere, such as in disk I/O, network latency, or database queries. In such cases, optimizing CPU usage might not lead to a noticeable improvement in overall performance.

However, in CPU-bound applications where every cycle counts, the performance gains from avoiding clean code principles might be significant. In such cases, it might be worth considering a more performance-oriented approach.

Remember, the goal is not to write "unclean" code, but rather to write code that is both efficient and as clean as possible. This might involve making some trade-offs, but with careful consideration and testing, it's possible to write code that is both fast and maintainable.
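As a starting point for that testing, a minimal sketch using System.Diagnostics.Stopwatch to time the hand-inlined loop from the earlier CalculateTotalArea example (for serious measurement, a dedicated harness such as BenchmarkDotNet is more reliable):

```csharp
using System;
using System.Diagnostics;

public class Program
{
	// The hand-inlined variant from the section above.
	public static double ComputeTotal(double[] radii)
	{
		double total = 0;
		foreach (var r in radii) total += Math.PI * r * r;
		return total;
	}

	public static void Main()
	{
		var radii = new double[1_000_000];
		for (int i = 0; i < radii.Length; i++) radii[i] = i % 100;

		// Measure before committing to the less readable version:
		// only keep it if the difference shows up here.
		var sw = Stopwatch.StartNew();
		double total = ComputeTotal(radii);
		sw.Stop();
		Console.WriteLine($"total={total}, elapsed={sw.ElapsedMilliseconds} ms");
	}
}
```

If the timing difference between the clean and "unclean" versions is lost in the noise, keep the clean one.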