DirectX Paint and WinUI 3.0

Drawing Instances

June 28th, 2023

This time, let's get the brush strokes working. In programs like Clip Studio Paint, Krita, and Photoshop, the brush bitmap is stamped down at a set interval, such as every 10 units, as the user drags the mouse. Increasing the spacing between stamps keeps drawing smooth with more resource-intensive brushes and also allows for cool effects with others. The point is, we don't want to draw a bitmap on every single frame that the mouse button is down. Instead, we want to check whether the pointer has moved a certain distance.

private float minDistance = 80;
private List<BrushStamp> brushStamps = new List<BrushStamp>();
struct BrushStamp
{
	public Point Position;
}

So, let's set the minimum distance to something large for now, so that we can observe the app working as intended. Let's also define a new struct for the brush stamps. For now, it only holds the position of the stamp, but in the future it could be extended with a per-stamp color, size, rotation, or pressure, if we implement tablet functionality. Also, List<BrushStamp> reads better in code than List<Point>.

Now, a sidebar about the use of Point instead of Vector2. Direct3D 11 code deals in types like Vector2, whereas Point is a type that comes from WinUI (or, more accurately, WinRT). Because we are using WinUI, we can't avoid this type conversion: at some point, a Point has to turn into a Vector2. I decided to do that later, because I want to keep the UI code closer to the WinUI side of things. But since we're using the BrushStamp struct in the UI, this means I either have to make it use WinUI's Point type or do a bunch of type conversions in the UI code. I went with Point, because I only need this app to work on WinUI, but feel free to arrive at your own conclusion.
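
For what it's worth, the conversion itself is tiny, wherever you decide to put it. Here is a minimal sketch of it. UiPoint is a hypothetical stand-in for Windows.Foundation.Point (which, as far as I can tell, exposes X and Y as double in C#), so that the snippet compiles outside a WinUI project. PointConversions is a made-up name too.

```csharp
using System.Numerics;

// Hypothetical stand-in for Windows.Foundation.Point so the sketch compiles anywhere.
public readonly struct UiPoint
{
    public double X { get; }
    public double Y { get; }
    public UiPoint(double x, double y) { X = x; Y = y; }
}

public static class PointConversions
{
    // Narrow the double UI coordinates to the float pair the rendering code works with.
    public static Vector2 ToVector2(UiPoint p) => new Vector2((float)p.X, (float)p.Y);
}
```

Keeping this in one helper means the double-to-float narrowing happens in exactly one place, whichever side of the UI/renderer boundary you pick.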

To avoid having to write out the full namespace every time to tell System.Drawing.Point (integer coordinates) apart from Windows.Foundation.Point (floating-point coordinates), let's just add this line at the top of the file.

using Point = Windows.Foundation.Point;

To read the mouse input correctly, in addition to the PointerPressed event, we need to handle the PointerMoved and PointerReleased events.

private bool isDrawing = false;

private void SwapChainCanvas_PointerPressed(object sender, PointerRoutedEventArgs e)
{
	isDrawing = true;
}


private void SwapChainCanvas_PointerMoved(object sender, PointerRoutedEventArgs e)
{
	if (isDrawing)
	{
	}
}

private void SwapChainCanvas_PointerReleased(object sender, PointerRoutedEventArgs e)
{
	isDrawing = false;
}

The idea here is that pressing the mouse button down sets drawing on and releasing it sets drawing off. And while the drawing is on, in PointerMoved we can do freaky things!

if (isDrawing)
{
	Point currentPos = e.GetCurrentPoint(SwapChainCanvas).Position;
	if (brushStamps.Count == 0 || Distance(brushStamps.Last().Position, currentPos) >= minDistance)
	{
		brushStamps.Add(new BrushStamp { Position = currentPos });
	}
}

We get the mouse cursor coordinates from the event arguments. Then we do a couple of checks. If the list of stamps is empty, we can go straight ahead and add a new stamp. If it's not empty, we check the distance between the last stamp's position and the mouse cursor's position. If the distance reaches or exceeds our set limit, we add another stamp to the list.

To check the distance, we'll need a little helper method that takes two points and returns the straight-line distance between them.

private double Distance(Point a, Point b)
{
	double dx = a.X - b.X;
	double dy = a.Y - b.Y;
	return Math.Sqrt(dx * dx + dy * dy);
}
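
To see the spacing rule in isolation, here is a small sketch that runs a list of pointer positions through the same check. It uses System.Numerics.Vector2 instead of Point so that it works as a plain .NET snippet. StampSpacing and Filter are made-up names for the example, not part of the project.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class StampSpacing
{
    // Keeps a position only when it is at least minDistance away from the last kept one.
    public static List<Vector2> Filter(IEnumerable<Vector2> pointerPositions, float minDistance)
    {
        var stamps = new List<Vector2>();
        foreach (Vector2 pos in pointerPositions)
        {
            if (stamps.Count == 0 || Vector2.Distance(stamps[stamps.Count - 1], pos) >= minDistance)
            {
                stamps.Add(pos);
            }
        }
        return stamps;
    }
}
```

Feeding it the x positions 0, 30, 60, 90 and 170 (all at y = 0) with a minimum distance of 80 keeps the stamps at 0, 90 and 170. Note that the distance is always measured from the last stamp that was kept, not from the previous pointer position.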

Update

Now we're ready to fix the Update method. I'm going to add a check and pick the position of the last stamp to draw.

if (brushStamps.Count != 0)
{
	data.ClickPosition = ConvertMousePointTo3D(brushStamps.Last().Position);
}

So now, if you drag with your mouse on the canvas, you can see the stamp updating when it exceeds the set distance.

Stamp Instancing

When developing our drawing application, we want brush strokes to render efficiently. To that end, we use a DirectX feature called instanced rendering. This technique lets us draw multiple instances of the same geometry with varying attributes. In our case, we draw several instances of a quad, each representing a single brush stamp at a different position on the canvas.

We begin by defining our geometry: a single "template" quad. We specify the vertices of this quad, complete with texture coordinates, and the indices that determine its triangles.

Next, we compile a list of brush stamps, each carrying its own position. When uploaded to the GPU, this list serves as the instance buffer: a store of per-instance data, with one entry per brush stamp.

When we call the DrawIndexedInstanced method, our template quad gets instanced once for each entry in the instance buffer. For every instance, the GPU runs the vertex shader for each vertex of the quad, feeding it that stamp's data as the PerInstanceData input. The result is multiple textured quads, each positioned according to its entry in the instance buffer.

Interestingly, the GPU handles the looping over vertex and instance data internally. It knows how many instances to draw from the DrawIndexedInstanced call and automatically picks up the matching piece of per-instance data for each one. In other words, every stamp drawn gets its own set of instance data.

So, let's picture it: for every brush stamp, the vertex shader executes four times (given our quad has four vertices), and each time it receives that stamp's data as the PerInstanceData input. This lets us draw a multitude of quads with just one draw call instead of one call per stamp.
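
If the GPU-side looping feels abstract, it can help to write out on the CPU what an instanced draw conceptually does. The following is only a mental model, not how the hardware actually schedules the work. InstancingModel and Expand are hypothetical names, and the scale-then-offset step mirrors what our vertex shader does with the instance data later in this post.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class InstancingModel
{
    // For every instance, run the "vertex shader" once per template vertex.
    public static List<Vector2> Expand(
        IReadOnlyList<Vector2> templateVertices,
        IReadOnlyList<(Vector2 Position, Vector2 Scale)> instances)
    {
        var transformed = new List<Vector2>(templateVertices.Count * instances.Count);
        foreach (var instance in instances)               // per-instance data advances here
        {
            foreach (Vector2 vertex in templateVertices)  // per-vertex data advances here
            {
                // Component-wise scale, then move the quad to the stamp position.
                transformed.Add(vertex * instance.Scale + instance.Position);
            }
        }
        return transformed;
    }
}
```

With a four-vertex quad and two stamps, this produces eight transformed vertices. The template geometry is stored once, and only the small per-instance records grow with the number of stamps.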

Here's what we'll do...

[StructLayout(LayoutKind.Sequential, Pack = 16, Size = 32)]
public struct InstanceData
{   
	public Vector2 Position;
	public Vector2 Scale;
}

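Let's define a struct called InstanceData with two fields, Position and Scale, both of type Vector2. Let's also add [StructLayout(LayoutKind.Sequential, Pack = 16, Size = 32)], which controls how the struct is laid out in memory. Sequential keeps the fields in declaration order, Pack = 16 caps field alignment at 16 bytes, and Size = 32 fixes the total size at 32 bytes. The two Vector2 fields only need 16 bytes, so the rest is padding; that's fine as long as the instance stride we give Direct3D is taken from the same struct size.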
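
A quick way to check what the marshaller makes of this layout is to ask it directly. This sketch repeats the struct from above and reports its marshalled size and field offsets. LayoutCheck is a hypothetical helper name, not something from the project.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 16, Size = 32)]
public struct InstanceData
{
    public Vector2 Position;
    public Vector2 Scale;
}

public static class LayoutCheck
{
    // Total marshalled size of one instance record, in bytes.
    public static int Size => Marshal.SizeOf<InstanceData>();

    // Byte offsets of the two fields inside the record.
    public static int PositionOffset => (int)Marshal.OffsetOf<InstanceData>(nameof(InstanceData.Position));
    public static int ScaleOffset => (int)Marshal.OffsetOf<InstanceData>(nameof(InstanceData.Scale));

    public static void Print() =>
        Console.WriteLine($"size={Size}, Position@{PositionOffset}, Scale@{ScaleOffset}");
}
```

I would expect this to report a size of 32 with Position at offset 0 and Scale at offset 8. Those offsets are what the input layout for the instance data has to agree with, and the size is what ends up as the per-instance stride.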

InputElementDescription[] inputElements = new InputElementDescription[]
{
	new InputElementDescription("POSITION", 0, Format.R32G32B32_Float, 0, 0, InputClassification.PerVertexData, 0),
	new InputElementDescription("TEXCOORD", 0, Format.R32G32_Float, 12, 0, InputClassification.PerVertexData, 0),
	new InputElementDescription("POSITION", 1, Format.R32G32_Float, 0, 1, InputClassification.PerInstanceData, 1),
	new InputElementDescription("TEXCOORD", 1, Format.R32G32_Float, 8, 1, InputClassification.PerInstanceData, 1)
};

Here, we add two extra element descriptions for the instance data. Both read from input slot 1, are classified as PerInstanceData, and have a step rate of 1, so they advance once per instance rather than once per vertex. The instance position uses the semantic name "POSITION" with index 1. Its format is R32G32_Float, meaning two 32-bit floating-point numbers for the X and Y coordinates, so it takes 8 bytes starting at offset 0. The instance scale uses the semantic name "TEXCOORD" with index 1. It has the same R32G32_Float format and 8-byte size, and starts at offset 8.

Let's now initialize our brushStamps list on the CPU and the instance buffer on the GPU. While the user is interacting with our application, new brush stamps will be generated as they move their cursor. These new instances are added to our CPU-side list as they're created.

To emphasize, these new instances are not sent to the GPU one by one. Instead, we send them in batches to minimize CPU-GPU communication. The right batch size depends on the specifics of your application and hardware; it might be as low as 10 or 20 instances.

Within each frame, we have a straightforward routine. If there are new instances in the CPU-side list (which is typically the case while the user is actively drawing), we transfer these to the GPU-side buffer, regardless of whether the CPU-side list has reached its capacity. The GPU then draws whatever is in its buffer at that time.

If the CPU-side list reaches its capacity - let's say this is 20 instances for our application - we transfer this batch of instances to the GPU, instruct the GPU to draw these instances with a DrawIndexedInstanced call, and then clear the CPU-side list.

In the event that the CPU-side list does not reach its capacity within a frame - perhaps when the user is drawing slowly - we still need the GPU to draw these instances. So, at the end of each frame, we transfer whatever is in the CPU-side list to the GPU, issue a DrawIndexedInstanced call, and then clear the CPU-side list.

Finally, when the user completes their stroke, we clear both the CPU-side list and the GPU-side buffer to prepare for the next set of brush instances. This is how we utilize dynamic batching to reduce draw calls and CPU-GPU transfers while handling an arbitrary number of instances without resizing the GPU buffer.
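
The routine above can be sketched without any Direct3D code at all. The following is one possible reading of it, under the assumption that "upload and draw" can be treated as a single callback. StampBatcher is a hypothetical class; in the real app the callback would map the instance buffer, copy the batch into it, and issue the DrawIndexedInstanced call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects items on the CPU and hands them over in batches.
public sealed class StampBatcher<T>
{
    private readonly List<T> pending = new List<T>();
    private readonly int capacity;
    private readonly Action<IReadOnlyList<T>> uploadAndDraw;

    public StampBatcher(int capacity, Action<IReadOnlyList<T>> uploadAndDraw)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
        this.uploadAndDraw = uploadAndDraw ?? throw new ArgumentNullException(nameof(uploadAndDraw));
    }

    // Called whenever a new stamp is created; flushes as soon as the batch is full.
    public void Add(T stamp)
    {
        pending.Add(stamp);
        if (pending.Count >= capacity) Flush();
    }

    // Called at the end of each frame and when the stroke ends,
    // so that partially filled batches get drawn too.
    public void Flush()
    {
        if (pending.Count == 0) return;
        uploadAndDraw(pending.ToArray()); // copy, so the callback never sees the list being cleared
        pending.Clear();
    }
}
```

With a capacity of 20, adding 45 stamps and then flushing once produces batches of 20, 20 and 5. Because no batch is ever larger than the capacity, the GPU buffer never needs to be resized.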

private BufferDescription instanceBufferDescription;
private ID3D11Buffer instanceBuffer;

So, let's add a variable called maxInstances that stores the maximum number of instances the buffer can hold.

int maxInstances = 20;

instanceBufferDescription = new BufferDescription() 
{
	ByteWidth = maxInstances * Marshal.SizeOf<InstanceData>(),
	Usage = ResourceUsage.Dynamic,
	BindFlags = BindFlags.VertexBuffer,
	CPUAccessFlags = CpuAccessFlags.Write,
	StructureByteStride = Marshal.SizeOf<InstanceData>()
};
instanceBuffer = device.CreateBuffer(instanceBufferDescription);

Then, we create a description for our buffer. It will be used as a vertex buffer for rendering objects and it will be dynamically updated, allowing us to write data to it from the CPU. The buffer will store information about each instance, and its size is calculated based on the maxInstances value. Finally, we create the buffer using the description and assign it to the instanceBuffer variable.

After that, we can go to the SetRenderState method and initialize a few things. I'm moving the setup of strides and offset here.

int vertexStride = Marshal.SizeOf<Vertex>();
int instanceStride = Marshal.SizeOf<InstanceData>();
int offset = 0; 
deviceContext.IASetVertexBuffers(0, new[] { vertexBuffer, instanceBuffer }, new[] { vertexStride, instanceStride }, new[] { offset, offset });

The "stride" variables give the size in bytes of one element in the vertex buffer and in the instance buffer, which is how far the GPU steps to get from one vertex or instance to the next. The "offset" variable specifies the byte position in the buffer where we want to start reading the data.

Finally, the "deviceContext.IASetVertexBuffers" call is changed to bind both the vertexBuffer and the instanceBuffer, in slots 0 and 1.

Now, we can loop through the brush stamps in our list and create an "instanceData" object that holds information about the position and scale of each brush stamp.

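if (brushStamps.Count > 0)
{
	int instanceSize = Marshal.SizeOf<InstanceData>();
	// The buffer only has room for this many instances; writing past it would corrupt memory.
	int capacity = instanceBufferDescription.ByteWidth / instanceSize;
	int written = 0;
	MappedSubresource mappedResource = deviceContext.Map(instanceBuffer, 0, MapMode.WriteDiscard, Vortice.Direct3D11.MapFlags.None);
	IntPtr dataPtr = mappedResource.DataPointer;
	foreach (BrushStamp stamp in brushStamps)
	{
		// Stamps beyond the buffer capacity are skipped until batching is in place.
		if (written++ >= capacity) break;
		float scale = 100.0f;
		Vector2 pos = ConvertMousePointTo3D(stamp.Position);
		Vector2 scaleVector = new Vector2(scale, scale);
		InstanceData instanceData = new InstanceData { Position = pos, Scale = scaleVector };
		// fDeleteOld is false: the freshly mapped memory holds no valid struct to destroy.
		Marshal.StructureToPtr(instanceData, dataPtr, false);
		dataPtr += instanceSize;
	}
	deviceContext.Unmap(instanceBuffer, 0);
}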

Then, using "Marshal.StructureToPtr", we copy the "instanceData" struct into the mapped buffer at the location "dataPtr" points to. After that, we advance "dataPtr" by the size of the "InstanceData" structure, which moves the pointer to the next free slot. By doing this inside the loop, every brush stamp ends up with its instance data stored back to back in the buffer, ready for the GPU to read once we unmap it.
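
The pointer walk is easier to trust once you've seen it round-trip. This sketch does the same write loop against a block of unmanaged memory from Marshal.AllocHGlobal, which stands in for the mapped buffer here, and then reads the structs back. DemoInstance and PointerWalk are hypothetical names used only for this example.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct DemoInstance   // hypothetical stand-in for InstanceData
{
    public Vector2 Position;
    public Vector2 Scale;
}

public static class PointerWalk
{
    // Writes the items back to back into unmanaged memory, then reads them back.
    public static DemoInstance[] RoundTrip(DemoInstance[] items)
    {
        int stride = Marshal.SizeOf<DemoInstance>();
        IntPtr block = Marshal.AllocHGlobal(stride * items.Length); // stands in for the mapped buffer
        try
        {
            IntPtr cursor = block;
            foreach (DemoInstance item in items)
            {
                Marshal.StructureToPtr(item, cursor, false); // false: the target holds no valid old data
                cursor += stride;                            // step to the next slot
            }

            var result = new DemoInstance[items.Length];
            for (int i = 0; i < result.Length; i++)
            {
                result[i] = Marshal.PtrToStructure<DemoInstance>(block + i * stride);
            }
            return result;
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

The important detail is that the write cursor and the read address both advance by the same stride. In the real app the reader is the GPU, and the stride it uses is the one passed to IASetVertexBuffers.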

In the Draw method, let's swap the DrawIndexed to DrawIndexedInstanced to accommodate our instancing needs.

deviceContext.DrawIndexedInstanced(indicesArray.Length, brushStamps.Count, 0, 0, 0);

The "deviceContext.DrawIndexedInstanced" call tells the graphics device to draw the instances. Its arguments are the number of indices per instance, the number of instances to draw, and then the start index location, the base vertex location, and the start instance location, all of which stay at 0 here.

Now, we can change the vertex shader to utilize the incoming instance data.

struct VertexInput
{
	float3 position : POSITION0;
	float2 texCoord : TEXCOORD0;
	float2 instancePos : POSITION1;
	float2 instanceScale : TEXCOORD1;
};
...
VertexOutput VS(VertexInput input)
{
	VertexOutput output;
	float scaledX = input.position.x * input.instanceScale.x;
	float scaledY = input.position.y * input.instanceScale.y;
	float scaledZ = input.position.z;
	float3 scaledPos = float3(scaledX, scaledY, scaledZ);
	
	float3 instPos = float3(input.instancePos.x, input.instancePos.y, 0.0);
	float3 worldPos = scaledPos + instPos;
	output.position = mul(float4(worldPos, 1.0), WorldViewProjection);
	output.texCoord = input.texCoord;
	return output;
}

The VertexInput struct needs to be changed to match the input elements.

In the VS function, we scale the vertex position by the instance scale and add the instance position to get the final world position. Then, we transform this world position by the WorldViewProjection matrix to get the final position of the vertex on the screen. We also pass the texture coordinates through from the input so that the pixel shader can use them.

Since we're explicitly scaling the instances now, we can set the desired world size to match the SwapChainCanvas (500*500) in InitializeDirectX.

desiredWorldWidth = (float)SwapChainCanvas.Width;
desiredWorldHeight = (float)SwapChainCanvas.Height;

Visual Studio project:
d3dpaint-pt2.zip (30 KB)
D3DPaint part 2 in GitHub