Windows I/O completion - One little trick

I’ve been learning how to deal with I/O completion ports for my latest project. I found a few libraries that manage it all for me, but I was getting strange behavior, so I ended up having to dig deep enough to understand what was happening. I didn’t find a really clear post on the subject, so here is my attempt.

When reading the code I found that each library had a slightly different way of detecting a completed I/O call. The two libraries I was referencing were Rust’s Mio crate and Go’s winio.

Understanding that they were accomplishing the same task in different ways was key:

I/O completion’s one little trick

Those are the two key differences in the way each library approaches doing I/O but I was still confused as to how the program “wakes” back up after the I/O completes.

Let’s take a look at the winio code that runs once GetQueuedCompletionStatus returns, which happens after the system has finished an asynchronous I/O call. Note that the call to getQueuedCompletionStatus suspends the thread that calls it until a completion is available.

// ioOperation represents an outstanding asynchronous Win32 IO.
type ioOperation struct {
	o  syscall.Overlapped
	ch chan ioResult
}

func ioCompletionProcessor(h syscall.Handle) {
	for {
		var bytes uint32
		var key uintptr
		var op *ioOperation
		err := getQueuedCompletionStatus(h, &bytes, &key, &op, syscall.INFINITE)
		if op == nil {
			panic(err)
		}
		op.ch <- ioResult{bytes, err}
	}
}

What is going on here? How does the operating system know how to fill in op, which is an *ioOperation, and how can we then pass data into its channel?

To figure this out we need to see how I/O is “prepared” and then invoked. To prepare the I/O we create an I/O operation and this is where a channel is created:

func (f *win32File) prepareIO() (*ioOperation, error) {
	f.wgLock.RLock()
	if f.closing.isSet() {
		f.wgLock.RUnlock()
		return nil, ErrFileClosed
	}
	f.wg.Add(1)
	f.wgLock.RUnlock()
	c := &ioOperation{}
	c.ch = make(chan ioResult)
	return c, nil
}

Then we issue the Read, passing a pointer to the Overlapped field of the ioOperation, and wait for it to complete in ‘asyncIO’. Note that even though this is called asyncIO, it is a blocking operation: the calling goroutine waits on the channel. The thread suspended in getQueuedCompletionStatus isn’t this one; it is the one running the goroutine with the ioCompletionProcessor loop.

	...snip...
	var bytes uint32
	err = syscall.ReadFile(f.handle, b, &bytes, &c.o)
	n, err := f.asyncIO(c, &f.readDeadline, bytes, err)
	runtime.KeepAlive(b)
	...snip...

Inside asyncIO we find we are waiting for the channel to be filled:

	...snip...
	var r ioResult
	select {
	case r = <-c.ch:
		err = r.err
		if err == syscall.ERROR_OPERATION_ABORTED { //nolint:errorlint // err is Errno
			if f.closing.isSet() {
				err = ErrFileClosed
			}
		} else if err != nil && f.socket {
			// err is from Win32. Query the overlapped structure to get the winsock error.
			var bytes, flags uint32
			err = wsaGetOverlappedResult(f.handle, &c.o, &bytes, false, &flags)
		}
	case <-timeout:
	    ...snip...
		}
	}
	...snip...

If you read the rest of the code you will not find that channel being explicitly handed to any other goroutine!

But as you might have guessed by now, the channel we saw in ioCompletionProcessor is the same one! How do the two ends get linked together?

The key is a little trick that is used extensively when working with Windows I/O completion ports. When calling ReadFile we pass a pointer to an Overlapped structure, and getQueuedCompletionStatus hands that same pointer back once the operation completes. The struct we passed is actually the first field of a wrapper:

type ioOperation struct {
	o  syscall.Overlapped
	ch chan ioResult
}

We set up the channel during prepareIO and then passed a pointer to the Overlapped field to the Read syscall, and the OS only fills in the bits of the Overlapped struct. So when the completion thread is unsuspended, the pointer it receives points at the struct we prepared: an ioOperation with a channel. Because o is the first field, its address is also the address of the whole ioOperation. We can then pass the result through the channel (which asyncIO is waiting on) and the read completes!

This little trick is also used in the Mio project, but slightly differently. Since Mio has its own event loop, it doesn’t actually wait for the read; it just needs to know which event the completion is associated with (in fact it does copy the buffer internally, but that is slightly different from the application doing the reading). The read by the end program will happen at another time. So the structure looks a little different, but the same trick is used:

#[repr(C)]
pub(crate) struct Overlapped {
    inner: UnsafeCell<OVERLAPPED>,
    pub(crate) callback: fn(&OVERLAPPED_ENTRY, Option<&mut Vec<Event>>),
}

In this case they’ve made it a generic callback function that can be filled with anything.

In other cases you might just store some basic information in the wrapper and not a callback or channel. It really is up to your use case.
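As a small illustration of the plain-data flavor, the sketch below tags each operation with a kind and a connection ID. Every name here (taggedOp, opKind, connID, describe) is hypothetical and chosen just for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// fakeOverlapped stands in for the OS overlapped block.
type fakeOverlapped struct{ internal, internalHigh uintptr }

type opKind int

const (
	opRead opKind = iota
	opWrite
)

// taggedOp carries only plain data after the overlapped block.
type taggedOp struct {
	o      fakeOverlapped // must stay the first field
	kind   opKind
	connID int
}

// describe recovers the wrapper from the overlapped pointer and inspects it.
func describe(o *fakeOverlapped) string {
	op := (*taggedOp)(unsafe.Pointer(o))
	switch op.kind {
	case opRead:
		return fmt.Sprintf("conn %d: read finished", op.connID)
	case opWrite:
		return fmt.Sprintf("conn %d: write finished", op.connID)
	default:
		return fmt.Sprintf("conn %d: unknown operation", op.connID)
	}
}

func main() {
	op := &taggedOp{kind: opWrite, connID: 7}
	fmt.Println(describe(&op.o)) // prints "conn 7: write finished"
}
```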

Conclusion

It took me a while to figure out how these calls came together, and it was hard to find it explicitly called out anywhere. Hopefully this helps someone who is struggling to figure out the “one little trick” being used here.

I did eventually find this in a few resources on the topic. I highly recommend reading the following, which go over this process in much more detail:
