Till: Part 1

Writing a customisable Rust Async Executor

November 10 2023

I’ve been reading some of Without Boats’ recent blog posts about the state of async programming in Rust and it’s become a sort of motivator for me to step outside my comfort zone when it comes to programming in async Rust. I write with async Rust daily at work using the Actix-Web HTTP server framework and sometimes just bare Tokio when the situation calls for it. I’ve had a vague idea of what it is that an executor does for a while, but I’ve never considered writing my own until now. This will be a series of posts detailing the development of my async executor, Till.

Goals

Goal 1: My goal isn’t to build a better Tokio, that seems like setting myself up for failure. Instead my goal is to do what Tokio can’t do, run on small embedded systems. I could target the ESP32 since I work with them a lot, but since you can build Rust’s std on top of esp-idf I wouldn’t be surprised if it could just run Tokio anyway. Instead I’ll do my testing on something much smaller, an Arduino Nano. There’s no std support for AVR targets since they’re much too small, and since Tokio normally runs on machines with at least tens of gigabytes of RAM, it’s probably more optimised for that kind of use so I doubt it could fit into the 2.5 kilobytes of RAM of the AtMega328.

Goal 2: Beyond targeting tiny embedded systems without std, I also want to support no-alloc environments too. With so little memory to work with on these systems, the predictability of static and stack only are especially valuable. I want to offer extra features if alloc is available, but it shouldn’t be mandatory.

Goal 3: Finally, although I’ll be testing on an Arduino Nano, the whole thing should be agnostic of what it’s running on. All of the target specific logic should be able to be injected into it as needed by the consuming application.

Design

The way I see it an executor needs a few main parts:

Something to hold spawned tasks without moving them around since they need to be pinned.
Some kind of event loop where the executor will do IO operations on behalf of the futures running in the executor
Something to facilitate communication between these two pieces when a task needs to be woken up.

The executor is then just going to be a thin wrapper over these parts. Each of them will get a trait and the executor will take in implementors of these traits in its new method. That way the user can customise aspects of the executor’s function by implementing them on their own types.

The event loop part doesn’t seem like it should be strictly necessary so I’ll leave it out of my proof-of-concept and work on its design later.

Task Manager

Initial requirements for this part are to have some way to iterate over the stored tasks so the executor can poll them, and also to have a way to put tasks to sleep. The executor can’t hold the wake status of each task since it doesn’t know how many there are and would need to allocate at runtime to do so. Goal 2 precludes this requirement so that responsibility has to be pushed onto the implementor of this trait.

Here’s what that trait might look like:

pub trait TaskManager {
    type TaskIterator<'a>: Iterator<
        Item = Pin<&'a mut dyn Future<Output = ()>>
    >
    where
        Self: 'a;
    fn get_task(
        &mut self,
        i: usize,
    ) -> Option<Pin<&mut dyn Future<Output = ()>>>;
    fn sleep_task(&mut self, i: usize);
    fn sleep_all(&mut self);
    fn tasks<'a>(&'a mut self) -> Self::TaskIterator<'a>;
}

I didn’t realise this until after I had written it, but here’s a great use for Rust’s recent MVP support for generic associated types (GATs).

The only thing the jumps out as needing improvement here is the i: usize in get_task and sleep_task. The idea I had was that it should correspond with when in the iterator sequence the task gets returned, but that probably imposes too many restrictions on the implementor. I’ll continue with this for now but it’ll need some more work later.

Context

Looking at the Future trait…

pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

…the next thing to figure out is Context<'a> since I’ll need those to be able to poll the tasks. Initially it looks quite good. There’s a lifetime parameter on it so maybe this communication part mentioned earlier will be able to exist on the stack. I’ve looked at this type before and remember that there’s a RawWaker which is sort of like a &dyn WakerTrait except WakerTrait doesn’t exist and instead you have to make the V-table manually. That RawWaker then gets put behind a wrapper, Waker for some reason that I don’t understand.

Looking at Waker I notice that unlike Context<'a>, it doesn’t have a lifetime parameter. I guess if you only ever have a 'a reference to it from a context it’s basically the same thing but then I notice it implements Clone. So much for having the communication part be able to live on the stack, with a Waker that can live forever so too must anything it references also live forever. This means I need 'static lifetimes for the Waker’s references which means static or heap and since I don’t want to require alloc, I’ll have to keep in mind as I design that the communication part will be a static variable. Let’s give this communication part the name…

Marshall

I think I’ve seen this term used in the context of interrupt handlers where you want to go from a function pointer to making a method call on a particular object. This situation of having to go from a 'static to a 'a feels similar to me so I’ll use the same name for it.

Here’s the trait:

pub trait Marshall: Sync + 'static {
    fn wake(&'static self);
    fn waker(&'static self) -> core::task::Waker;
}

The documentation for Waker makes note that it implements Send and Sync. Since I think my waker will basically be a reference to an implementor of Marshall, that implementor will need to be Sync to be safe in multi-threaded environments. Although maybe not strictly necessary, I’ve included a wake method in addition to the waker since it was easier for me to think about it this way.

This trait is kept deliberately very simple, nothing about how it communicates with a task manager is present. This is because I expect the TaskManager implementor and Marshall implementor to be written as a compatible pair and it makes more sense to have the bound be on the TaskManager side, which now becomes:

pub trait TaskManager {
    type Marshall: Marshall;
    type TaskIterator<'a>: Iterator<
        Item = (
            Pin<&'a mut dyn Future<Output = ()>>,
            &'static Self::Marshall,
        ),
    >
    where
        Self: 'a;
    fn get_task(
        &mut self,
        i: usize,
    ) -> Option<(
        Pin<&mut dyn Future<Output = ()>>,
        &'static Self::Marshall,
    )>;
    fn sleep_task(&mut self, i: usize);
    fn sleep_all(&mut self);
    fn tasks<'a>(&'a mut self) -> Self::TaskIterator<'a>;
}

The executor will need to be able to get a reference to the Marshall to be able to construct a Context<'a> to poll with, and depending on implementation each task may need to have it’s own Marshall instance so making it part of the task iterator and getter seems right.

Executor

Now seems like a good point to start writing the Executor wrapper type. Here’s the first version:

pub struct Executor<'a, Tasks: TaskManager> {
    tasks: &'a mut Tasks,
}

For now it’s just a wrapper over a mutable reference to a TaskManager but I expect there to be more in here once the event loop part is built.

Methods on the type look like this:

impl<'a, Tasks: TaskManager> Executor<'a, Tasks> {
    pub fn new(tasks: &'a mut Tasks) -> Self {
        Executor { tasks }
    }

    pub fn run_to_completion(self) {
        while self.tasks.tasks().any(|(task, _)| !task.is_terminated()) {
            for (i, (mut task, marshall)) in
                self.tasks.tasks().enumerate().filter(|(_, (task, _))| {
                    todo!("What goes here?")
                })
            {
                let waker = marshall.waker();
                let mut context: core::task::Context<'_> = core::task::Context::from_waker(&waker);
                self.tasks.sleep_task(i);
                let _ = task.poll(&mut context);
            }
        }
    }
}

Unfortunately, run_to_completion doesn’t compile. We’re borrowing from self.tasks for as long as we have a pinned reference to the task, so we can’t then borrow again to set the sleep status. Also, I want to be able to filter out tasks that are asleep or completed since they don’t need to be polled, but I don’t have the means to check for either. Luckily both of these issues can be solved with the same technique, use a sub-trait of Future for the tasks.

If we depend on the futures we can use its FusedFuture trait so we can check if a task is complete. Similarly we can define our own sub-trait of FusedFuture which stores a wake status too.

pub trait FusedFutureWithWakeStatus: FusedFuture {
    fn status(&self) -> WakeStatus;
    fn set_status(self: Pin<&mut Self>, status: WakeStatus);
}

WakeStatus is just a two-state enum:

pub enum WakeStatus {
    Woken,
    Asleep,
}

set_status needs to be callable when pinned so Pin appears in the self type. Since the status method is immutable it’s not needed.

TaskManager now becomes:

pub trait TaskManager {
    type Marshall: Marshall;
    type TaskIterator<'a>: Iterator<
        Item = (
            Pin<&'a mut dyn FusedFutureWithWakeStatus<Output = ()>>,
            &'static Self::Marshall,
        ),
    >
    where
        Self: 'a;
    fn get_task(
        &mut self,
        i: usize,
    ) -> Option<(
        Pin<&mut dyn FusedFutureWithWakeStatus<Output = ()>>,
        &'static Self::Marshall,
    )>;
    fn sleep_task(&mut self, i: usize);
    fn sleep_all(&mut self);
    fn tasks<'a>(&'a mut self) -> Self::TaskIterator<'a>;
}

And Executor::run_to_completion becomes:

pub fn run_to_completion(self) {
    while self.tasks.tasks().any(|(task, _)| !task.is_terminated()) {
        for (i, (mut task, marshall)) in
            self.tasks.tasks().enumerate().filter(|(_, (task, _))| {
                !task.is_terminated() && matches!(task.status(), WakeStatus::Woken)
            })
        {
            let waker = marshall.waker();
            let mut context: core::task::Context<'_> = core::task::Context::from_waker(&waker);
            task.as_mut().set_status(WakeStatus::Asleep);
            let _ = task.poll(&mut context);
        }
    }
}

Since we’re depending on FusedFuture we can just discard the result of task.poll as it’s implementation will update termination status for us if needed.

Conclusion

That’s it for the first part in this series. We’ve laid down the core ideas and formalised them into some traits. In the next part I’ll be implementing these traits to see how they work in practise.

Recommended Projects

Till: Part 2

Implementing MVP and testing on an Arduino