Running the Azure Functions Runtime in Kubernetes

This is a two part series. The second post is on Deploying an Azure Function App to Kubernetes using a Helm Chart.

The Azure Functions team recently released the preview of Azure Functions on Linux. A colleague Vy Ta and I thought it would be fun to see if we could get Azure Functions running in Kubernetes. Here are the steps to get it work. To follow along you will need to have:

Create a your Function App and Docker image

The first step is to use the Functions Core Tools to create a sample function:

func init . --docker --sample

Next build your Docker image and push to a docker repository:

docker build -t az-functions.
docker tag az-functions <your-repo>/az-functions

docker login
docker push <your-repo>/az-functions

Set up and run on Kubernetes

Next we will create a deployment and service on Kubernetes and make sure we can access it. The commands below assume you have a Kubernetes cluster running in a Cloud.

Create a deployment on Kubernetes:

kubectl run azfunctions --image=<your-repo>/az-functions --port=80 --requests=cpu=200m

Create a service and wait for an IP address:

kubectl expose deployment azfunctions --port=80 --type=LoadBalancer
kubectl get svc -w

Once you have a ip address you should be able to open a browser and view the end point at http://<your-ip-address>/api/httpfunction?name=james (assuming you used the sample function).

Autoscale your Function App

Once we have both of those running we can set up a Pod Auto Scaler and test scaling our function app.

Set up auto Scaling by:

kubectl autoscale deployment azfunctions --cpu-percent=50 --min=1 --max=10

Note: For the auto-scaler to work you need to create you deployment with the --requests=cpu=200m property as we did above in kubectl run command. It is possible to autoscale on other metrics as well.

Test Autoscaling

We have enabled auto scaling so let give it a spin. We will test it the same way as in Kubernetes Pod Autoscale walk through.

Open a new terminal and run:

kubectl run -i --tty load-generator --image=busybox /bin/sh
/ \#: 		while true; do wget -q -O- http://<your-ipaddress>/api/httpfunction?name=testingload; done 

This will ping your endpoint. After a few moments you should be able to see the load increasing:

kubectl get hpa

#output
NAME          REFERENCE                TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
azfunctions   Deployment/azfunctions   167% / 50%   1         10        4          5m

You can also see the number of pods in the deployment increase:

kubectl get deploy azfunctions

#output
NAME          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
azfunctions   4         4         4            4           10m

If you kill the busybox command we used to generate the load you will see the pods scale back down.

Where to next

This is a great way to see how you could use the Azure Functions runtime on premise and opens a lot of possibilities. One scenario might be that you have two teams. One that is working On Premises and another that works in Azure but you want to share the same programming model of Azure functions across the two teams. Another scenario is to use Kubernetes and then use Azure Container Instances for extra scaling when needed. What do you think you might use this for? Leave a note in the comments.

Some other interesting scenarios you can looking into are:

Creating a Finite State Machine Distributed Workflow with Service Fabric Reliable Actors

The accompanying source code can be found at github.com/jsturtevant/azure-service-fabric-actor-fsm.

I recently had to manage a long running workflow inside Azure Service Fabric solution. I remembered reading about using a Finite State Machine (FSM) with Actors in the book Programming Microsoft Azure Service Fabric by Haishi Bai. When I went looking for examples of how I might use a FSM with the Service Fabric Actor Programming model I wasn’t able to find one, so I built an example of using a FSM with Service Fabric Actors.

Although, Finite State Machines might sound scary they are not to hard to implement or us, once you learn some basic terminology. FSM’s are great for many types of scenarios but keeping tack of long running workflows with a predetermined sequence of actions is a great use case.

In Haishi’s book, he walks through the example of how it might be used in an ordering and shipping process to track items as they move through the ordering, manufacturing, shipping and invoicing product stages. In my case I needed to manage creating and configuring multiple resources within my Service Fabric cluster (applications and services) and Azure (IP addresses and Load balancers, etc). Each individual step required the previous step to be complete and could take anywhere from a few seconds to 10 minutes.

In such a distributed workflow, where there are many different services implementing each of the individual steps you it can be challenge to track and manage all of the different steps. A single Actor for each instance of the workflow simplifies the process and simplifies the scaling of the workflow service. For each instance of a new workflow a new actor is created and the state for the workflow is managed by that actor.

An example of a Actor managing multiple states and services might look like:

Finite State Machine Workflow using Reliable Actors and multiple distributed services.

A few of the benifits of using a Reliable Actor with a FSM in this case are:

  • FSM will centralize the logic for the workflow into one location
  • the workflow service will scale well as each actor has independent state and logic for a given workflow instance
  • can use timed Reminders on the actor to trigger workflow timeouts and notify other services something is wrong.
  • monitoring status of the workflow is simplified
  • canceling a workflow is simplified (if in correct state)

A notable trade off of using Reliable Actors are they are inheritly single threaded. If your workflow requires a high number of reads and writes then you might find that the Actors will not be a good choice do to throughput performance. In both of the use cases noted above this is not an issue so the models maps nicely to the Reliable Actors.

Create the Finite State Machine (FSM) with Service Fabric Actor

Instead of implementing a FSM myself I used the very popular Stateless FSM library. I chose this library because of its popularity and it comes with the ability to externalize the state of the FSM. In this case I wanted to persist the state using a Service Fabric Stateful Reliable Actor.

You can install the library via Nuget Install-Package Stateless and checkout some of the examples to get an understanding of to use the library.

I adapted the BugTracker example to create the Service Fabric code samples below.

For the remainder of the article we will walk through some of the important parts of the solution.

Working with external State inside the Actor

Stateless uses a simple enum to track the state the FSM is in at any given time. We can use Reliable Actors StateManager to save this value durably when ever the State Machine changes state.

Using external state in Stateless is fairly easy:

//Private member Variable
enum State { Open, Assigned, Deferred, Resolved, Closed }

//initializing Stateless StateMachine with external state.
machine = new StateMachine<State, Trigger>(() => this.state, s => this.state = s);

Saving the State durably (replicated across multiple nodes for failover) using Reliable Actors is also straight forward. Within any method of the Actor we can pass a key and value:

await this.StateManager.SetStateAsync("state", machine.State);

Now let’s see how we can use this to create a FSM using Service Fabric Reliable Actors.

Initializing the State of the Actor

Reliable Actors are virtual, which means they always “exist” and the lifetime of an actor is not tied to their in-memory representation. An individual actor can also be “Garbage Collected” (deactivated) from the system at if not used for a period of time, although the state will be persisted. This means two things when an working with the FSM in the actor framework:

  • The first time an actor is activated we need to set the initial FSM state and configure the FSM
  • When a new request comes in for a Actor that has been deactivated we will need to load the state and re-configure the FSM

The code to handle both first time activation and re-activation happens in the OnActivateAsync method and will look something like:

protected override async Task OnActivateAsync()
{
    // Handle Activation for first time and re-activation after being garbage collected
    ActorEventSource.Current.ActorMessage(this, $"Actor activated: {this.GetActorId()}");

    var savedState = await this.StateManager.TryGetStateAsync<State>("state");
    if (savedState.HasValue)
    {
        // has started processing
        this.state = savedState.Value;
    }
    else
    {
        // first load ever - initialize
        this.state = State.Open;
        await this.StateManager.SetStateAsync<State>("state", this.state);
    }

    ActorEventSource.Current.ActorMessage(this, $"Actor state at activation: {this.GetActorId()}, {this.state}");

    //Configure the Stateless StateMachine with the state.
    machine = new StateMachine<State, Trigger>(() => this.state, s => this.state = s);
    assignTrigger = machine.SetTriggerParameters<string>(Trigger.Assign);
    machine.Configure(State.Open)
        .Permit(Trigger.Assign, State.Assigned);
    machine.Configure(State.Assigned)
        .SubstateOf(State.Open)
        .OnEntryFromAsync(assignTrigger, assignee => OnAssigned(assignee))
        .PermitReentry(Trigger.Assign)
        .Permit(Trigger.Close, State.Closed)
        .Permit(Trigger.Defer, State.Deferred)
        .OnExitAsync(() => OnDeassigned());
    machine.Configure(State.Deferred)
        .OnEntryAsync(() => this.StateManager.SetStateAsync<string>("assignee", null))
        .Permit(Trigger.Assign, State.Assigned);
}

You will notice that the state machine gets saved in a local member variable. From here on out we will interact with our state machine in memory and persist the state object using the StateManager when ever the FSM Transitions from one state to another.

Triggering a new state from external system

In the case of managing a distributed workflow, the external services that are doing the work of at each step of the workflow will need a way to notify the workflow actor that there work has either completed or failed. We can create methods on the actor that will enable the clients to interact with the actor and workflow. If any service requests a transition of state outside of the workflow, an error will be returned to the client notifying them that the action is not available at this time.

The first step is to modify the interface that Actor implements:

public interface IFSM : IActor
{
    Task Close(CancellationToken cancellationToken);
    Task Assign(string assignee, CancellationToken cancellationToken);
    Task<bool> CanAssign(CancellationToken cancellationToken);
    Task Defer(CancellationToken cancellationToken);
}

[StatePersistence(StatePersistence.Persisted)]
internal class FSM : Actor, IFSM
{
  /// implement members and methods like OnActivateAsync here
}

The next step is to implement the methods and save the FSM state as the workflow transitions. Since the FSM is in memory when an actor is active we can work with the FSM’s actual state:

public async Task Assign(string assignee, CancellationToken cancellationToken)
{
    //trigger transition to next workflow state.
    await machine.FireAsync(assignTrigger, assignee);

    //check that local state is same as machine state then save.
    Debug.Assert(state == machine.State);
    await this.StateManager.SetStateAsync("state", machine.State);
}

This creates a transition in the FSM (assignTrigger) and fires an event which could then call out to another service to kick of the next step of the workflow. Now that the next step in the workflow has been activated we can save the state of the machine. Calling the Assign method from an external service is simple:

ActorId actorId = new ActorId(Guid.NewGuid().ToString());
IFSM actor = ActorProxy.Create<IFSM>(actorId, new Uri("fabric:/ActorFSM/FSMActorService"));
var token = new CancellationToken();
await actor.Assign("Joe", token);

In the assignTrigger of the FSM we can also save the the actual assignee durably using the Actors StateManager. This means that we can not only query the actor to get the current step of the workflow but also who is assigned. The event wired to FireAsync looks like:

async Task OnAssigned(string assignee)
{
    var _assignee = await this.StateManager.TryGetStateAsync<string>("assignee");
    if (_assignee.HasValue && assignee != _assignee.Value)
      await SendEmailToAssignee("Don't forget to help the new employee!");

    await this.StateManager.SetStateAsync("assignee", assignee);
    await SendEmailToAssignee("You own it.");
}

Getting current State of a workflow

A system out side our workflow might want to know what state a given workflow instance is in. This is again is made simple through the Actor. What is interesting to note is that since the actor is single threaded, you will always get the latest state of the Workflow.

Implement the current status as a method on the Actor:

public async Task<BugStatus> GetStatus()
{
  var statusHistory = await this.StateManager.TryGetStateAsync<StatusHistory>("statusHistory");
  var assignee = await this.StateManager.TryGetStateAsync<string>("assignee");

  var status = new BugStatus(this.machine.State);
  status.History = statusHistory.HasValue ? statusHistory.Value : new StatusHistory(machine.State);
  status.Assignee = assignee.HasValue ? assignee.Value : string.Empty;

  return status;
}

Getting the current status:

var actor1Status = await actor.GetStatus();
Console.WriteLine($"Actor 1 with {actorId} has state {actor1Status.CurrentStatus}");

Knowing the history of all the states a workflow has gone through will almost certainly be a requirement. Updating the workflow history as it move throught the FSM is straight forward using the machines machine.OnTransitionedAsync(LogTransitionAsync) event and wiring up a function to log it:

private async Task LogTransitionAsync(StateMachine<State, Trigger>.Transition arg)
{
    var conditionalValue = await this.StateManager.TryGetStateAsync<StatusHistory>("statusHistory");

    StatusHistory history;
    if (conditionalValue.HasValue)
    {
        history = StatusHistory.AddNewStatus(arg.Destination, conditionalValue.Value);
    }
    else
    {
        history = new StatusHistory(arg.Destination);
    }

    await this.StateManager.SetStateAsync<StatusHistory>("statusHistory", history);
}

You can see that the GetStatus method previously implemented already returns that value.

Testing state persistance

As was discussed earlier, the Reliable Actor can be Garbage collected from the system if not used with in a given time period. I wanted to test this to make sure the Activation code worked properly with the FSM. The actor framework gives you some configuration settings that you can modify to tweak your actor’s behavior. I used these settings to aggressively garbage collect the Actors to test the state.

Note: Only modify these Actor settings is you have data to backup making a change to the default settings. In other words you should have sufficient monitoring set up that points to your actors being garbage collected to early or to late for your scenario.

During registration of the Actor with Service Fabric in the program.cs you can adjust the ActorGarbageCollectionSettings. The first number is scan interval (how often it checks the system) and the second number is the idle time out (how long an actor must be idle for):

internal static class Program
{
  private static void Main()
  {
      try
      {
          ActorRuntime.RegisterActorAsync<FSM>(
              (context, actorType) => new ActorService(context, actorType, settings: new ActorServiceSettings()
              {
                  ActorGarbageCollectionSettings = new ActorGarbageCollectionSettings(10, 5)
              })).GetAwaiter().GetResult();

          Thread.Sleep(Timeout.Infinite);
      }
      catch (Exception e)
      {
          ActorEventSource.Current.ActorHostInitializationFailed(e.ToString());
          throw;
      }
  }
}

Then during my test I waited period of time before working with the actors and used the Service Fabric management API’s to make sure the Actors were deactivated:

ActorId actorId2 = new ActorId(Guid.NewGuid().ToString());
IFSM actor2 = ActorProxy.Create<IFSM>(actorId2, new Uri("fabric:/ActorFSM/FSMActorService"));
await actor2.Assign("Sue", token);

// Wait for actors to get garbage collected
await Task.Delay(TimeSpan.FromSeconds(20));

//Query management API's to verify
var actorService = ActorServiceProxy.Create(new Uri("fabric:/ActorFSM/FSMActorService"), actorId);
ContinuationToken continuationToken = null;
IEnumerable<ActorInformation> inactiveActors;
do
{
    var queryResult = await actorService.GetActorsAsync(continuationToken, token);
    inactiveActors = queryResult.Items.Where(x => !x.IsActive);
    continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);

foreach (var actorInformation in inactiveActors)
{
    Console.WriteLine($"\t {actorInformation.ActorId}");
}

Conclusion

Creating a distributed workflow with multiple steps and variable times to complete can be a difficult problem. Using Service Fabric’s Reliable Actors with a FSM machine library like Stateless not only simplifies the code that needs to be written but also ensures the scalablity of the solution because of the partitioned nature of Reliable Actors. If you don’t need lots of concurrent reads and writes then the Reliable Actors could be a great solution.

You can find the complete sample code for this solution at https://github.com/jsturtevant/azure-service-fabric-actor-fsm.

Training links for Microsoft and Azure

I often get asked how to get trained up on a given Microsoft technology. Like anything, there is a lot of info out there, so here is a consolidated list where I would get started. You can go to each resource and search for the technology you are interested in. I always start at the free tier then work up from there if I feel I need/want more.

Microsoft Virtual Academy (free)

Good for full overview of tech with both hands on and presentation style.

https://mva.microsoft.com/

Channel9 (free)

Good for learning more about specific part of tech. Often more specific subtopics of a given tech.

https://channel9.msdn.com/

3rd party video content (pay)

More high quality video training to supplement Channel9 and MVA

LinkedIn Learning: https://learning.linkedin.com Lynda.com: https://www.lynda.com Pluralsight: https://www.pluralsight.com/

Professional programs (pay)

Very structured Profession role based training that gives you a credential at the end.

Microsoft Academy: https://academy.microsoft.com/en-us/professional-program/

Certification Programs (pay for cert)

If your looking for Certifications the Microsoft Certification program is a good place. You can get good idea of what is covered on the exam (and there for things you should generally know/learn) from the exam prep pages.

https://www.microsoft.com/en-us/learning/certification-overview.aspx

Online/in-person Training with Microsoft Partners (pay)

If you are looking for in-person or online training with instructors you can check out the following resources:

https://www.microsoft.com/en-us/learning/training.aspx https://www.microsoft.com/en-us/learning/partners.aspx https://www.microsoft.com/en-us/learning/on-demand-course-partners.aspx

Setting Global Shortcuts for Bash for Windows

I have started using Bash for Windows as part of my workflow. For a long time, I have been using the standard command prompt with Git Bash tools hooked up to my path (using chocolately to install the tools in one command choco install git -params '"/GitAndUnixToolsOnPath"'). This was nice because I could use the shortcut Windows Key + X then C to open a command prompt and get access to Linux like tools such as ls, git and grep.

Now that I am using Bash for Windows I wanted to have a shortcut set up as well. I found it was really easy to set up using the Shortcut key setting on the shortcut properties window. This process could be used for any application that has a shortcut setup. The basic steps are:

  1. Find the shortcut by Windows Key the type bash
  2. Right click on the shortcut and select open file location

find shortcut for Bash for Windows

  1. Right click on the shortcut in the folder and select Properties
  2. Select Shortcut key and type your keyboard shortcut

setup global shortcuts for Bash for Windows

Deploying a Service Fabric Guest Executable with Configuration File

In Service Fabric, when your guest executable relies on a configuration file, the file must be put in the same folder as the executable. An example of an application that that needs configuration is nginx or using Traefik with a static configuration file.

The folder structure would look like:

|-- ApplicationPackageRoot
    |-- GuestService
        |-- Code
            |-- guest.exe
            |-- configuration.yml
        |-- Config
            |-- Settings.xml
        |-- Data
        |-- ServiceManifest.xml
    |-- ApplicationManifest.xml

You may need to pass the location of the configuration file via Arguments tag in the ServiceManifest.xml. Additionally you need to set the WorkingFolder tag to CodeBase so it is able to see the folder when it starts. The valid options for the WorkingFolder parameter are specified in the docs. An example configuration of the CodePackage section of the ServiceManifest.xml file is:

<CodePackage Name="Code" Version="1.0.0">
  <EntryPoint>
    <ExeHost>
      <Program>traefik_windows-amd64.exe</Program>
      <Arguments>-c traefik.toml</Arguments>
      <WorkingFolder>CodeBase</WorkingFolder>
      <ConsoleRedirection FileRetentionCount="5" FileMaxSizeInKb="2048"/> 
    </ExeHost>
  </EntryPoint>
</CodePackage>

Not the ConsoleRedirection tag also enables redirection of stdout and stderr which is helpful for debugging.