I have been working on making containerd work well on Windows with Kubernetes. I needed to do some local dev on containerd so I started to configure my local machine. I found lots of information here and there but nothing comprehensive so I wrote down my steps.
Note there are a few limitations with containerd (so you might not want to fully uninstall Docker). For instance, containerd doesn’t support building containers and you won’t be able to use the native Linux containers functionality (though there is some early support for LCOW).
Unlike Docker, containerd doesn’t attach pods to a network directly. It uses a CNI (Container Networking Interface) plugin to set up the networking. We will use the Windows NAT plugin for our local dev environment. Note that this is likely not the plugin you would want to use in a Kubernetes cluster; for that you should look at Calico, Antrea, or one of the many cloud-specific plugins that are available. For our dev environment NAT will work just fine.
Create the folders for the CNI binaries and configuration. The path C:\Program Files\containerd\cni\bin is the default location for containerd.
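From an elevated prompt the folders can be created like so (these paths are the containerd defaults; adjust if your install lives elsewhere):

```shell
# default locations containerd looks in for CNI plugin binaries and their configs
mkdir "C:\Program Files\containerd\cni\bin"
mkdir "C:\Program Files\containerd\cni\conf"
```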
There are two main ways to interact with containerd: ctr and crictl. ctr is great for simple testing, and crictl interacts with containerd through the CRI, the same way Kubernetes does.
CTR
Ctr was already installed when you pulled the binaries and installed containerd. It has a similar interface to Docker, though it varies some since you are working at a different level.
There is a bug with ctr that doesn’t support multi-arch images on Windows (PR to fix it) so we will use the specific image that matches our operating system. I am running Windows 10 20H2 (build number 19042) so will use that tag for the image.
# find your version
cmd /c ver
Microsoft Windows [Version 10.0.19042.867]
cd "C:\Program Files\containerd\"
./ctr.exe pull mcr.microsoft.com/windows/nanoserver:20H2
./ctr.exe run --rm mcr.microsoft.com/windows/nanoserver:20H2 test cmd /c echo hello
hello
Create a pod via crictl
Using the CRI API can be a little cumbersome. It requires doing each step independently.
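As a sketch of what that looks like (the JSON file names and their contents here are illustrative, not from this post — you write the pod and container configs yourself), you first create a pod sandbox, then create and start a container inside it:

```shell
# pod.json and container.json are hypothetical CRI config files you provide
# create the pod sandbox and capture its id
POD_ID=$(crictl runp pod.json)
# create a container inside the sandbox
CONTAINER_ID=$(crictl create "$POD_ID" container.json pod.json)
# start it and list running containers to confirm
crictl start "$CONTAINER_ID"
crictl ps
```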
I have been working on adding IPV6 support to the Azure cluster api provider (CAPZ) over the last few weeks. I ran into some trouble configuring Calico as the Container Networking Interface (CNI) for the solution. There are some very old (2017) and popular (updated 13 days ago at time of writing) issues on github where folks are looking for ways to fix the issue.
This is obviously something that is hard to debug: the error message is vague, there is a wide variety of CNIs available, and a wide range of issues could be causing it. The original issues did seem to be root-caused, but others (me included) still run into the problem regularly. In my case, I was already running the latest version of my CNI (and just upgrading to the latest doesn’t really get at the root cause anyway), and I understand why removing environment variables doesn’t really help. I figured I had simply configured something wrong, but what was it?
Unfortunately the error message is not very descriptive. Sometimes kubelet has other error messages in it that give you pointers to what is wrong, but in my case there was nothing in the logs. The CNI itself wasn’t working, but there weren’t any error logs that I could find.
After a bit of digging I found cnitool, a developer tool for executing CNIs manually. It allows you to attach an interface to an already created network namespace via a CNI you provide. After building the tool, it is pretty straightforward to execute it with the CNI and configuration that are installed on the Kubernetes node.
It assumes the CNI config is installed to /etc/cni/net.d and the CNI binaries to /opt/cni/bin, which is where they typically land if you use kubeadm:
# create a test network namespace
sudo ip netns add testingns
# run with default config location
sudo CNI_PATH=/opt/cni/bin cnitool add uniquename /var/run/netns/testingns
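If the CNI executed cleanly, you can verify an interface was actually attached inside the namespace (my own verification step, not part of the tool's docs):

```shell
# list the interfaces the CNI created inside the test namespace
sudo ip netns exec testingns ip addr
# clean up the namespace when you are done
sudo ip netns del testingns
```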
Discovering the bug
In my case it was a simple misconfiguration, can you spot it?
Important: be careful with the Calico spec below. It is likely not what you are expecting. At the time of writing Calico’s vxlan mode doesn’t support IPV6, which is one of the ways to run Calico on Azure. Another way is to use UDRs and host-local IPAM, which requires --configure-cloud-routes in the cloud provider, and that is what is configured below. It also has the misconfiguration in it.
---
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the backend to use.
  calico_backend: "none"
  # The CNI network configuration to install on each node. The special
  # values in this config will be automatically populated.
  # https://docs.projectcalico.org/reference/cni-plugin/configuration#using-host-local-ipam
  cni_network_config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "log_level": "info",
          "datastore_type": "kubernetes",
          "nodename": "__KUBERNETES_NODE_NAME__",
          "mtu": 1500,
          "ipam": {
            "type": "host-local",
            "subnet": "usePodCidr",
          },
          "policy": {
            "type": "k8s"
          },
          "kubernetes": {
            "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "portmap",
          "snat": true,
          "capabilities": {"portMappings": true}
        }
      ]
    }
Yes, it is the trailing comma after "usePodCidr" that produced the cryptic error message network plugin is not ready: cni config uninitialized. Fun times with JSON embedded in YAML :-) After removing it everything behaved as expected.
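A cheap way to catch this class of mistake before deploying is to run the JSON you plan to embed through a strict parser, since strict JSON rejects trailing commas. This is my own suggestion, not from the original post; it assumes python3 is available on the node or your workstation:

```shell
# put the JSON you plan to embed into a file, then parse it strictly;
# the trailing comma makes this fail with a clear parse error and position
echo '{"ipam": {"type": "host-local", "subnet": "usePodCidr",}}' > /tmp/cni-check.json
python3 -m json.tool /tmp/cni-check.json || echo "invalid CNI config JSON"
```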
Next steps
I am going to take a look at the K8s/CNI code to figure out if it’s possible to bubble up a better error message, as this is obviously a pain point for folks.
When deploying an Azure Kubernetes Service (AKS) cluster you are required to use a service principal. This service principal is used by the Kubernetes Azure Cloud Provider to do many different activities in Azure, such as provisioning IP addresses, creating storage disks, and more. If you do not specify a service principal at creation time for an AKS cluster, one is created behind the scenes.
In many scenarios, the resources your cluster will need to interact with will be outside the auto-generated node resource group that AKS creates. For instance, if you are using static IP addresses for some of your services, the IP addresses might live in a resource group outside the auto-generated node resource group. Another common example is attaching to an existing vnet that is already provisioned.
Assigning proper permissions
It would be simple to give Contributor rights across your whole subscription or even individual resource groups, and these scenarios would work, but this is not a good practice. Instead we should assign the specific, least-privileged rights to a given service principal.
In the example of attaching IP addresses, we may need the Azure Cloud Provider to be able to attach but not delete IP addresses, because a different team controls the IP addresses. We definitely don’t want the service principal to be able to delete any other resources in the resource group.
There is a good list of the Kubernetes v1.11 permissions required here. This list shows the permissions for creating a least-privileged service principal (note that it might change as Kubernetes versions change, so use it as a general guide). Using this we can assign just enough rights to the service principal to interact with the resources outside the node group.
Sample walkthrough
I have created a full walkthrough sample of creating the service principal up front and assigning just enough rights to it to access the IP addresses and vnet. Here is a highlight of the important parts:
First create a service principal with no permission:
clientsecret=$(az ad sp create-for-rbac --skip-assignment -n "$spname" -o json | jq -r .password)
Using a custom role definition, I am able to give the service principal only read access to the IPs:
{
  "Name": "AKS IP Role",
  "IsCustom": true,
  "Description": "Needed for attach of ip address to load balancer created in K8s cluster.",
  "Actions": [
    "Microsoft.Network/publicIPAddresses/join/action",
    "Microsoft.Network/publicIPAddresses/read"
  ],
  "NotActions": [],
  "DataActions": [],
  "NotDataActions": [],
  "AssignableScopes": [
    "/subscriptions/yoursub"
  ]
}
Using that definition we can then create the role and assign it to the service principal, which will give it IP read access in the resource group:
spid=$(az ad sp show --id "http://$spname" -o json | jq -r .objectId)
iprole=$(az role definition create --role-definition ./role-definitions/aks-reader.json)
az role assignment create --role "AKS IP Role" --resource-group ip-resouce-group --assignee-object-id "$spid"
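To double-check the assignment landed where you expect (my own habit, not part of the original walkthrough), you can list the assignments held by the service principal:

```shell
# shows the roles the service principal holds and at which scopes
az role assignment list --assignee "$spid" --all -o table
```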
Then you can deploy your cluster using the service principal:
az aks create \
--resource-group aks-rg \
--name aks-cluster \
--service-principal $spid \
--client-secret $clientsecret
This will allow the service principal to access the IP addresses in the resource group outside the node resource group.
note: to access an IP address in a resource group outside the cluster you will need to provide an annotation on the k8s Service definition (service.beta.kubernetes.io/azure-load-balancer-resource-group). See the example.
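For a concrete sketch of that annotation (the service name and resource group here are hypothetical), it can be applied to an existing Service like so:

```shell
# tells the Azure cloud provider which resource group holds the static IP
# used by this LoadBalancer Service; "my-service" is a placeholder name
kubectl annotate service my-service \
  "service.beta.kubernetes.io/azure-load-balancer-resource-group=ip-resouce-group"
```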
Yesterday I had to revert to the command line to debug a program I was working on (long story as to why). I was initially apprehensive to debug from the command line, but once I started I found the experience was quite nice. In fact I would compare debugging from the command line to “Vim for debugging”. The best part was I found the bug I was looking for!
I was working in Golang so I used the awesome tool Delve, which is used by the VS Code Go extension. I didn’t find a simple tutorial on using the command line, so I’ve created this to help me remember next time I need to fall back to it.
Install Delve (dlv)
The following steps assume you have set up your Go path.
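If you don’t have Delve yet, it can be installed with go get (this path reflects where Delve lived at the time of this version; it has since moved to github.com/go-delve/delve):

```shell
# installs the dlv binary into $GOPATH/bin
go get -u github.com/derekparker/delve/cmd/dlv
```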
dlv version
Delve Debugger
Version: 1.1.0
Build: $Id: 1990ba12450cab9425a2ae62e6ab988725023d5c $
Start Delve on a project
Delve can be run in many different ways, but the easiest is to use dlv debug. This takes your current Go package, builds it, then runs it. You can specify command parameters by using -- <application parameters> after the Delve parameters you pass.
Get the example Go app and then start delve using the hello package:
$ go get github.com/golang/example/...
$ cd $GOPATH/src/github.com/golang/example/
# run delve
$ dlv debug ./hello
Type 'help' for list of commands.
(dlv)
You have now built the app and entered the debugging experience. Easy enough. If you are like me, you are wondering what’s next: with so many commands, how do I start to do anything useful?
Navigating around at the command line
There are tons of commands you can run at this point, but I like to start by just getting into the application to see where I am and get started on the right foot. So let’s set a breakpoint (b is short for break):
(dlv) b main.main
Breakpoint 1 set at 0x49c3a8 for main.main() ./hello/hello.go:25
(dlv)
(dlv) c
> main.main() ./hello/hello.go:25 (hits goroutine(1):1 total:1) (PC: 0x49c3a8)
20: "fmt"
21:
22: "github.com/golang/example/stringutil"
23: )
24:
=> 25: func main() {
26: fmt.Println(stringutil.Reverse("!selpmaxe oG ,olleH"))
27: }
(dlv)
Whoa! Now we hit the breakpoint and can even see the code. Cool! This is fun; let’s keep going by stepping over the code (n is short for next):
(dlv) n
> main.main() ./hello/hello.go:26 (PC: 0x49c3bf)
21:
22: "github.com/golang/example/stringutil"
23: )
24:
25: func main() {
=> 26: fmt.Println(stringutil.Reverse("!selpmaxe oG ,olleH"))
27: }
Nice, now we can see that we have stepped to the next line of code. Now there isn’t much to this code but we can see that we do have a package called stringutil that we are using to reverse the string. Let’s step into that function (s is short for step):
(dlv) s
> github.com/golang/example/stringutil.Reverse() ./stringutil/reverse.go:21 (PC: 0x49c19b)
16:
17: // Package stringutil contains utility functions for working with strings.
18: package stringutil
19:
20: // Reverse returns its argument string reversed rune-wise left to right.
=> 21: func Reverse(s string) string {
22: r := []rune(s)
23: for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
24: r[i], r[j] = r[j], r[i]
25: }
26: return string(r)
(dlv)
Now we are inside another package and function. Let’s inspect the variable that was passed in (p is short for print):
(dlv) p s
"!selpmaxe oG ,olleH"
(dlv)
Yup, that’s the string value we were expecting! Next let’s set another breakpoint at line 24; if we hit continue we should end up there:
(dlv) b 24
Breakpoint 2 set at 0x49c24e for github.com/golang/example/stringutil.Reverse() ./stringutil/reverse.go:24
(dlv) c
> github.com/golang/example/stringutil.Reverse() ./stringutil/reverse.go:24 (hits goroutine(1):1 total:1) (PC: 0x49c24e)
19:
20: // Reverse returns its argument string reversed rune-wise left to right.
21: func Reverse(s string) string {
22: r := []rune(s)
23: for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
=> 24: r[i], r[j] = r[j], r[i]
25: }
26: return string(r)
27: }
(dlv)
Sweet, but what are the values for i, j, and even r? You can inspect each of them with print as well. While we are here, let’s also review the breakpoints we have set so far (bp is short for breakpoints) and clear the one on main.main that we no longer need:
(dlv) bp
Breakpoint unrecovered-panic at 0x42a7f0 for runtime.startpanic() /usr/local/go/src/runtime/panic.go:591 (0)
print runtime.curg._panic.arg
Breakpoint 1 at 0x49c3a8 for main.main() ./hello/hello.go:25 (0)
Breakpoint 2 at 0x49c24e for github.com/golang/example/stringutil.Reverse() ./stringutil/reverse.go:24 (0)
(dlv) clearall main.main
Breakpoint 1 cleared at 0x49c3a8 for main.main() ./hello/hello.go:25
(dlv)
Alright, if you’re like me you’re feeling pretty good, but are you ready to get fancy? We know the name of the function we want to jump to, so let’s search for the function name, set a breakpoint right where we want it, then execute to it:
(dlv) funcs Reverse
github.com/golang/example/stringutil.Reverse
(dlv) b stringutil.Reverse
Breakpoint 3 set at 0x49c19b for github.com/golang/example/stringutil.Reverse() ./stringutil/reverse.go:21
(dlv) c
> github.com/golang/example/stringutil.Reverse() ./stringutil/reverse.go:21 (hits goroutine(1):1 total:1)(PC: 0x49c19b)
16:
17: // Package stringutil contains utility functions for working with strings.
18: package stringutil
19:
20: // Reverse returns its argument string reversed rune-wise left to right.
=> 21: func Reverse(s string) string {
22: r := []rune(s)
23: for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
24: r[i], r[j] = r[j], r[i]
25: }
26: return string(r)
(dlv) c
> github.com/golang/example/stringutil.Reverse() ./stringutil/reverse.go:24 (hits goroutine(1):1 total:1) (PC: 0x49c24e)
19:
20: // Reverse returns its argument string reversed rune-wise left to right.
21: func Reverse(s string) string {
22: r := []rune(s)
23: for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
=> 24: r[i], r[j] = r[j], r[i]
25: }
26: return string(r)
27: }
(dlv) p len(r)-1
18
(dlv)
Everything looks fine, so I guess my work is done here.
That just touched the surface of what you can do, but I hope it made you more comfortable with debugging on the command line. I think you can see how fast and efficient you can be. Good luck on your bug hunting!
I currently work for an awesome team at Microsoft called CSE (Commercial Software Engineering), where we work with customers to help them solve their challenging problems. One of the goals of my specific team inside CSE is to identify repeatable patterns our customers face. It is a challenging, but rewarding role where I get to work on some of the most interesting and cutting edge technology. Check out this awesome video that talks about how my team operates.
While working with customers at a recent engagement, I recognized a repeating pattern within the monitoring solutions we were implementing on Azure Kubernetes Service (AKS). We had 5 customers in the same room, and 3 of them wanted to scale on custom metrics generated by their applications. And so the Azure Kubernetes Metric Adapter was created.
Why do we need the Azure Kubernetes Metric Adapter
Two of the other customers were not using Prometheus; instead they were using Azure services such as Azure Monitor, Log Analytics and Application Insights. At the engagement, one of the customers started to implement their own custom scaling solution. This seemed a bit repetitive, as the other customers were not going to be able to reuse their solution. And so the Azure Kubernetes Metric Adapter was created.
That is a bit of a mouthful, so let’s take a look at what the solution looks like when deployed onto your cluster:
The Azure Metric Adapter is deployed onto your cluster and wired up to the Horizontal Pod Autoscaler (HPA). The HPA checks in periodically with the adapter to get the custom metric defined by you. The adapter in turn calls an Azure endpoint to retrieve the metric and gives it back to the HPA. The HPA then evaluates the value and compares it to the target value you have configured for a given deployment. Based on an algorithm, the HPA will either leave your deployment alone, scale the pods up, or scale them down.
As you can see, there is no custom code needed to scale with custom or external metrics when using the adapter. You deploy the adapter, configure an HPA, and the rest of the scaling is taken care of by Kubernetes.
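To make that concrete, an HPA targeting an external metric looks roughly like this. The names, metric, and target value below are illustrative only — check the adapter’s samples for the exact schema in your version:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: consumer-scaler          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer               # hypothetical deployment to scale
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metricName: queuemessages   # a metric served by the adapter
        targetValue: 30
```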
There are two main scenarios that have been addressed first, and you can see a step-by-step walkthrough for each, though you can scale on any Application Insights metric or Azure Monitor metric.
And that’s it to enable autoscaling on an external metric. Check out the samples for more examples.
Wrapping up
Hope you enjoy the Metric Adapter and can use it to scale your deployments automatically, so you have more time to sip coffee or tea, or just read books. Please be sure to report any bugs, feature requests, or challenges you run into with it.