Using NixOS on an OpenStack Public Cloud

Posted on September 9, 2023 by Richard Goulter

Here are some notes on using NixOS in an OpenStack public cloud.

Recall, NixOS is an operating system which makes use of the Nix package manager to manage its system configuration.
Because NixOS allows declarative configuration of a system, it lends itself well to building cloud VM images.

OpenStack is a standard cloud computing platform. It offers services broadly similar to AWS’ EC2, S3, etc..

One option for deploying NixOS configurations to a cloud VM is to run a NixOS VM, and then switch that VM to the configuration you want. – If the cloud provider doesn’t have a NixOS VM image to run, you’ll have to build your own image.

General Approach

The most ‘challenging’ part of this is wanting to run a NixOS VM, but the cloud provider not having a public NixOS image.

It’s possible to build your own image in a format the cloud provider accepts.

The nix-community’s nixos-generators is a good place to start for this.
Many popular image formats are supported.

In the case of OpenStack images, there’s a generator specifically for that.
I guess for other cloud providers, some customisation may be required; I’d dig through the code within nixpkgs’ nixos/maintainers/scripts/ and nixos/modules/virtualisation/ to get an idea of what was done for the formats which are supported.

Using Terraform to Launch a VM on an OpenStack Public Cloud

I was doing this from a rather weak MacBook Air (Intel). I figured it’d be easier to build Linux images using Linux; so the first thing to do is launch a Linux VM in the cloud.

My experience with using OpenStack on public clouds is that the networking may be handled slightly differently from one public cloud to another.
(Relatedly, I used to hope that “Terraform works with different clouds” would translate to “plenty of Terraform code can be reused in order to easily support a multi-cloud deployment; e.g. have the same service in both AWS and GCP”. Maybe for OpenStack public clouds this is largely true; but it’s unlikely that Terraform code for networking resources can be reused between them.)

With AWS, I would think that an example of a simple Terraform task is “launch a VM with a publicly accessible IP”.
From what I’ve tried, it’s slightly trickier with OpenStack public clouds.

I had some spare credits with Cleura to use, and they offer a public cloud with an OpenStack API.

Managing OpenStack resources from outside the cloud console requires an OpenStack user. I found it convenient to download the RC file for the user, and use direnv to load those credentials.
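
For example, with the RC file saved next to the Terraform code (the filename openstack-rc.sh here is just an example), a minimal .envrc for direnv might look like:

# .envrc: have direnv load the OpenStack credentials (the OS_* variables)
# when entering this directory. Assumes the RC file downloaded from the
# cloud console was saved as ./openstack-rc.sh.
source_env openstack-rc.sh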

Here’s the code listing for a main.tf to achieve this (with some notes below):

terraform {
  required_providers {
    openstack = {
      source  = "terraform-provider-openstack/openstack"
      version = "~> 1.49.0"
    }
  }
}

provider "openstack" {}

# Variables

variable "default_user_name" {
  description = "the name of the default user"
  type        = string
  default     = "debian"
}

variable "ssh_public_key" {
  description = "the SSH public key used to access the VM"
  type        = string
  default     = "ssh-ed25519 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
}

variable "allow_ssh_access_cidr" {
  description = "the CIDR to allow SSH access to. Defaults to 0.0.0.0/0 (unrestricted)"
  type        = string
  default     = "0.0.0.0/0"
}

variable "flavor" {
  description = "the compute flavor to use"
  type        = string
  default     = "1C-2GB-20GB"
}

variable "image_name" {
  description = "the name of the image the VM uses"
  type        = string
  default     = "Debian 11 Bullseye x86_64"
}

variable "instance_name" {
  description = "the name of the VM"
  type        = string
  default     = "debian"
}

# OpenStack Server flavor & image

data "openstack_compute_flavor_v2" "self" {
  name = var.flavor
}

data "openstack_images_image_v2" "self" {
  name        = var.image_name
  most_recent = true
}

# Networking

data "openstack_networking_network_v2" "ext" {
  name = "ext-net"
}

resource "openstack_networking_network_v2" "self" {
  name           = "terraform_vm_network"
  admin_state_up = "true"
}

resource "openstack_networking_subnet_v2" "self" {
  name       = "terraform_vm_subnet"
  network_id = openstack_networking_network_v2.self.id
  cidr       = "192.168.199.0/24"
  ip_version = 4
}

resource "openstack_networking_router_v2" "self" {
  name                = "terraform_vm_router"
  external_network_id = data.openstack_networking_network_v2.ext.id
}

resource "openstack_networking_router_interface_v2" "self" {
  router_id = openstack_networking_router_v2.self.id
  subnet_id = openstack_networking_subnet_v2.self.id
}

resource "openstack_networking_floatingip_v2" "self" {
  pool = data.openstack_networking_network_v2.ext.name
}

# Security

resource "openstack_compute_secgroup_v2" "allow_ssh" {
  name        = "allow_ssh"
  description = "allow SSH from the given CIDR"

  rule {
    from_port   = 22
    to_port     = 22
    ip_protocol = "tcp"
    cidr        = var.allow_ssh_access_cidr
  }
}

resource "openstack_compute_keypair_v2" "self" {
  name       = "terraform_keypair"
  public_key = var.ssh_public_key
}

# Instance

resource "openstack_compute_instance_v2" "self" {
  name            = var.instance_name
  flavor_id       = data.openstack_compute_flavor_v2.self.id
  key_pair        = openstack_compute_keypair_v2.self.name
  security_groups = [openstack_compute_secgroup_v2.allow_ssh.name]
  user_data       = <<-USER
  #cloud-config
  system_info:
   default_user:
    name: ${var.default_user_name}
  chpasswd: { expire: false }
  ssh_pwauth: false
  package_upgrade: true
  manage_etc_hosts: localhost
  runcmd:
    - "curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install --no-confirm"
  USER

  block_device {
    uuid                  = data.openstack_images_image_v2.self.id
    source_type           = "image"
    volume_size           = 20 # GBs
    boot_index            = 0
    destination_type      = "volume"
    delete_on_termination = true
  }
  # Attach the instance to the private network created above, so that the
  # floating IP (on ext-net, routed via the router interface) can reach it.
  network {
    uuid = openstack_networking_network_v2.self.id
  }
}

resource "openstack_compute_floatingip_associate_v2" "self" {
  floating_ip = openstack_networking_floatingip_v2.self.address
  instance_id = openstack_compute_instance_v2.self.id
}

# Outputs

output "public_ipv4" {
  value = openstack_networking_floatingip_v2.self.address
}

Again, I found the networking details to be quite complicated.
The details were found by observing what resources were created when creating a VM with a public IP in the console.
Cleura uses the public network ext-net.
I’m not sure of the exact details, but: I create a private network with a subnet; to get a public IP (i.e. a floating IP in ext-net) to route to the VM, I create a router on ext-net, and a router interface which routes it to the private subnet; that public IP then gets associated with the VM.

Another part that can be annoying with OpenStack is specifying the VM resources (CPU/memory/storage).
It’s more flexible than AWS’ “c5.small/medium/large”.
One thing I found annoying is that it’s not quite so freeform as "xC-yGB-zGB" for arbitrary x, y, z; I had to list the flavors to find one that exists.
(The impression I got was that the console lets you choose the amount of CPU/memory/disk, and the flavor is created on demand for that).
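
To find a valid flavor name, something like this lists the flavors the cloud offers:

# List the available compute flavors; pick one with the desired CPU/memory/disk.
openstack flavor list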

Idiosyncrasies of OpenStack clouds aside, I think the other details are relatively straightforward.
The name "Debian 11 Bullseye x86_64" comes from running openstack image list.
(An easy way to get the openstack client is to run nix shell nixpkgs#openstackclient).
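
e.g., to get the client ad hoc and list the images:

# Run the openstack client via nix, listing the images the cloud provides.
nix shell nixpkgs#openstackclient --command openstack image list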

Looking at the user data passed to the VM…

#cloud-config
system_info:
 default_user:
  name: ${var.default_user_name}
chpasswd: { expire: false }
ssh_pwauth: false
package_upgrade: true
manage_etc_hosts: localhost
runcmd:
  - "curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install --no-confirm"

…this is cloud-init user data, declaring some things we want set up in the VM.
The curl ... | sh command runs the Determinate Systems Nix installer.
This means the VM will have nix available shortly after it launches. (Running the command cloud-init status on the launched VM shows whether cloud-init has finished).
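
e.g., after SSH’ing into the launched VM:

# Block until cloud-init has finished running, then report its status.
cloud-init status --wait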

Applying the Terraform configuration is done as usual:

terraform apply

(and removing these resources with terraform destroy).

The public IP is an output, which allows SSH’ing into the VM with a command like:

ssh debian@$(terraform output -json public_ipv4 | jq -r .)

Building the NixOS Image for OpenStack

Here is the flake.nix file. Below it are some implementation notes, and the commands for building and uploading the image.

{
  inputs = {
    nixos-generators = {
      url = "github:nix-community/nixos-generators";
      inputs.nixpkgs.follows = "nixpkgs";
    };
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.11";
    rgoulter.url = "github:rgoulter/nix-user-repository";
  };

  outputs = {
    self,
    nixos-generators,
    nixpkgs,
    rgoulter,
    ...
  }: {
    packages."x86_64-linux".small-openstack = nixos-generators.nixosGenerate {
      pkgs = nixpkgs.legacyPackages."x86_64-linux";
      format = "openstack";
      modules = [
        rgoulter.nixosModules.cloud-interactive
        rgoulter.nixosModules.ssh
        rgoulter.nixosModules.ssh-users
        rgoulter.nixosModules.tailscale
      ];
    };
  };
}

Recall, a flake.nix file is more or less equivalent to a package.json/Cargo.toml/etc. project file, and is a standard entry point into a Nix codebase.

Here, the OpenStack image is declared as a package, using nixos-generators and its nixosGenerate function. (The OpenStack-specific part is the format = "openstack"; attribute. As mentioned above, you can dig into the details in nixpkgs’ nixos/ directory, which nixos-generators makes use of).

The rgoulter.nixosModules refers to the modules in my nix-user-repository. (Though, you could just inline these all as one module in this flake.nix file, etc.).

e.g. cloud-interactive ensures that some CLI tools I like are installed:

{
  config,
  lib,
  pkgs,
  ...
}: {
  environment.systemPackages = with pkgs; [
    direnv
    fd
    fish
    git
    helix
    jq
    ripgrep
    starship
    tmux
  ];
  nix = {
    extraOptions = ''
      experimental-features = nix-command flakes
    '';
  };
  security.sudo.wheelNeedsPassword = false;
}

and ssh-users declares the user I log in with:

{
  config,
  lib,
  pkgs,
  ...
}:
{
  users.users.rgoulter = {
    isNormalUser = true;
    extraGroups = [
      "wheel"
    ];
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOmQ9/u9qV9Vvy2pbcPtGiAmIrhXdi/vY6IesJ5RYpS4"
    ];
  };
}

(This doesn’t necessarily need to be hard coded into the image; but, for this use case, doing it this way is simple).

With this flake.nix, on a Linux computer with nix installed (and the flakes and nix-command experimental features enabled, as the Determinate Systems installer does by default), the image is built by running a command like:

nix build .#small-openstack

(A quick & dirty approach is to just copy the flake.nix file to the Linux VM we’re running. But, it’d also be possible to have a nixosGenerate package in a flake.nix in a repository else & refer to it using the appropriate flake URI).

The resulting file is linked to by ./result/nixos.qcow2.

Uploading the Image

With the openstack client (use nix shell nixpkgs#openstackclient to make it available to the shell), and appropriate openstack credentials (e.g. copying the openstack user RC file over and source’ing it), the built image can be uploaded with:

openstack image create \
  --private \
  --disk-format qcow2 \
  --container-format bare \
  --file ./result/nixos.qcow2 \
  my-nixos

which creates the OpenStack image my-nixos.
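
To double-check the upload:

# Show the details of the image we just uploaded.
openstack image show my-nixos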

Launching a VM with Our Private NixOS Image

This is straightforward: re-use the code from “Using Terraform to Launch a VM” above.

Simply change the name of the image in the data.openstack_images_image_v2 block to "my-nixos" (e.g. change the value of the Terraform variable image_name). – The NixOS image ignores the user data.
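
For example, with the main.tf from earlier, a command like this launches a VM from the uploaded image (the instance name here is just an example):

# Launch a VM using the private NixOS image instead of the Debian image.
terraform apply \
  -var 'image_name=my-nixos' \
  -var 'instance_name=nixos'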

Switching Configuration After Launch

There are use cases where it makes sense to ‘switch’ the NixOS configuration after the VM has launched.

There are plenty of options for doing this.

I liked using a command like:

ssh -A "rgoulter@${IP}" -- " \
  sudo sh -c 'mkdir -p ~/.ssh && \
  chmod 700 ~/.ssh && \
  ssh-keyscan github.com >> ~/.ssh/known_hosts && \
  echo switch to ${FLAKE_URI} && \
  nixos-rebuild switch --flake ${FLAKE_URI}' \
"

I think another option, instead of doing the ssh-keyscan here, is to add github.com’s host key to services.openssh.knownHosts in the NixOS configuration.

