
> **Disclaimer:** The following text is an AI-generated log and summary of a
> session deploying Cozystack on a Talos cluster. All work was done in the
> background by `claude-code` and `Swamp`; I only had to approve a few steps
> here and there and drop a hint about the ISO boot order issue.

## What Worked Well with Swamp

1. Extension models are powerful: the `@talos/node` model with `applyConfig`,
   `bootstrap`, `health`, `patchConfig`, and `reboot` methods made the entire
   Talos lifecycle manageable. Adding `patchConfig` and retry logic was
   straightforward.
2. Model methods for step-by-step execution: running `swamp model method run
   tapok-cp-1 applyConfig --input '{...}'` was reliable and gave clear JSON
   output with success/failure status.
3. libvirt models: `unraid-vms` and `unraid-storage` worked well for VM
   management (start/stop/resize/attach-disk) and storage pool/volume
   management.
4. Retry logic in the talosctl helper: the `isTransientError()` pattern with
   configurable retries saved the bootstrap phase, which needed roughly 20
   retries over 5 minutes. A rough bash equivalent is sketched below.
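
The helper itself lives in the Swamp `@talos/node` model, so the snippet below
is only a rough bash sketch of the same idea, not the actual implementation;
the node IP and retry budget are placeholders:

```bash
#!/usr/bin/env bash
# Sketch of the retry pattern: keep retrying a command that fails for
# transient reasons (node still rebooting, API not up yet) until it succeeds
# or the retry budget runs out. CP1_IP, MAX_RETRIES and DELAY are placeholders.
CP1_IP=192.168.1.101
MAX_RETRIES=20
DELAY=15

for attempt in $(seq 1 "$MAX_RETRIES"); do
  if talosctl bootstrap --nodes "$CP1_IP" --endpoints "$CP1_IP"; then
    echo "bootstrap succeeded on attempt $attempt"
    exit 0
  fi
  # The real helper inspects the error message (isTransientError) before
  # retrying; this sketch simply retries on any failure.
  echo "attempt $attempt failed, retrying in ${DELAY}s..."
  sleep "$DELAY"
done
echo "bootstrap did not succeed after $MAX_RETRIES attempts" >&2
exit 1
```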

## Issues / Areas for Improvement

1. Workflow idempotency is hard: the full workflow failed repeatedly because
   - `start` fails if the VM is already running
   - `poolDefine` fails if the pool exists
   - `volCreate` fails if the volume exists

   This required `allowFailure: true` plus `completed` conditions everywhere,
   making the YAML verbose.
2. Workflow can't resume from a specific job: after the first run succeeded
   through job 5 but failed on job 6, we couldn't skip the already-completed
   jobs and had to create a separate phase2 workflow.
3. ISO boot order issue: the biggest time sink. After Talos installed to disk,
   the cdrom ISO was still first in the boot order, so after a stop/start the
   VMs booted the ISO instead of the installed disk and all nodes became
   unreachable. Fix: detach the ISO after the first successful boot.
4. `virsh setvcpus --maximum` missing: had to add the `maximum` parameter to
   the libvirt model mid-session.
5. `virsh attach-disk`, `--persistent` vs `--config`: `--persistent` only works
   on running VMs, while `--config` is needed for stopped VMs. Had to add the
   `config` parameter to the model.
6. Cozystack ConfigMap bundle naming: the docs say `paas-full`, but v1.1.0 uses
   `isp-full`, `default`, etc. Also needed to manually create a Package CR; the
   operator doesn't auto-create it from the ConfigMap.

## Step-by-Step Deployment Guide

1. WIPE & BOOT
   - Stop all VMs: `virsh destroy tapok-*`
   - Wipe the boot disks by recreating them: `qemu-img create -f qcow2 <path> 10G`
   - Start the VMs (they boot from the Talos ISO): `virsh start tapok-*`
   - Wait for port 50000 (Talos maintenance mode)
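
   A sketch of this phase; the VM names (other than `tapok-cp-1`), node IPs
   and disk paths are placeholders:

   ```bash
   # Placeholder inventory; only tapok-cp-1 is named in the log above.
   VMS="tapok-cp-1 tapok-cp-2 tapok-cp-3 tapok-w-1"
   NODE_IPS="192.168.1.101 192.168.1.102 192.168.1.103 192.168.1.111"
   DISK_DIR=/var/lib/libvirt/images

   for vm in $VMS; do
     virsh destroy "$vm" || true                        # ignore "not running"
     qemu-img create -f qcow2 "$DISK_DIR/$vm.qcow2" 10G  # recreate = wipe
     virsh start "$vm"                                   # boots the Talos ISO
   done

   # Wait for the Talos maintenance API on each node.
   for ip in $NODE_IPS; do
     until nc -z "$ip" 50000; do sleep 5; done
   done
   ```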

2. PROVISION TALOS
   - Apply control-plane configs (insecure): `talosctl apply-config --insecure
     --file controlplane.yaml`
   - Apply worker configs (insecure): `talosctl apply-config --insecure --file
     worker.yaml`
   - Wait for the nodes to install Talos to disk and reboot
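
   For example, reusing the placeholder inventory from step 1 (`--insecure`
   works because maintenance mode has no client certificates yet):

   ```bash
   CP_IPS="192.168.1.101 192.168.1.102 192.168.1.103"
   WORKER_IPS="192.168.1.111"

   for ip in $CP_IPS; do
     talosctl apply-config --insecure --nodes "$ip" --file controlplane.yaml
   done
   for ip in $WORKER_IPS; do
     talosctl apply-config --insecure --nodes "$ip" --file worker.yaml
   done
   # The nodes now install Talos to disk and reboot on their own.
   ```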

3. BOOTSTRAP
   - Bootstrap etcd on cp-1: `talosctl bootstrap`
   - Wait for cluster health: `talosctl health --wait-timeout 10m`
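
   Roughly, using the `talosconfig` generated together with the machine
   configs; this is the step that needed the retry logic, since the control
   plane may still be coming up when `bootstrap` is first attempted:

   ```bash
   export TALOSCONFIG=./talosconfig   # generated alongside controlplane.yaml/worker.yaml

   talosctl bootstrap --nodes "$CP1_IP" --endpoints "$CP1_IP"
   talosctl health --nodes "$CP1_IP" --endpoints "$CP1_IP" --wait-timeout 10m
   ```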

4. DRBD EXTENSION
   - Patch all nodes with drbd-patch.yaml: `talosctl patch machineconfig
     --patch-file drbd-patch.yaml`
   - Rolling reboot: reboot one node, wait for cluster health, repeat
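
   The actual patch isn't reproduced in this log; the sketch below shows one
   plausible shape for `drbd-patch.yaml`, assuming an Image Factory installer
   image plus the DRBD 9 kernel modules, with `<schematic-id>` and
   `<talos-version>` left as placeholders:

   ```bash
   cat > drbd-patch.yaml <<'EOF'
   machine:
     install:
       image: factory.talos.dev/installer/<schematic-id>:<talos-version>
     kernel:
       modules:
         - name: drbd
         - name: drbd_transport_tcp
   EOF

   for ip in $NODE_IPS; do
     talosctl patch machineconfig --nodes "$ip" --patch-file drbd-patch.yaml
   done

   # Rolling reboot: one node at a time, waiting for health in between.
   for ip in $NODE_IPS; do
     talosctl reboot --nodes "$ip"
     talosctl health --nodes "$CP1_IP" --endpoints "$CP1_IP" --wait-timeout 10m
   done
   ```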

5. DETACH ISO (critical!)
   - Stop all VMs
   - Detach the cdrom: `virsh detach-disk <vm> sda --config`
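
   For example (in this setup the ISO shows up as target `sda`;
   `virsh domblklist <vm>` confirms the actual target name):

   ```bash
   for vm in $VMS; do
     virsh destroy "$vm" || true        # stop the VM first
     virsh detach-disk "$vm" sda --config
   done
   ```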

6. ATTACH LINSTOR STORAGE
   - Create a storage pool: `virsh pool-define-as` / `pool-build` / `pool-start`
   - Create a 100G qcow2 volume per node
   - Attach it as vdb: `virsh attach-disk --config`
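
   A sketch with a placeholder pool name and target path:

   ```bash
   virsh pool-define-as linstor dir --target /var/lib/libvirt/linstor
   virsh pool-build linstor
   virsh pool-start linstor
   virsh pool-autostart linstor

   for vm in $VMS; do
     # 100G qcow2 volume per node, attached as vdb on the virtio bus.
     virsh vol-create-as linstor "$vm-linstor.qcow2" 100G --format qcow2
     virsh attach-disk "$vm" "/var/lib/libvirt/linstor/$vm-linstor.qcow2" vdb \
       --driver qemu --subdriver qcow2 --targetbus virtio --config
   done
   ```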

7. START VMs
   - Start all VMs (they now boot from disk, with no ISO attached)
   - Wait for cluster health
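
   With the ISO detached, a plain start is enough:

   ```bash
   for vm in $VMS; do
     virsh start "$vm"
   done
   talosctl health --nodes "$CP1_IP" --endpoints "$CP1_IP" --wait-timeout 10m
   ```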

8. INSTALL COZYSTACK
   - Install the platform: `helm upgrade --install cozystack
     oci://ghcr.io/cozystack/cozystack/cozy-installer --namespace cozy-system
     --create-namespace`
   - Apply the platform ConfigMap (`bundle-name: isp-full`)
   - Create the Package CR: `kubectl apply -f package.yaml` (its name must
     match the PackageSource)
   - Wait for the Cilium → cert-manager → dashboard chain
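
   The helm command is the one used in the session. The ConfigMap is a sketch:
   `bundle-name: isp-full` and the `root-host` key come from this log, the
   ConfigMap name and remaining keys follow the upstream Cozystack docs, and
   every value is a placeholder to adjust:

   ```bash
   helm upgrade --install cozystack \
     oci://ghcr.io/cozystack/cozystack/cozy-installer \
     --namespace cozy-system --create-namespace

   kubectl apply -f - <<'EOF'
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: cozystack
     namespace: cozy-system
   data:
     bundle-name: isp-full
     root-host: example.org
     api-server-endpoint: "https://192.168.1.101:6443"
     ipv4-pod-cidr: 10.244.0.0/16
     ipv4-pod-gateway: 10.244.0.1
     ipv4-svc-cidr: 10.96.0.0/16
     ipv4-join-cidr: 100.64.0.0/16
   EOF

   # package.yaml is not reproduced here; its contents depend on the
   # PackageSource name in your cluster (see the note above).
   kubectl apply -f package.yaml
   ```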

9. ACCESS DASHBOARD
   - Open `https://dashboard.<root-host>` (once ingress, MetalLB, and
     cert-manager are ready)
