This guide provides comprehensive, step-by-step instructions for configuring a single USB flash drive (or an external USB hard drive) to perform two distinct functions simultaneously:
- Booting the Sbnb Linux Operating System: The drive is prepared with a standard UEFI-compatible structure, specifically an EFI System Partition (ESP) containing the Sbnb EFI bootloader (`sbnb.efi`) and the necessary configuration files, so the server's firmware can locate and start the Sbnb boot process. The `sbnb.efi` file itself is typically a Unified Kernel Image (UKI), bundling the Linux kernel, initramfs, and kernel command line into a single executable file.
- Providing Simple Persistent Storage: A separate partition on the same physical USB drive, formatted with a standard Linux filesystem (`ext4` in this guide), is automatically mounted at `/mnt/sbnb-data` within the running Sbnb Linux system via a custom boot script (`sbnb-cmds.sh`). This provides a space where data (container volumes, application data, logs, user files) can persist across reboots of the otherwise ephemeral, RAM-based Sbnb OS.
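For orientation, the end state this guide produces looks roughly like this (the labels and the ~1 GiB ESP size match the partitioning script used later; the device name is a placeholder):

```
/dev/sdX                      USB drive, GPT partition table
├─ /dev/sdX1  ~1 GiB  FAT32, label "sbnb"       ESP: /EFI/BOOT/BOOTX64.EFI (sbnb.efi) plus sbnb-cmds.sh
└─ /dev/sdX2  rest    ext4,  label "SBNB_DATA"  mounted at /mnt/sbnb-data by sbnb-cmds.sh
```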
Why `ext4` instead of LVM: Initial analysis suggested LVM might be suitable, but further review of the default Sbnb Linux build configuration indicates the necessary `lvm2` user-space tools are likely missing from the base runtime environment. Without these tools, managing LVM volumes during boot via standard scripts is infeasible unless you create a custom Sbnb build that includes the `lvm2` package. This revised guide therefore uses a standard `ext4` filesystem partition, relying only on basic tools expected to be present in Sbnb.
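If you want to confirm this on your own image, a quick check from a shell on a booted Sbnb system is enough (a minimal sketch; it only tests whether the LVM user-space tools are on the PATH):

```sh
# If none of these are found, LVM cannot be managed from boot scripts on this image.
for tool in lvm pvs vgs lvs; do
  command -v "$tool" >/dev/null 2>&1 && echo "found: $tool" || echo "missing: $tool"
done
```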
Contrasting with Standard Sbnb Workflow: It's crucial to understand that this guide describes a highly non-standard setup. The intended Sbnb workflow prioritizes resilience, performance, and statelessness:
- Boot the minimal Sbnb OS from a simple USB drive or the network.
- Use automation (Ansible) or manual scripts (`sbnb-configure-storage.sh`) post-boot to configure LVM on internal server drives.
- Run workloads on this fast, reliable internal storage.

This guide's method compromises these benefits for single-drive convenience under specific constraints.
# ***** EXTREME CAUTION: IRREVERSIBLE DATA DESTRUCTION IMMINENT! *****
This procedure involves low-level disk operations (partitioning, formatting) that will completely and PERMANENTLY ERASE ALL DATA currently residing on the USB drive you select. There is NO UNDO function, and data recovery after accidental formatting is often impossible.

The most critical risk is selecting the wrong target device. Mistakenly choosing your computer's internal hard drive (e.g., `/dev/sda`, `/dev/nvme0n1`) instead of the intended USB drive (e.g., `/dev/sdb`, `/dev/sdc`) WILL RESULT IN CATASTROPHIC AND LIKELY IRRECOVERABLE LOSS OF YOUR OPERATING SYSTEM, APPLICATIONS, AND PERSONAL FILES. You MUST verify the target device name multiple times using different commands (such as `lsblk`, `fdisk`, and `parted`) and cross-reference the expected drive size and model before executing any partitioning or formatting commands. Proceed with extreme vigilance, double-checking each step, entirely at your own sole risk!
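One quick cross-check is whether the kernel actually reports the candidate device on the USB transport (a small sanity check using standard `lsblk` columns):

```sh
# The TRAN column shows the transport: the intended target should report "usb",
# while internal drives typically report "sata" or "nvme".
lsblk -d -o NAME,SIZE,MODEL,TRAN
```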
# Primary Drawbacks & Warnings (Reiterated & Expanded)
- Highly Non-Standard & Complex: Deviates significantly from Sbnb's design. Setup is intricate, and runtime behavior depends on precise script execution and timing. Future Sbnb updates might break this.
- Severe Performance Penalty: USB storage is inherently slow (latency, throughput, IOPS) compared to internal NVMe/SATA drives. Disk I/O to `/mnt/sbnb-data` will be a major bottleneck (a quick way to gauge it is sketched after this list).
- Drastically Reduced Lifespan & Reliability: USB flash drives wear out quickly under persistent write load due to limited write cycles, write amplification, and lack of TRIM support. They are unsuitable for write-intensive workloads or high-reliability needs. Expect eventual failure and data loss without robust backups.
- Potential Instability & Boot Issues: Relies on correct partition detection, udev node creation, filesystem integrity, and `sbnb-cmds.sh` execution timing. Failures can leave persistent storage unavailable.
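To see the penalty concretely, a rough sequential-write test on the mounted data partition gives a ballpark figure (a sketch; `oflag=direct` bypasses the page cache and may not be available in every `dd` build, in which case follow the write with `sync` and treat the number as optimistic):

```sh
# Write 256 MiB to the USB-backed data partition; dd reports the throughput when it finishes.
dd if=/dev/zero of=/mnt/sbnb-data/ddtest bs=1M count=256 oflag=direct
rm /mnt/sbnb-data/ddtest
```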
# When Might This Be Considered? (Limited Scenarios with Full Risk Acceptance)
- Temporary Testing/Experimentation ONLY: Brief evaluations on hardware lacking internal drives.
- Specific, Very Low-Intensity, Read-Mostly Use Cases: Infrequent writes, performance irrelevant (e.g., static config kiosk).
- Absolute Hardware Constraints: Sealed systems where internal drives are impossible, and risks are fully accepted.
Even in these limited scenarios, regular, automated, and verified backups are non-negotiable.
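"Verified" means more than just creating archives: periodically confirm that the most recent archive is actually readable (a minimal sketch, using the backup path and the `docker_latest.tar.gz` link created by the scripts later in this guide):

```sh
# List the archive contents without extracting; a corrupt archive makes tar fail here.
tar -tzf /mnt/sbnb-data/backups/docker/docker_latest.tar.gz > /dev/null && echo "backup archive readable"
```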
# Prerequisites
- A Suitable USB Flash Drive:
  - Capacity: Min ~1GB ESP + desired data size (32GB+ recommended).
  - Quality & Speed: Reputable brand, USB 3.0+ advised for marginal speed benefit. Endurance matters more than peak speed.
- A Working Linux System (Preparation Environment):
  - Necessity: Required for partitioning/formatting the target USB safely. openSUSE Tumbleweed assumed.
  - Live Environment Benefit: Using a Live USB/CD (e.g., openSUSE Tumbleweed Live) is highly recommended as it provides a non-destructive environment.
- Sbnb Linux Boot File (`sbnb.efi`):
  - Method 1 (Easier): Run the official Sbnb install script on a temporary USB, then copy `/EFI/BOOT/BOOTX64.EFI` from its ESP (see the sketch after this list).
  - Method 2 (Advanced): Build Sbnb from source, then find `sbnb.efi` in `output/images/`.
- Root/Sudo Privileges: Needed on the openSUSE prep system for disk commands.
- Internet Connection: May be needed for `zypper`.
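For Method 1, the copy step amounts to mounting the temporary USB's ESP and pulling the bootloader off it (a sketch; `/dev/sdY1` and `/mnt/sbnb-official` are placeholders for the temporary drive's ESP and a scratch mount point):

```sh
sudo mkdir -p /mnt/sbnb-official
sudo mount /dev/sdY1 /mnt/sbnb-official
# The installer places the bootloader at the standard UEFI fallback path.
sudo cp /mnt/sbnb-official/EFI/BOOT/BOOTX64.EFI ~/sbnb.efi
sudo umount /mnt/sbnb-official
```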
# Step-by-Step Instructions
(Reminder: TRIPLE-CHECK your target device name, e.g., `/dev/sdX`, before every destructive command!)
# Phase 1: Prepare the Linux Environment (openSUSE Tumbleweed)
- Boot into openSUSE: Start your preparation environment.
- Install Necessary Tools: Open a terminal. `zypper refresh` updates the package lists; `zypper install` installs the tools.
  sudo zypper refresh
  sudo zypper install -y parted lvm2 dosfstools e2fsprogs
- Identify Target USB Drive: CRITICAL SAFETY STEP! Unplug other USB storage.
  - Insert the target USB drive.
  - Use multiple commands and compare SIZE and MODEL. Check `dmesg | tail` after plugging it in for kernel messages like `sd 2:0:0:0: [sdc] Attached SCSI removable disk`.
    lsblk -d -o NAME,SIZE,MODEL,VENDOR,TYPE | grep 'disk'
    sudo fdisk -l | grep '^Disk /dev/'
    sudo parted -l | grep '^Disk /dev/'
    # Example: If consistently identified as /dev/sdc, use /dev/sdc below.
  - Visually confirm with YaST Partitioner (`sudo yast2 partitioner`) or GParted (`sudo zypper install -y gparted && sudo gparted`) if preferred. Look for the drive matching the expected size and vendor/model.
  - Assume `/dev/sdX` is your verified target drive. Replace it carefully in every command below!
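If you want one more confirmation before the destructive steps, the drive's vendor, model, and serial can be read via udev and compared against the label on the physical stick (a sketch; property names can vary slightly between devices):

```sh
# ID_BUS should report "usb" for the intended target.
udevadm info --query=property --name=/dev/sdX | grep -E 'ID_BUS=|ID_VENDOR=|ID_MODEL=|ID_SERIAL_SHORT='
```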
# Phase 2: Partition the USB Drive
(Warning: The following `parted` commands are DESTRUCTIVE to `/dev/sdX`. Double-check the device name!)
This script automates the partitioning and formatting process. Save it as `prepare_usb.sh`, make it executable (`chmod +x prepare_usb.sh`), and run it with `sudo ./prepare_usb.sh /dev/sdX` (replacing `/dev/sdX` with your verified target device).
#!/bin/bash
# --- Configuration ---
# Exit immediately if a command exits with a non-zero status.
# Treat unset variables as an error when substituting.
# Pipelines return the exit status of the last command to exit non-zero.
set -euo pipefail
# --- Variables ---
# EFI System Partition (ESP) Label (CRITICAL - must match bootloader config)
ESP_LABEL="sbnb"
# Data Partition Label (Recommended for identification)
DATA_LABEL="SBNB_DATA"
# ESP end offset: the ESP spans 1MiB..1025MiB, i.e. a 1024MiB (~1GB) partition,
# which is usually sufficient. Adjust if needed.
ESP_SIZE="1025MiB"
# List of required commands for the script to function
REQUIRED_CMDS=(
"parted" "mkfs.vfat" "mkfs.ext4" "wipefs" "findmnt" "lsblk"
"blkid" "fsck.vfat" "e2fsck" "sync" "id" "grep" "read"
"sleep" "xargs" "umount" "partprobe" "realpath"
)
# --- Functions ---
# Function to check for required commands
check_dependencies() {
echo "--- Checking for required commands ---"
local missing_cmds=()
for cmd in "${REQUIRED_CMDS[@]}"; do
if ! command -v "$cmd" &> /dev/null; then
missing_cmds+=("$cmd")
fi
done
if [ ${#missing_cmds[@]} -ne 0 ]; then
echo "ERROR: The following required commands are not found:" >&2
printf " - %s\n" "${missing_cmds[@]}" >&2
echo "Please install them and try again." >&2
exit 1
fi
echo "All required commands found."
}
# Function to get the base block device for a given path (handles partitions, links, etc.)
get_base_device() {
local path="$1"
local resolved_path
resolved_path=$(realpath "$path") || { echo "ERROR: Cannot resolve path '$path'" >&2; return 1; }
# lsblk -no pkname gets the parent kernel name (base device)
lsblk -no pkname "$resolved_path" || { echo "ERROR: Cannot find base device for '$resolved_path' using lsblk." >&2; return 1; }
}
# --- Script Start ---
echo "-----------------------------------------------------"
echo "--- USB Drive Partitioning and Formatting Script ---"
echo "--- (Version 2 - Enhanced Safety) ---"
echo "-----------------------------------------------------"
echo ""
echo "WARNING: This script is DESTRUCTIVE and will ERASE"
echo " ALL DATA on the target device."
echo ""
# --- Check for Root Privileges ---
if [ "$(id -u)" -ne 0 ]; then
echo "ERROR: This script must be run as root (e.g., using sudo)." >&2
exit 1
fi
# --- Check Dependencies ---
check_dependencies
# --- Check for Device Argument ---
if [ -z "${1:-}" ]; then
echo "Usage: $0 /dev/sdX"
echo "ERROR: Please provide the target block device (e.g., /dev/sda, /dev/sdb)." >&2
echo ""
echo "Available block devices (excluding ROM, loop, and RAM devices):"
lsblk -d -o NAME,SIZE,TYPE,MODEL | grep -vE 'rom|loop|ram'
exit 1
fi
DEVICE="$1"
# --- Validate Device ---
if [ ! -b "$DEVICE" ]; then
echo "ERROR: '$DEVICE' is not a valid block device." >&2
exit 1
fi
# --- CRITICAL SAFETY CHECK: Prevent targeting the root filesystem device ---
echo "--- Performing safety checks ---"
ROOT_DEV_PATH=$(findmnt -n -o SOURCE /)
ROOT_BASE_DEV_NAME=$(get_base_device "$ROOT_DEV_PATH") || exit 1 # Exit if function fails
TARGET_BASE_DEV_NAME=$(get_base_device "$DEVICE") || exit 1
# Construct full device paths for comparison
ROOT_BASE_DEV="/dev/${ROOT_BASE_DEV_NAME}"
TARGET_BASE_DEV="/dev/${TARGET_BASE_DEV_NAME}" # Assumes the input $DEVICE is the base device
if [ "$TARGET_BASE_DEV" == "$ROOT_BASE_DEV" ]; then
echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" >&2
echo "FATAL ERROR: Target device '$DEVICE' appears to be the same" >&2
echo " device ('$ROOT_BASE_DEV') as the running root" >&2
echo " filesystem ('$ROOT_DEV_PATH')." >&2
echo " Aborting to prevent data loss." >&2
echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" >&2
exit 1
fi
echo "Safety check passed: Target device '$DEVICE' is not the root filesystem device ('$ROOT_BASE_DEV')."
# Check if the device looks like an SD card reader often used for the OS drive
if [[ "$DEVICE" == /dev/mmcblk* ]]; then
echo "WARNING: '$DEVICE' looks like an SD card (e.g., /dev/mmcblk0)."
echo " Double-check this is not your primary OS drive!"
fi
# --- Confirmation ---
echo ""
echo "Target Device: $DEVICE"
echo "Partitions to be created:"
echo " 1: EFI System Partition (ESP), FAT32, Label: '$ESP_LABEL', Size: $ESP_SIZE, Flags: boot, esp"
echo " 2: Linux Data Partition, ext4, Label: '$DATA_LABEL', Size: Remaining space"
echo ""
read -p "ARE YOU ABSOLUTELY SURE you want to erase '$DEVICE' and proceed? (yes/NO): " CONFIRMATION
CONFIRMATION=${CONFIRMATION:-NO} # Default to NO if user just presses Enter
if [[ "$CONFIRMATION" != "yes" ]]; then
echo "Operation cancelled by user."
exit 0
fi
echo ""
echo "--- Proceeding with operations on $DEVICE ---"
# --- Phase 2: Partition the USB Drive ---
# 1. Unmount Existing Partitions
echo ""
echo "--- Unmounting any existing partitions on ${DEVICE}* ---"
# List mount points of the device and all its partitions via lsblk and unmount them
# (findmnt --source does not expand a "*" glob, so it would miss mounted partitions).
# Also try to unmount the base device itself in case it's loop-mounted etc.
lsblk -nr -o MOUNTPOINT "$DEVICE" | grep -v '^$' | xargs --no-run-if-empty -n1 umount -v -l || echo "Info: No partitions were mounted or umount failed (might be okay)."
umount "$DEVICE" &>/dev/null || true # Attempt to unmount base device, ignore errors
sleep 1 # Give time for umount to settle
lsblk "$DEVICE"
# 2. Wipe Existing Signatures (Recommended)
echo ""
echo "--- Wiping filesystem/partition signatures from $DEVICE ---"
wipefs --all --force "$DEVICE"
sync # Flush kernel buffers to disk to ensure changes are physically written
# 3. Create New GPT Partition Table
echo ""
echo "--- Creating new GPT partition table on $DEVICE ---"
parted "$DEVICE" --script -- mklabel gpt
sync # Flush kernel buffers to disk
# 4. Create EFI System Partition (ESP)
echo ""
echo "--- Creating ESP partition (1) on $DEVICE ---"
parted "$DEVICE" --script -- mkpart "${ESP_LABEL}" fat32 1MiB "${ESP_SIZE}"
parted "$DEVICE" --script -- set 1 boot on
parted "$DEVICE" --script -- set 1 esp on
sync # Flush kernel buffers to disk
# 5. Create Linux Data Partition
echo ""
echo "--- Creating Linux data partition (2) on $DEVICE ---"
# Use the end of the ESP as the start for the data partition
parted "$DEVICE" --script -- mkpart "${DATA_LABEL}" ext4 "${ESP_SIZE}" 100%
sync # Flush kernel buffers to disk
echo "Waiting briefly for kernel to recognize new partitions..."
sleep 2
# Define partition variables (assuming standard naming, e.g., /dev/sda1, /dev/sda2)
# Adding 'p' for NVMe devices (e.g., /dev/nvme0n1p1) - check if base device name contains 'nvme'
if [[ "$DEVICE" == *nvme* ]]; then
PART_PREFIX="p"
else
PART_PREFIX=""
fi
ESP_PARTITION="${DEVICE}${PART_PREFIX}1"
DATA_PARTITION="${DEVICE}${PART_PREFIX}2"
# Check if partition devices exist, retry with partprobe if needed
echo "--- Checking for partition device nodes (${ESP_PARTITION}, ${DATA_PARTITION}) ---"
PARTITIONS_FOUND=false
for i in {1..5}; do
if [ -b "$ESP_PARTITION" ] && [ -b "$DATA_PARTITION" ]; then
echo "Partition nodes found."
PARTITIONS_FOUND=true
break
fi
echo "Partition nodes not yet found. Retrying probe (Attempt $i/5)..."
partprobe "$DEVICE" || echo "Warning: partprobe command failed, continuing check..."
sleep 1
done
if [ "$PARTITIONS_FOUND" = false ]; then
echo "ERROR: Partition devices ($ESP_PARTITION, $DATA_PARTITION) not found after partitioning and retries." >&2
echo " Please check manually ('lsblk $DEVICE', 'parted $DEVICE print')." >&2
lsblk "$DEVICE"
exit 1
fi
# 6. Verify Partitioning
echo ""
echo "--- Verifying partitions on $DEVICE ---"
parted "$DEVICE" --script -- print
echo ""
echo "--- Block device view: ---"
lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTLABEL,MOUNTPOINT,PARTFLAGS "$DEVICE"
echo "----------------------------"
echo "Expected: ${ESP_PARTITION} (~${ESP_SIZE}), Type EFI System, Flags: boot, esp"
echo "Expected: ${DATA_PARTITION} (Remaining size), Type Linux filesystem"
echo "----------------------------"
sleep 2 # Pause for user to review
# --- Phase 3: Format Filesystems ---
# 1. Format EFI Partition
echo ""
echo "--- Formatting ESP partition (${ESP_PARTITION}) as FAT32 with label '${ESP_LABEL}' ---"
mkfs.vfat -F 32 -n "${ESP_LABEL}" "${ESP_PARTITION}"
sync # Flush kernel buffers to disk
# Check filesystem integrity
echo "--- Checking ESP filesystem (fsck.vfat) ---"
FSCK_VFAT_EXIT_CODE=0
fsck.vfat -a "${ESP_PARTITION}" || FSCK_VFAT_EXIT_CODE=$? # Run fsck, capture exit code on failure
if [ $FSCK_VFAT_EXIT_CODE -eq 0 ]; then
echo "ESP filesystem check passed (or no check performed)."
elif [ $FSCK_VFAT_EXIT_CODE -eq 1 ]; then
# Exit code 1 usually means errors were found AND corrected.
echo "WARNING: fsck.vfat found and corrected errors on ESP partition (${ESP_PARTITION}). Check output above."
else
# Exit codes > 1 typically indicate uncorrected errors.
echo "ERROR: fsck.vfat reported uncorrectable errors (Exit Code: $FSCK_VFAT_EXIT_CODE) on ESP partition (${ESP_PARTITION})." >&2
echo " Cannot proceed safely. Please investigate manually." >&2
exit 1
fi
# Verify label using blkid
echo "--- Verifying ESP label ---"
if blkid -s LABEL -o value "${ESP_PARTITION}" | grep -q "^${ESP_LABEL}$"; then
echo "ESP Label '${ESP_LABEL}' verified successfully on ${ESP_PARTITION}."
else
echo "ERROR: Failed to verify ESP Label '${ESP_LABEL}' on ${ESP_PARTITION}." >&2
blkid "${ESP_PARTITION}" # Show full blkid output for debugging
exit 1
fi
# 2. Format Data Partition
echo ""
echo "--- Formatting Data partition (${DATA_PARTITION}) as ext4 with label '${DATA_LABEL}' ---"
mkfs.ext4 -m 0 -L "${DATA_LABEL}" "${DATA_PARTITION}"
sync # Flush kernel buffers to disk
# Check the new ext4 filesystem integrity
echo "--- Checking Data partition filesystem (e2fsck) ---"
# -f forces check even if clean, -y assumes yes to all prompts (use with caution)
E2FSCK_EXIT_CODE=0
e2fsck -f -y "${DATA_PARTITION}" || E2FSCK_EXIT_CODE=$? # Capture exit code on failure
if [ $E2FSCK_EXIT_CODE -eq 0 ]; then
echo "Data partition filesystem check passed."
elif [ $E2FSCK_EXIT_CODE -eq 1 ]; then
# Exit code 1 means errors were corrected.
echo "WARNING: e2fsck found and corrected errors on Data partition (${DATA_PARTITION}). Check output above."
else
# Exit codes > 1 indicate uncorrected errors.
echo "ERROR: e2fsck reported uncorrectable errors (Exit Code: $E2FSCK_EXIT_CODE) on Data partition (${DATA_PARTITION})." >&2
echo " Cannot proceed safely. Please investigate manually." >&2
exit 1
fi
# Verify the label using blkid
echo "--- Verifying Data partition label ---"
if blkid -s LABEL -o value "${DATA_PARTITION}" | grep -q "^${DATA_LABEL}$"; then
echo "Data Label '${DATA_LABEL}' verified successfully on ${DATA_PARTITION}."
else
echo "ERROR: Failed to verify Data Label '${DATA_LABEL}' on ${DATA_PARTITION}." >&2
blkid "${DATA_PARTITION}" # Show full blkid output for debugging
exit 1
fi
echo ""
echo "-----------------------------------------------------"
echo "--- Script finished successfully! ---"
echo "Device: $DEVICE"
echo "Partitions created and formatted:"
lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL,PARTLABEL,MOUNTPOINT "$DEVICE"
echo "-----------------------------------------------------"
exit 0
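Before moving on, you can optionally confirm that the new data partition mounts and accepts writes (a quick sanity check; `/dev/sdX2` and `/mnt/sbnb-data-test` are placeholders):

```sh
sudo mkdir -p /mnt/sbnb-data-test
sudo mount /dev/sdX2 /mnt/sbnb-data-test
sudo touch /mnt/sbnb-data-test/write-test && echo "data partition is writable"
sudo rm /mnt/sbnb-data-test/write-test
sudo umount /mnt/sbnb-data-test
```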
# Phase 3: Install Sbnb Boot Files and Configuration
- Mount EFI Partition: Access the ESP filesystem. Replace `/dev/sdX1` with the actual ESP partition device name identified earlier.
  echo "--- Mounting ESP partition ---"
  sudo mkdir -p /mnt/sbnb-mount
  sudo mount /dev/sdX1 /mnt/sbnb-mount
- Create EFI Boot Directory: Standard UEFI fallback path.
  echo "--- Creating EFI boot directories ---"
  sudo mkdir -p /mnt/sbnb-mount/EFI/BOOT
- Copy Sbnb EFI Boot File: Place the bootloader (`sbnb.efi` as `BOOTX64.EFI`). Replace `/path/to/your/sbnb.efi` with the actual path to the file you obtained.
  echo "--- Copying Sbnb EFI boot file ---"
  sudo cp /path/to/your/sbnb.efi /mnt/sbnb-mount/EFI/BOOT/BOOTX64.EFI
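Optionally, confirm that the copied file looks like an EFI application before moving on (a quick sanity check; the exact `file` output wording varies between versions):

```sh
# A UKI/EFI bootloader is typically reported as a PE32+ EFI application for x86-64.
file /mnt/sbnb-mount/EFI/BOOT/BOOTX64.EFI
```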
- Run the Sbnb Configuration Python Script: Mount `/dev/sdX1` at `/mnt/sbnb` and `/dev/sdX2` at `/mnt/sbnb-data` (a sketch of these mounts follows), then replace `tskey-auth-...` with your actual Tailscale auth key in the Python script below before running it as root:
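A minimal sketch of the mounts this step assumes (device names are placeholders; if the ESP is still mounted at `/mnt/sbnb-mount` from the previous steps, unmount it first):

```sh
# Release the temporary mount point used above, if still mounted.
sudo umount /mnt/sbnb-mount 2>/dev/null || true
# Mount the ESP and the data partition where the Python script expects them.
sudo mkdir -p /mnt/sbnb /mnt/sbnb-data
sudo mount /dev/sdX1 /mnt/sbnb
sudo mount /dev/sdX2 /mnt/sbnb-data
```

Save the script below (for example as `deploy_sbnb_config.py`, a placeholder name), edit the Tailscale key placeholder, and run it with `sudo python3 deploy_sbnb_config.py`.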
#!/usr/bin/env python3
"""
Unified SBNB Configuration Deployment Script (Version 2.1 - BusyBox cp focus).
Generates configuration files and scripts to:
- Mount a persistent data partition.
- Configure Docker to use a persistent data-root on that partition.
- Optionally migrate existing Docker data from /var/lib/docker robustly using busybox cp.
- Set up backup/purge routines for the persistent Docker data.
- Set up health and volume monitoring for Docker (with safer defaults).
- Deploy a Tailscale authentication key.
- Deploy an optional development environment script.
Core components generated:
- /mnt/sbnb/sbnb-cmds.sh: Main boot script executed by the system.
- /mnt/sbnb/sbnb-tskey.txt: Tailscale authentication key.
- /mnt/sbnb-data/scripts/*: Helper scripts for backup, purge, health checks.
- /mnt/sbnb-data/systemd/*: Systemd units to automate helper scripts.
Prerequisites:
- Run as root.
- ESP partition mounted at /mnt/sbnb (writable).
- Data partition mounted at /mnt/sbnb-data (writable).
- Required: Standard Linux utilities (coreutils including 'cp', systemd, grep, sed, etc.).
- Recommended: `jq` installed on the target system for robust JSON handling.
"""
import os
import stat
import sys
import pathlib
import json
import shutil
from datetime import datetime
# --- Configuration: File Paths ---
# Base mount points - Script will check if these exist and are writable
ESP_MOUNT = "/mnt/sbnb"
DATA_MOUNT = "/mnt/sbnb-data"
# Docker configuration
PERSISTENT_DOCKER_ROOT = f"{DATA_MOUNT}/docker-root"
DOCKER_CONFIG_DIR = "/etc/docker"
DOCKER_CONFIG_FILE = f"{DOCKER_CONFIG_DIR}/daemon.json"
DOCKER_CONFIG_BACKUP_SUFFIX = ".sbnb-orig-backup" # Suffix for one-time backup
DOCKER_DATA_EPHEMERAL = "/var/lib/docker" # Default path for migration check
# Permissions for Docker root dir (rwx--x--x). Owner/Group should be root:root.
# Use standard integer representation for octal in Python
DOCKER_ROOT_PERMISSIONS = 0o711
# Permissions for daemon.json (rw-r--r--)
DOCKER_CONFIG_PERMISSIONS = 0o644
# Backup configuration
BACKUP_BASE_DIR = f"{DATA_MOUNT}/backups/docker"
BACKUP_KEEP_COUNT = 3 # Number of backups to retain
STOP_DOCKER_FOR_BACKUP = 1 # 1 = Stop Docker during backup (safer), 0 = Attempt live backup
# Permissions for backup base directory (rwxr-x---)
BACKUP_DIR_PERMISSIONS = 0o750
# Health Check configuration
VOLUME_CHECK_THRESHOLD_PERCENT = 10 # Warn if free space drops below this %
# Pruning level in volume check: 0=None, 1=Containers/Dangling Images, 2=All Unused Images+Containers (--volumes still excluded)
VOLUME_CHECK_PRUNE_LEVEL = 1
# --- Content Definitions ---
# --- sbnb-cmds.sh Content ---
# REFACTOR: Removed rsync checks and usage, standardized on cp -a -u for migration.
# REFACTOR: Use correct octal format specifier (:o) for mkdir -m and chmod.
SBNB_CMDS_SH_CONTENT = f"""#!/bin/sh
# Sbnb Custom Commands Script (Unified Persistent Docker Root + Features - v2.1 - BusyBox cp)
# Mounts persistent data, configures Docker data-root, migrates data (if needed using cp),
# updates optional scripts, enables systemd units for backup & monitoring.
# Strict error handling
set -e -o pipefail -u
# --- Logging Function ---
log() {{
# Log to kernel message buffer
echo "[sbnb-cmds.sh] $1" > /dev/kmsg
}}
log "Starting custom boot commands (Unified Persistent Docker Root v2.1 - BusyBox cp)..."
# --- Check Core Commands ---
# Ensure essential commands for this script are present
check_cmds() {{
local missing_cmd=0
log "Checking required commands..."
for cmd in "$@"; do
if ! command -v "$cmd" >/dev/null 2>&1; then
log "ERROR: Required command '$cmd' not found."
missing_cmd=1
fi
done
if [ $missing_cmd -eq 1 ]; then
log "ERROR: Missing one or more required commands. Cannot proceed."
exit 1
fi
log "Required commands found."
# Check optional but recommended commands
if ! command -v jq >/dev/null 2>&1; then
log "WARNING: 'jq' command not found. JSON handling for daemon.json will be less robust and may fail on complex existing files."
else
log "OK: 'jq' command found (recommended)."
fi
# Note: rsync check removed as cp -a -u is now the standard method
}}
# Define all commands potentially used in this script
# Removed 'rsync' from the list.
check_cmds mountpoint readlink mkdir mount echo sleep rm find ln systemctl mktemp cp mv chmod chown dirname basename jq grep cat cmp date sed ls
# --- Mount Persistent Data Partition ---
DATA_LABEL="SBNB_DATA"
DATA_DEVICE_SYMLINK="/dev/disk/by-label/${{DATA_LABEL}}"
DATA_MOUNT_POINT="{DATA_MOUNT}"
MAX_WAIT_SECONDS=15
WAIT_INTERVAL=1
elapsed_time=0
log "Waiting up to ${{MAX_WAIT_SECONDS}}s for data device (Label: ${{DATA_LABEL}})..."
while [ ! -e "${{DATA_DEVICE_SYMLINK}}" ]; do
if [ ${{elapsed_time}} -ge ${{MAX_WAIT_SECONDS}} ]; then
log "ERROR: Timeout waiting for device ${{DATA_DEVICE_SYMLINK}}. Persistent data cannot be mounted."
exit 1
fi
sleep ${{WAIT_INTERVAL}}
elapsed_time=$((elapsed_time + WAIT_INTERVAL))
done
DATA_DEVICE=$(readlink -f "${{DATA_DEVICE_SYMLINK}}")
log "Data partition device resolved to ${{DATA_DEVICE}} after ${{elapsed_time}}s."
# Ensure mount point directory exists
mkdir -p "${{DATA_MOUNT_POINT}}"
log "Attempting to mount ${{DATA_DEVICE}} at ${{DATA_MOUNT_POINT}}..."
if ! mountpoint -q "${{DATA_MOUNT_POINT}}"; then
# Attempt to mount read-write, noatime, nodiratime
if mount -o rw,noatime,nodiratime "${{DATA_DEVICE}}" "${{DATA_MOUNT_POINT}}"; then
log "Successfully mounted persistent partition at ${{DATA_MOUNT_POINT}}."
else
log "ERROR: Failed to mount ${{DATA_DEVICE}} at ${{DATA_MOUNT_POINT}}! Check filesystem and device."
exit 1
fi
else
log "Persistent partition already mounted at ${{DATA_MOUNT_POINT}}. Ensuring read-write..."
# Ensure partition is mounted read-write
mount -o remount,rw "${{DATA_MOUNT_POINT}}" || {{
log "ERROR: Failed to remount ${{DATA_MOUNT_POINT}} as read-write! Docker requires write access."
exit 1
}}
fi
# --- Configure Docker to use Persistent Data Directory ---
log "Setting up Docker to use persistent data-root..."
PERSISTENT_DOCKER_ROOT="{PERSISTENT_DOCKER_ROOT}"
DOCKER_CONFIG_DIR="{DOCKER_CONFIG_DIR}"
DOCKER_CONFIG_FILE="{DOCKER_CONFIG_FILE}"
DOCKER_CONFIG_BACKUP="{DOCKER_CONFIG_FILE}{DOCKER_CONFIG_BACKUP_SUFFIX}"
DOCKER_DATA_EPHEMERAL="{DOCKER_DATA_EPHEMERAL}" # For migration check
CONFIG_CHANGED=0 # Flag to track if we need to restart docker
# 1. Ensure the persistent Docker data-root directory exists with correct owner/permissions
log "Ensuring persistent Docker data directory exists: ${{PERSISTENT_DOCKER_ROOT}}"
# Create with specific permissions (rwx--x--x) using correct octal format for command line
mkdir -p -m {DOCKER_ROOT_PERMISSIONS:o} "${{PERSISTENT_DOCKER_ROOT}}"
if [ ! -d "${{PERSISTENT_DOCKER_ROOT}}" ]; then
log "ERROR: Failed to create persistent Docker data directory ${{PERSISTENT_DOCKER_ROOT}}!"
exit 1
fi
# Ensure ownership is root:root (critical for Docker)
log "Ensuring ownership of ${{PERSISTENT_DOCKER_ROOT}} is root:root..."
chown root:root "${{PERSISTENT_DOCKER_ROOT}}" || log "WARNING: Failed to set ownership on ${{PERSISTENT_DOCKER_ROOT}}. Docker might have issues."
# Ensure permissions are correct (mkdir -p doesn't always set mode on existing dirs) using correct octal format for command line
log "Ensuring permissions of ${{PERSISTENT_DOCKER_ROOT}} are {DOCKER_ROOT_PERMISSIONS:o}..."
chmod {DOCKER_ROOT_PERMISSIONS:o} "${{PERSISTENT_DOCKER_ROOT}}" || log "WARNING: Failed to set permissions on ${{PERSISTENT_DOCKER_ROOT}}."
log "Persistent Docker data directory ensured."
# 2. Create/Update Docker daemon configuration (/etc/docker/daemon.json)
log "Configuring Docker daemon (${{DOCKER_CONFIG_FILE}}) to use data-root: ${{PERSISTENT_DOCKER_ROOT}}"
mkdir -p "${{DOCKER_CONFIG_DIR}}" # Ensure config directory exists
# Backup original config ONCE if it exists and backup doesn't
if [ -f "${{DOCKER_CONFIG_FILE}}" ] && [ ! -f "${{DOCKER_CONFIG_BACKUP}}" ]; then
log "Backing up original Docker config to ${{DOCKER_CONFIG_BACKUP}}..."
cp -a "${{DOCKER_CONFIG_FILE}}" "${{DOCKER_CONFIG_BACKUP}}" || \\
log "WARNING: Failed to create backup of ${{DOCKER_CONFIG_FILE}}."
fi
# --- Safely update daemon.json ---
NEEDS_UPDATE=0
# Use jq if available (preferred method)
if command -v jq >/dev/null 2>&1; then
log "Using jq to manage daemon.json."
# Ensure file exists with at least {{}} for jq processing
[ -f "${{DOCKER_CONFIG_FILE}}" ] || echo "{{}}" > "${{DOCKER_CONFIG_FILE}}"
# Read current value safely, defaulting to empty string if null or missing
current_data_root=$(jq -r '.["data-root"] // ""' "${{DOCKER_CONFIG_FILE}}")
if [ "$current_data_root" != "${{PERSISTENT_DOCKER_ROOT}}" ]; then
log "Data-root needs update (jq check). Preparing changes..."
NEEDS_UPDATE=1
else
log "Docker data-root already correctly set in daemon.json (jq check)."
fi
if [ $NEEDS_UPDATE -eq 1 ]; then
TMP_JSON=$(mktemp "${{DOCKER_CONFIG_DIR}}/daemon.json.tmp.XXXXXX")
log "Attempting to merge data-root setting using jq..."
# Merge the new data-root value, preserving other keys
if jq --arg path "${{PERSISTENT_DOCKER_ROOT}}" '. + {{"data-root": $path}}' "${{DOCKER_CONFIG_FILE}}" > "${{TMP_JSON}}"; then
# Check if jq produced valid JSON
if jq -e . "${{TMP_JSON}}" > /dev/null; then
# Check if content actually changed before moving
if ! cmp -s "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}"; then
mv "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}"
chmod {DOCKER_CONFIG_PERMISSIONS:o} "${{DOCKER_CONFIG_FILE}}" # Use correct format here too for consistency, though 644 doesn't need '0o' prefix
log "Successfully updated daemon.json using jq."
CONFIG_CHANGED=1
else
log "daemon.json content unchanged after jq merge, removing temp file."
rm -f "${{TMP_JSON}}"
fi
else
log "ERROR: jq produced invalid JSON output. Config not updated."
rm -f "${{TMP_JSON}}" # Clean up temp file
fi
else
jq_exit_code=$?
log "ERROR: jq command failed (exit code $jq_exit_code) while updating config. Config not updated."
# Optionally capture and log jq stderr here if needed
rm -f "${{TMP_JSON}}" # Clean up temp file
fi
fi
# Fallback logic if jq is NOT available
else
log "WARNING: jq not found. Using less robust fallback for daemon.json."
# Define the minimal target content
TARGET_JSON_CONTENT=$(printf '{{%s\\n "data-root": "%s"%s\\n}}%s\\n' "" "${{PERSISTENT_DOCKER_ROOT}}" "" "")
if [ ! -f "${{DOCKER_CONFIG_FILE}}" ]; then
log "daemon.json does not exist. Creating new file with data-root."
NEEDS_UPDATE=1
else
# Check if data-root key exists at all
if ! grep -q '"data-root"\\s*:' "${{DOCKER_CONFIG_FILE}}"; then
log "Existing daemon.json lacks 'data-root' key."
# Check if the file is simple (e.g., just {{}} or empty/whitespace)
if ! grep -q '[a-zA-Z0-9]' "${{DOCKER_CONFIG_FILE}}" || grep -q '^\\s*{{\\s*}}\\s*$' "${{DOCKER_CONFIG_FILE}}"; then
log "Existing file is simple, overwriting with data-root."
NEEDS_UPDATE=1
else
log "ERROR: Existing daemon.json is complex and lacks 'data-root'. Cannot safely update without jq. Install jq or manually edit."
# Do not proceed with overwrite
NEEDS_UPDATE=0 # Explicitly prevent update
fi
# Key exists, check if the value is correct (basic check).
# Use double quotes so the shell expands ${{PERSISTENT_DOCKER_ROOT}} inside the grep pattern.
elif ! grep -q "\\"data-root\\"[[:space:]]*:[[:space:]]*\\"${{PERSISTENT_DOCKER_ROOT}}\\"" "${{DOCKER_CONFIG_FILE}}"; then
log "ERROR: Existing daemon.json has 'data-root' but points elsewhere. Cannot safely update without jq. Install jq or manually edit."
# Do not proceed with overwrite
NEEDS_UPDATE=0 # Explicitly prevent update
else
log "daemon.json exists and data-root seems correct (grep check)."
NEEDS_UPDATE=0
fi
fi
# Perform write only if deemed safe and necessary by the logic above
if [ $NEEDS_UPDATE -eq 1 ]; then
log "Writing daemon.json (simple method)..."
TMP_JSON=$(mktemp "${{DOCKER_CONFIG_DIR}}/daemon.json.tmp.XXXXXX")
echo "$TARGET_JSON_CONTENT" > "${{TMP_JSON}}"
if [ $? -eq 0 ]; then
mv "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}"
chmod {DOCKER_CONFIG_PERMISSIONS:o} "${{DOCKER_CONFIG_FILE}}" # Use correct format here too
log "Successfully wrote simple daemon.json."
CONFIG_CHANGED=1
else
log "ERROR: Failed to write temporary simple daemon.json! Config not updated."
rm -f "${{TMP_JSON}}"
fi
fi
fi
log "Docker daemon configuration check finished."
# 3. Data Migration (Optional): Migrate data from ephemeral location if needed
log "Checking for existing Docker data in ephemeral location (${{DOCKER_DATA_EPHEMERAL}})..."
# Check if the directory exists and contains anything other than 'lost+found' or potential marker files
if [ -d "${{DOCKER_DATA_EPHEMERAL}}" ] && [ -n "$(ls -A "${{DOCKER_DATA_EPHEMERAL}}" | grep -v -e '^lost+found$' -e '^\\.sbnb_persistent_redirect$' -e '^README_DO_NOT_USE\\.txt$' 2>/dev/null)" ]; then
log "Found potentially significant data in ${{DOCKER_DATA_EPHEMERAL}}."
# Check if persistent location is effectively empty (allowing only lost+found)
persistent_is_empty=0
if [ ! "$(ls -A "${{PERSISTENT_DOCKER_ROOT}}" | grep -v '^lost+found$' 2>/dev/null)" ]; then
persistent_is_empty=1
fi
if [ $persistent_is_empty -eq 1 ]; then
log "Persistent location ${{PERSISTENT_DOCKER_ROOT}} is empty. Migrating data..."
# Ensure Docker is stopped before migration
if systemctl is-active --quiet docker; then
log "Stopping Docker service for migration..."
systemctl stop docker || log "WARNING: Failed to stop Docker. Migration proceeding, but data might be inconsistent!"
sleep 3 # Give it time to release files
fi
log "Starting migration using cp -a -u..."
MIGRATION_SUCCESS=0
# Use cp -a -u: archive mode (preserve attrs), update mode (copy only if newer/missing).
# Source ends with /. to copy contents including hidden files.
# This is the recommended busybox alternative to rsync for local mirroring.
if cp -a -u "${{DOCKER_DATA_EPHEMERAL}}/." "${{PERSISTENT_DOCKER_ROOT}}/"; then
MIGRATION_SUCCESS=1
else
log "ERROR: cp -a -u migration failed with exit code $? !"
fi
# Handle migration outcome
if [ $MIGRATION_SUCCESS -eq 1 ]; then
log "Migration completed successfully."
# Rename old data directory as backup
OLD_DATA_BACKUP="${{DOCKER_DATA_EPHEMERAL}}.migrated.$(date +%Y%m%d_%H%M%S).bak"
log "Attempting to rename old data directory to ${{OLD_DATA_BACKUP}}..."
# Use mv -T to handle if ephemeral is somehow a symlink
if mv -T "${{DOCKER_DATA_EPHEMERAL}}" "${{OLD_DATA_BACKUP}}"; then
log "Successfully renamed old data directory."
else
log "WARNING: Could not rename old data directory ${{DOCKER_DATA_EPHEMERAL}}. It may still contain data."
# Consider rm -rf here ONLY if migration verification was very thorough, otherwise leave it.
fi
# Mark that Docker needs restart due to migration
CONFIG_CHANGED=1
else
log "ERROR: Data migration failed! Docker data may be incomplete or inconsistent in ${{PERSISTENT_DOCKER_ROOT}}."
# Exiting is likely the safest option here to force manual review.
exit 1
fi
else
log "Persistent location ${{PERSISTENT_DOCKER_ROOT}} already contains data. Skipping migration."
# Optionally rename the ephemeral data if it still exists and is unwanted
OLD_DATA_BACKUP="${{DOCKER_DATA_EPHEMERAL}}.ignored.$(date +%Y%m%d_%H%M%S).bak"
log "Attempting to rename unused ephemeral data directory to ${{OLD_DATA_BACKUP}}..."
mv -T "${{DOCKER_DATA_EPHEMERAL}}" "${{OLD_DATA_BACKUP}}" || \\
log "WARNING: Could not rename ephemeral data directory ${{DOCKER_DATA_EPHEMERAL}}."
fi
else
log "No significant data found in ephemeral location ${{DOCKER_DATA_EPHEMERAL}}. No migration needed."
fi
# Ensure the original ephemeral directory path exists but is empty, with a marker
log "Ensuring ephemeral path ${{DOCKER_DATA_EPHEMERAL}} exists and is marked as unused."
# Remove original path if it still exists (e.g., if rename failed but we continued)
if [ -d "${{DOCKER_DATA_EPHEMERAL}}" ]; then
rm -rf "${{DOCKER_DATA_EPHEMERAL}}" || log "WARNING: Failed to remove original ephemeral directory after processing."
fi
mkdir -p "${{DOCKER_DATA_EPHEMERAL}}"
touch "${{DOCKER_DATA_EPHEMERAL}}/.sbnb_persistent_redirect"
echo "Docker data is managed at ${{PERSISTENT_DOCKER_ROOT}}. This directory should remain empty." > "${{DOCKER_DATA_EPHEMERAL}}/README_DO_NOT_USE.txt"
chmod 644 "${{DOCKER_DATA_EPHEMERAL}}/README_DO_NOT_USE.txt" # 644 doesn't need :o format
chmod 600 "${{DOCKER_DATA_EPHEMERAL}}/.sbnb_persistent_redirect" # 600 doesn't need :o format
log "Data migration check finished."
# 4. Restart Docker Service *if* configuration was changed OR migration occurred
if [ $CONFIG_CHANGED -eq 1 ]; then
log "Configuration or data migration requires Docker restart. Reloading daemon and restarting service..."
if ! systemctl daemon-reload; then
log "ERROR: Failed to reload systemd daemon! Docker restart might fail or use old config."
exit 1 # Critical failure if daemon cannot reload
fi
log "Attempting to restart docker.service..."
if systemctl restart docker.service; then
log "Docker service restarted successfully."
else
log "ERROR: Failed to restart Docker service! Check 'journalctl -u docker.service'."
exit 1 # Critical failure if Docker doesn't restart after config change/migration
fi
else
log "No configuration changes or migration. Docker restart not required by this script."
# Optional: Ensure Docker is running even if no changes occurred
# log "Ensuring Docker service is active..."
# if ! systemctl is-active --quiet docker.service; then
# log "Docker service is not active. Attempting to start..."
# systemctl start docker.service || log "WARNING: Failed to start inactive Docker service."
# fi
fi
log "Docker setup finished."
# --- Update Optional Development Environment Script ---
# (Using the robust atomic update logic)
TARGET_DEV_ENV_SCRIPT="/usr/sbin/sbnb-dev-env.sh"
SOURCE_DEV_ENV_SCRIPT="${{DATA_MOUNT_POINT}}/scripts/sbnb-dev-env.sh" # Assuming it's stored persistently
log "Checking for optional development script update: ${{SOURCE_DEV_ENV_SCRIPT}}"
if [ -f "${{SOURCE_DEV_ENV_SCRIPT}}" ] && [ -r "${{SOURCE_DEV_ENV_SCRIPT}}" ]; then
log "Source script found. Attempting atomic update of ${{TARGET_DEV_ENV_SCRIPT}}..."
TARGET_DIR=$(dirname "${{TARGET_DEV_ENV_SCRIPT}}")
TMP_SCRIPT=""
# Setup trap for cleanup
trap 'sbnb_dev_cleanup' EXIT HUP INT QUIT TERM
sbnb_dev_cleanup() {{
if [ -n "${{TMP_SCRIPT:-}}" ] && [ -f "${{TMP_SCRIPT}}" ]; then
rm -f "${{TMP_SCRIPT}}"
log "Cleaned up temporary file ${{TMP_SCRIPT}}"
fi
trap - EXIT HUP INT QUIT TERM # Reset trap
}}
if [ ! -d "${{TARGET_DIR}}" ] || [ ! -w "${{TARGET_DIR}}" ]; then
log "WARNING: Target directory ${{TARGET_DIR}} does not exist or is not writable. Cannot update script."
# Check required commands exist (already done by check_cmds, but good practice here too)
elif ! command -v mktemp >/dev/null 2>&1 || ! command -v cp >/dev/null 2>&1 || ! command -v chmod >/dev/null 2>&1 || ! command -v mv >/dev/null 2>&1; then
log "WARNING: Required command (mktemp/cp/chmod/mv) not found. Skipping update."
else
TMP_SCRIPT=$(mktemp "${{TARGET_DIR}}/sbnb-dev-env.sh.XXXXXX")
if [ -z "${{TMP_SCRIPT}}" ] || [ ! -f "${{TMP_SCRIPT}}" ]; then
log "WARNING: Failed to create temporary file in ${{TARGET_DIR}}. Skipping update."
TMP_SCRIPT="" # Prevent trap from trying to remove nothing
else
# Proceed with copy, chmod, move
if cp "${{SOURCE_DEV_ENV_SCRIPT}}" "${{TMP_SCRIPT}}"; then
if chmod +x "${{TMP_SCRIPT}}"; then
# Use mv -T to handle target being a symlink correctly
if mv -T "${{TMP_SCRIPT}}" "${{TARGET_DEV_ENV_SCRIPT}}"; then
log "Successfully updated ${{TARGET_DEV_ENV_SCRIPT}}."
TMP_SCRIPT="" # Clear var so trap doesn't remove the final script
else log "WARNING: Failed to move temporary file ${{TMP_SCRIPT}} to ${{TARGET_DEV_ENV_SCRIPT}}. Update failed."; fi
else log "WARNING: Failed to set execute permissions on temporary file ${{TMP_SCRIPT}}. Update failed."; fi
else log "WARNING: Failed to copy content from ${{SOURCE_DEV_ENV_SCRIPT}} to ${{TMP_SCRIPT}}. Update failed."; fi
fi
# Clean up temp file if it still exists (e.g., on mv failure) and TMP_SCRIPT is set
if [ -n "${{TMP_SCRIPT:-}}" ] && [ -f "${{TMP_SCRIPT}}" ]; then rm -f "${{TMP_SCRIPT}}"; fi
TMP_SCRIPT="" # Ensure trap doesn't run again for this
fi
trap - EXIT HUP INT QUIT TERM # Clear trap explicitly
else
log "NOTE: Source script ${{SOURCE_DEV_ENV_SCRIPT}} not found or not readable. Skipping update."
fi
log "Update of optional script finished."
# --- Enable Systemd Units (Backup/Purge + Health/Volume Checks) ---
SYSTEMD_SOURCE_DIR="${{DATA_MOUNT_POINT}}/systemd"
SYSTEMD_TARGET_DIR="/etc/systemd/system"
TIMERS_WANTS_DIR="${{SYSTEMD_TARGET_DIR}}/timers.target.wants"
log "Enabling custom systemd units (Source: ${{SYSTEMD_SOURCE_DIR}})..."
if [ -d "${{SYSTEMD_SOURCE_DIR}}" ] && [ -r "${{SYSTEMD_SOURCE_DIR}}" ]; then
mkdir -p "${{SYSTEMD_TARGET_DIR}}"
mkdir -p "${{TIMERS_WANTS_DIR}}"
# Check ln and systemctl exist (already done in check_cmds)
linked_any=0
log "Linking systemd unit files..."
# Use find with -print0 and read -d '' for safe filename handling
find "${{SYSTEMD_SOURCE_DIR}}" -maxdepth 1 -type f \\( -name '*.service' -o -name '*.timer' \\) -print0 | while IFS= read -r -d '' source_unit; do
unit_name=$(basename "${{source_unit}}")
target_link="${{SYSTEMD_TARGET_DIR}}/${{unit_name}}"
log " Linking ${{unit_name}}..."
# Use ln -sf: symbolic, force overwrite if link exists
if ln -sf "${{source_unit}}" "${{target_link}}"; then
linked_any=1
else
log " WARNING: Failed to link ${{unit_name}}."
fi
done
if [ $linked_any -eq 0 ]; then
log "No unit files found in ${{SYSTEMD_SOURCE_DIR}} to link."
else
log "Reloading systemd daemon after linking units..."
# Reload daemon again (might be redundant if Docker restart already did it, but safe)
systemctl daemon-reload || log "WARNING: systemctl daemon-reload failed after linking units."
log "Enabling systemd timers/services..."
enabled_any=0
# Define ALL units expected to be enabled by this script
UNITS_TO_ENABLE="docker-backup.timer docker-purge.timer docker-shutdown-backup.service docker-health-check.timer docker-volume-check.timer"
final_enabled_list=""
# Use 'for unit in $UNITS_TO_ENABLE' which relies on word splitting
# shellcheck disable=SC2086
for unit in $UNITS_TO_ENABLE; do
# Check if the link exists and points to a file before enabling
if [ -L "${{SYSTEMD_TARGET_DIR}}/${{unit}}" ] && [ -f "${{SYSTEMD_TARGET_DIR}}/${{unit}}" ]; then
log " Enabling ${{unit}}..."
# Use --now to also start timers immediately if desired, otherwise just enable
if systemctl enable "${{unit}}"; then
enabled_any=1
final_enabled_list="${{final_enabled_list}} ${{unit}}"
else
log " WARNING: Failed to enable ${{unit}}."
fi
else
log " Skipping enable for ${{unit}} (link missing or broken)."
fi
done
if [ $enabled_any -eq 1 ]; then
final_enabled_list=$(echo "${{final_enabled_list}}" | sed 's/^ *//') # Remove leading space
log "Systemd units enabled successfully: ${{final_enabled_list}}"
else
log "No relevant systemd units were successfully enabled."
fi
fi # end if linked_any
else
log "WARNING: Systemd source directory ${{SYSTEMD_SOURCE_DIR}} not found or not readable. Cannot enable units."
fi
log "Systemd unit setup finished."
# --- Script Finish Logging ---
log "Finished custom boot commands successfully."
# Clear trap explicitly
trap - EXIT HUP INT QUIT TERM
exit 0
"""
# --- Tailscale Key ---
# !!! REPLACE THIS WITH YOUR ACTUAL KEY !!!
SBNB_TSKEY_TXT_CONTENT = "tskey-auth-..." # Placeholder
# --- Backup Script ---
BACKUP_DOCKER_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/backup-docker.sh
# Backs up the persistent Docker data-root directory.
set -e -u
# --- Configuration ---
DOCKER_DATA_DIR="{PERSISTENT_DOCKER_ROOT}" # Source is PERSISTENT root
BACKUP_DIR="{BACKUP_BASE_DIR}"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
BACKUP_FILE="${{BACKUP_DIR}}/docker_backup_${{TIMESTAMP}}.tar.gz"
LATEST_LINK="${{BACKUP_DIR}}/docker_latest.tar.gz"
STOP_DOCKER={STOP_DOCKER_FOR_BACKUP} # 1=Stop Docker (safer), 0=Live backup
log() {{ echo "[backup-docker.sh] $1" > /dev/kmsg; }}
# --- Check Commands ---
log "Checking required commands..."
check_cmds() {{
local missing_cmd=0
for cmd in "$@"; do
if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; missing_cmd=1; fi
done
# Exit if any command is missing. Use an explicit 'if' so that when nothing is
# missing the function returns 0 rather than the failed test's status (which
# would abort the script via 'set -e' at the call site).
if [ $missing_cmd -eq 1 ]; then exit 1; fi
}}
# Core commands needed
check_cmds date mkdir tar gzip ln mv sleep dirname basename
# Check systemctl only if stopping docker is enabled
[ $STOP_DOCKER -eq 1 ] && check_cmds systemctl
# Check for optional 'nice' command
NICE_CMD=""
if command -v nice >/dev/null 2>&1; then NICE_CMD="nice -n 19"; log "Using nice for lower tar priority."; fi
# --- Main Logic ---
log "Starting Docker backup process..."
log "Source: ${{DOCKER_DATA_DIR}}"
log "Destination: ${{BACKUP_FILE}}"
# Ensure backup directory exists and is writable
log "Ensuring backup directory exists: ${{BACKUP_DIR}}"
mkdir -p "${{BACKUP_DIR}}"
# Check write permissions specifically
if [ ! -w "${{BACKUP_DIR}}" ]; then log "ERROR: Backup directory not writable: ${{BACKUP_DIR}}"; exit 1; fi
# Stop Docker if configured
DOCKER_WAS_RUNNING=0
if [ $STOP_DOCKER -eq 1 ]; then
log "Attempting to stop Docker service..."
if systemctl is-active --quiet docker.service; then
DOCKER_WAS_RUNNING=1
log "Docker service is active, stopping..."
if systemctl stop docker.service; then
log "Docker service stopped. Waiting 5s for files to release..."; sleep 5
else
# If stop fails, warn but maybe proceed? Or exit? Exiting might be safer.
log "ERROR: Failed to stop Docker service gracefully! Backup might be inconsistent or fail. Aborting."
exit 1 # Exit if stop fails, as backup consistency is compromised
fi
else
log "Docker service already stopped."
fi
fi
# Create backup
log "Creating backup archive..."
if [ -d "${{DOCKER_DATA_DIR}}" ] && [ -r "${{DOCKER_DATA_DIR}}" ]; then
PARENT_DIR=$(dirname "${{DOCKER_DATA_DIR}}")
SOURCE_BASENAME=$(basename "${{DOCKER_DATA_DIR}}")
log "Archiving '${{SOURCE_BASENAME}}' from parent '${{PARENT_DIR}}'..."
# Use -C to change directory, archive relative path 'docker-root/...'
# Add --warning=no-file-changed to suppress warnings about files changing during read
# shellcheck disable=SC2086 # Allow word splitting for $NICE_CMD
if ${{NICE_CMD}} tar --warning=no-file-changed -czf "${{BACKUP_FILE}}" -C "${{PARENT_DIR}}" "${{SOURCE_BASENAME}}"; then
log "Backup archive created successfully."
# Verify backup file exists and is not empty
if [ -s "${{BACKUP_FILE}}" ]; then
log "Updating latest backup link..."
# Atomic symlink update: create temp link, then rename over old one.
# Wrap the pair in 'if' so a failure is handled below instead of aborting via 'set -e'.
if ln -sfT "${{BACKUP_FILE}}" "${{LATEST_LINK}}.tmp" && mv -Tf "${{LATEST_LINK}}.tmp" "${{LATEST_LINK}}"; then
log "Updated latest link to point to ${{BACKUP_FILE}}."
else
log "WARNING: Failed to update latest backup link."
rm -f "${{LATEST_LINK}}.tmp" # Clean up temp link if mv failed
fi
else
log "WARNING: Backup file seems invalid (empty/missing): ${{BACKUP_FILE}}. Removing."
rm -f "${{BACKUP_FILE}}"
fi
else
tar_exit_code=$?
log "ERROR: tar command failed with exit code ${{tar_exit_code}}! Backup failed."
rm -f "${{BACKUP_FILE}}" # Clean up partial archive if tar failed
fi
else
log "WARNING: Docker data directory not found or not readable: ${{DOCKER_DATA_DIR}}. Skipping backup."
fi
# Restart Docker if it was running and we stopped it successfully
if [ $DOCKER_WAS_RUNNING -eq 1 ]; then
log "Restarting Docker service..."
if ! systemctl start docker.service; then
log "WARNING: Failed to restart Docker service after backup."
else
log "Docker service restarted."
fi
fi
log "Docker backup process finished."
exit 0
"""
# --- Purge Script ---
PURGE_DOCKER_BACKUPS_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/purge-docker-backups.sh
# Removes old Docker backups, keeping the last N.
set -e -u
BACKUP_DIR="{BACKUP_BASE_DIR}"
KEEP_COUNT={BACKUP_KEEP_COUNT}
log() {{ echo "[purge-docker-backups.sh] $1" > /dev/kmsg; }}
# Check commands
check_cmds() {{
local missing_cmd=0
for cmd in "$@"; do if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; missing_cmd=1; fi; done
# Explicit 'if' so the function returns 0 when nothing is missing (avoids tripping 'set -e').
if [ $missing_cmd -eq 1 ]; then exit 1; fi
}}
check_cmds find wc sort head cut xargs rm mkdir date
log "Purging old Docker backups in ${{BACKUP_DIR}}, keeping ${{KEEP_COUNT}}..."
# Validate KEEP_COUNT
if ! [ "$KEEP_COUNT" -ge 0 ] 2>/dev/null; then log "ERROR: KEEP_COUNT (${{KEEP_COUNT}}) is invalid."; exit 1; fi
# Ensure backup directory exists and is accessible
if ! mkdir -p "${{BACKUP_DIR}}"; then log "ERROR: Failed to create backup directory ${{BACKUP_DIR}}!"; exit 1; fi
if [ ! -d "${{BACKUP_DIR}}" ] || [ ! -r "${{BACKUP_DIR}}" ] || [ ! -w "${{BACKUP_DIR}}" ]; then log "ERROR: Cannot access backup directory ${{BACKUP_DIR}}!"; exit 1; fi
# Count existing backups safely
log "Counting existing backup files..."
backup_count=$(find "${{BACKUP_DIR}}" -maxdepth 1 -name 'docker_backup_*.tar.gz' -type f -print 2>/dev/null | wc -l)
find_exit_code=$?
if [ $find_exit_code -ne 0 ]; then log "WARNING: find command failed (${{find_exit_code}}) while counting backups. Skipping purge."; exit 0; fi
log "Found ${{backup_count}} backup files."
if [ "$backup_count" -gt "$KEEP_COUNT" ]; then
to_delete_count=$(( backup_count - KEEP_COUNT ))
log "Need to delete ${{to_delete_count}} oldest backup(s)."
# Use find -printf with null terminators for safe filename handling
log "Identifying oldest backups to delete..."
delete_output=$(find "${{BACKUP_DIR}}" -maxdepth 1 -name 'docker_backup_*.tar.gz' -type f -printf '%T@ %p\\0' 2>/dev/null | \\
sort -zn | \\
head -zn "${{to_delete_count}}" | \\
cut -z -d' ' -f2- | \\
xargs -0 -r rm -v -- 2>&1) || rm_exit_code=$? # Capture rm output (stdout+stderr); '||' keeps 'set -e' from aborting on failure
rm_exit_code=${{rm_exit_code:-0}}
if [ $rm_exit_code -eq 0 ]; then
log "Purge completed successfully."
if [ -n "$delete_output" ]; then
log "Deleted files:"
# Log multi-line output safely
echo "$delete_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
fi
else
log "WARNING: Purge command (rm) failed (exit code ${{rm_exit_code}}). Check output below."
log "rm output:"
echo "$delete_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
fi
else
log "${{backup_count}} backups found <= ${{KEEP_COUNT}}. No backups purged."
fi
log "Backup purge process finished."
exit 0
"""
# --- Health Check Script ---
DOCKER_HEALTH_CHECK_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/docker-health-check.sh
# Checks Docker daemon health, responsiveness, and data-root configuration.
set -e -u
PERSISTENT_ROOT="{PERSISTENT_DOCKER_ROOT}"
DOCKER_CONFIG_FILE="{DOCKER_CONFIG_FILE}"
log() {{ echo "[docker-health-check] $1" | tee /dev/kmsg; }} # Log to kmsg and stdout/stderr
log "Starting Docker health check..."
# Check required commands
check_cmds() {{ for cmd in "$@"; do if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; exit 1; fi; done }}
check_cmds systemctl docker
# Check if Docker daemon service is running
log "Checking if docker.service is active..."
if ! systemctl is-active --quiet docker.service; then
log "WARNING: Docker service is not running. Attempting restart..."
if systemctl restart docker.service; then
log "Docker service restarted successfully."
sleep 5 # Give it time to fully start
else
log "ERROR: Failed to restart inactive Docker service!"
exit 1 # Critical failure if it should be running but can't be started
fi
fi
# Verify Docker daemon is responding to commands
log "Checking Docker daemon responsiveness via 'docker info'..."
if ! docker info > /dev/null 2>&1; then
log "WARNING: Docker service is running but 'docker info' command failed. Attempting restart..."
if systemctl restart docker.service; then
log "Docker service restarted successfully."
sleep 5 # Give it time
# Re-check responsiveness after restart
if ! docker info > /dev/null 2>&1; then
log "ERROR: Docker daemon still not responding after restart! Requires manual investigation."
exit 1 # Critical failure
else
log "Docker daemon is now responsive after restart."
fi
else
log "ERROR: Failed to restart unresponsive Docker service!"
exit 1 # Critical failure
fi
else
log "Docker daemon is responsive."
fi
# Check if Docker is using the correct data-root directory
log "Checking configured Docker data-root directory..."
# Use docker info with Go template for precise extraction
CURRENT_ROOT=$(docker info --format '{{{{.DockerRootDir}}}}' 2>/dev/null || echo "ERROR_GETTING_INFO")
if [ "$CURRENT_ROOT" = "ERROR_GETTING_INFO" ]; then
log "ERROR: Could not determine Docker's current data-root using 'docker info'. Health check incomplete."
exit 1 # Exit as this is a significant issue
elif [ "$CURRENT_ROOT" != "$PERSISTENT_ROOT" ]; then
log "CRITICAL ERROR: Docker is using incorrect data-root!"
log " Expected: $PERSISTENT_ROOT"
log " Actual: $CURRENT_ROOT"
log "This indicates a configuration problem in $DOCKER_CONFIG_FILE or Docker failed to apply it. Manual intervention required."
exit 1 # Critical configuration error
else
log "Docker is correctly using the persistent data-root: $PERSISTENT_ROOT"
fi
log "Docker health check completed successfully."
exit 0
"""
# --- Volume Check Script ---
# Define prune command based on configuration
if VOLUME_CHECK_PRUNE_LEVEL == 0:
PRUNE_COMMAND = "echo 'Automatic pruning disabled.'" # No-op
elif VOLUME_CHECK_PRUNE_LEVEL == 1:
# Prune stopped containers and dangling images only
PRUNE_COMMAND = "docker container prune -f && docker image prune -f"
elif VOLUME_CHECK_PRUNE_LEVEL >= 2:
# Prune stopped containers and *all* unused images (more aggressive)
PRUNE_COMMAND = "docker container prune -f && docker image prune -a -f"
else: # Default to level 1 if invalid config
PRUNE_COMMAND = "docker container prune -f && docker image prune -f"
DOCKER_VOLUME_CHECK_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/docker-volume-check.sh
# Checks free space on the Docker persistent volume and optionally prunes resources.
set -e -u
DOCKER_ROOT="{PERSISTENT_DOCKER_ROOT}"
MIN_FREE_PERCENT={VOLUME_CHECK_THRESHOLD_PERCENT}
# Prune command determined by Python script configuration (Level: {VOLUME_CHECK_PRUNE_LEVEL})
PRUNE_CMD="{PRUNE_COMMAND}"
log() {{ echo "[docker-volume-check] $1" | tee /dev/kmsg; }}
log "Checking Docker volume free space: ${{DOCKER_ROOT}}"
# Check required commands
check_cmds() {{ for cmd in "$@"; do if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; exit 1; fi; done }}
check_cmds df awk sed docker # Need docker if pruning is enabled
# Check if the Docker root directory exists
if [ ! -d "$DOCKER_ROOT" ]; then log "ERROR: Docker root directory not found: $DOCKER_ROOT"; exit 1; fi
# Get free space percentage using df -P for POSIX compatibility
log "Calculating free space..."
# Get Available and Total blocks (in 1K blocks usually)
df_output=$(df -P "$DOCKER_ROOT" | awk 'NR==2 {{print $4, $2}}' 2>/dev/null)
if [ -z "$df_output" ]; then log "ERROR: Failed to get disk usage using df for $DOCKER_ROOT"; exit 1; fi
avail_kb=$(echo "$df_output" | awk '{{print $1}}')
total_kb=$(echo "$df_output" | awk '{{print $2}}')
# Handle edge case where total size is 0 or df failed weirdly
if [ -z "$total_kb" ] || [ "$total_kb" -le 0 ]; then
log "WARNING: Total disk size reported as zero or invalid for $DOCKER_ROOT. Cannot calculate percentage."
exit 0
fi
# Calculate free percentage using integer arithmetic
free_percent=$(( (avail_kb * 100) / total_kb ))
# Get human-readable sizes for logging
total_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $2}}')
avail_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $4}}')
log "Volume Stats: Total=${{total_size_hr}}, Available=${{avail_size_hr}}, Free=${{free_percent}}%"
# Check against threshold
if [ "$free_percent" -lt "$MIN_FREE_PERCENT" ]; then
log "WARNING: Low disk space! Free: ${{free_percent}}% (Threshold: ${{MIN_FREE_PERCENT}}%)"
# Attempt to prune based on configured level
if [ {VOLUME_CHECK_PRUNE_LEVEL} -gt 0 ]; then
log "Attempting automatic prune (Level: {VOLUME_CHECK_PRUNE_LEVEL})..."
prune_output=$(eval "${{PRUNE_CMD}}" 2>&1) || prune_exit_code=$? # eval the configured prune command string, capturing all output
# Check exit code, prune can return non-zero even if it works partially
if [ "${{prune_exit_code:-0}}" -eq 0 ]; then
log "Docker prune command executed successfully."
else
log "WARNING: Docker prune command finished with exit code ${{prune_exit_code}}."
fi
log "Prune output:"
echo "$prune_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
# Recalculate free space after pruning
log "Recalculating space after cleanup..."
df_output=$(df -P "$DOCKER_ROOT" | awk 'NR==2 {{print $4, $2}}' 2>/dev/null)
avail_kb=$(echo "$df_output" | awk '{{print $1}}')
total_kb=$(echo "$df_output" | awk '{{print $2}}')
if [ "$total_kb" -gt 0 ]; then free_percent=$(( (avail_kb * 100) / total_kb )); else free_percent=0; fi
avail_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $4}}')
log "Space after cleanup: Available=${{avail_size_hr}}, Free=${{free_percent}}%"
if [ "$free_percent" -lt "$MIN_FREE_PERCENT" ]; then
log "ERROR: Space still critically low after cleanup! Manual intervention likely required."
else
log "Space is now above threshold after cleanup."
fi
else
log "Automatic pruning is disabled (Level 0). Manual cleanup needed."
fi
else
log "Sufficient free space available (${{free_percent}}%)."
fi
log "Docker volume check completed."
exit 0
"""
# --- Systemd Units (Content definitions remain the same as previous version) ---
# Backup Service
DOCKER_BACKUP_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-backup.service
[Unit]
Description=Backup Docker Data ({PERSISTENT_DOCKER_ROOT})
Documentation=file://{DATA_MOUNT}/scripts/backup-docker.sh
Requires=mnt-sbnb-data.mount
After=mnt-sbnb-data.mount docker.service # Ensure mount and docker are up
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/backup-docker.sh
"""
# Backup Timer
DOCKER_BACKUP_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-backup.timer
[Unit]
Description=Daily Docker Backup Timer ({PERSISTENT_DOCKER_ROOT})
Requires=docker-backup.service
[Timer]
OnCalendar=*-*-* 05:00:00
AccuracySec=1h
Persistent=true
RandomizedDelaySec=600 # 10 minutes
Unit=docker-backup.service
[Install]
WantedBy=timers.target
"""
# Purge Service
DOCKER_PURGE_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-purge.service
[Unit]
Description=Purge Old Docker Backups ({BACKUP_BASE_DIR})
Documentation=file://{DATA_MOUNT}/scripts/purge-docker-backups.sh
Requires=mnt-sbnb-data.mount
After=mnt-sbnb-data.mount
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/purge-docker-backups.sh
"""
# Purge Timer
DOCKER_PURGE_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-purge.timer
[Unit]
Description=Daily Docker Backup Purge Timer
Requires=docker-purge.service
[Timer]
OnCalendar=*-*-* 06:00:00
AccuracySec=1h
Persistent=true
# Spread the start time by up to 5 minutes
RandomizedDelaySec=300
Unit=docker-purge.service
[Install]
WantedBy=timers.target
"""
# Shutdown Backup Service
DOCKER_SHUTDOWN_BACKUP_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-shutdown-backup.service
[Unit]
Description=Backup Docker Data ({PERSISTENT_DOCKER_ROOT}) on Shutdown (Best Effort)
Documentation=file://{DATA_MOUNT}/scripts/backup-docker.sh
# DefaultDependencies=no is crucial for shutdown units
DefaultDependencies=no
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service network.target
Before=shutdown.target reboot.target halt.target kexec.target umount.target final.target
[Service]
Type=oneshot
# RemainAfterExit keeps the unit active so ExecStop= runs during shutdown
RemainAfterExit=true
# Give the backup reasonable time (3 minutes)
TimeoutStopSec=180
# Run the backup when the unit is stopped (i.e. at shutdown/reboot)
ExecStop=/bin/sh {DATA_MOUNT}/scripts/backup-docker.sh
[Install]
WantedBy=shutdown.target reboot.target halt.target kexec.target
"""
# Health Check Service
DOCKER_HEALTH_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-health-check.service
[Unit]
Description=Docker Health Check Service
Documentation=file://{DATA_MOUNT}/scripts/docker-health-check.sh
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/docker-health-check.sh
# Optional resource limits
# CPUQuota=10%
# MemoryMax=128M
"""
# Health Check Timer
DOCKER_HEALTH_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-health-check.timer
[Unit]
Description=Regular Docker Health Check Timer
Requires=docker-health-check.service
[Timer]
# Run 5 mins after boot, then every 15 mins
OnBootSec=5min
OnUnitActiveSec=15min
AccuracySec=1min
Unit=docker-health-check.service
[Install]
WantedBy=timers.target
"""
# Volume Check Service
DOCKER_VOLUME_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-volume-check.service
[Unit]
Description=Docker Volume Space Check Service ({PERSISTENT_DOCKER_ROOT})
Documentation=file://{DATA_MOUNT}/scripts/docker-volume-check.sh
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/docker-volume-check.sh
# Optional resource limits
# CPUQuota=10%
# MemoryMax=64M
"""
# Volume Check Timer
DOCKER_VOLUME_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-volume-check.timer
[Unit]
Description=Regular Docker Volume Check Timer
Requires=docker-volume-check.service
[Timer]
# Run 10 mins after boot, then every hour
OnBootSec=10min
OnUnitActiveSec=1h
AccuracySec=5min
Unit=docker-volume-check.service
[Install]
WantedBy=timers.target
"""
# --- Dictionary of Files to Create ---
# Defines all files to be generated by this script
FILES_TO_CREATE = {
# --- ESP Files ---
f"{ESP_MOUNT}/sbnb-cmds.sh": {
"content": SBNB_CMDS_SH_CONTENT,
"permissions": 0o755 # rwxr-xr-x
},
f"{ESP_MOUNT}/sbnb-tskey.txt": {
"content": SBNB_TSKEY_TXT_CONTENT,
"permissions": 0o600 # rw------- (Restrict access to key)
},
# --- Data Partition Files ---
# Helper Scripts
f"{DATA_MOUNT}/scripts/backup-docker.sh": {
"content": BACKUP_DOCKER_SH_CONTENT,
"permissions": 0o750 # rwxr-x--- (Owner exec, group read/exec)
},
f"{DATA_MOUNT}/scripts/purge-docker-backups.sh": {
"content": PURGE_DOCKER_BACKUPS_SH_CONTENT,
"permissions": 0o750
},
f"{DATA_MOUNT}/scripts/docker-health-check.sh": {
"content": DOCKER_HEALTH_CHECK_SH_CONTENT,
"permissions": 0o750
},
f"{DATA_MOUNT}/scripts/docker-volume-check.sh": {
"content": DOCKER_VOLUME_CHECK_SH_CONTENT,
"permissions": 0o750
},
# Systemd Units
f"{DATA_MOUNT}/systemd/docker-backup.service": {
"content": DOCKER_BACKUP_SERVICE_CONTENT,
"permissions": 0o644 # rw-r--r-- (Standard systemd unit permissions)
},
f"{DATA_MOUNT}/systemd/docker-backup.timer": {
"content": DOCKER_BACKUP_TIMER_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-purge.service": {
"content": DOCKER_PURGE_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-purge.timer": {
"content": DOCKER_PURGE_TIMER_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-shutdown-backup.service": {
"content": DOCKER_SHUTDOWN_BACKUP_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-health-check.service": {
"content": DOCKER_HEALTH_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-health-check.timer": {
"content": DOCKER_HEALTH_TIMER_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-volume-check.service": {
"content": DOCKER_VOLUME_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-volume-check.timer": {
"content": DOCKER_VOLUME_TIMER_CONTENT,
"permissions": 0o644
},
}
# --- Global counters for create_files status ---
warning_count = 0
fail_count = 0
# --- Main Script Logic ---
def check_prerequisites():
"""Verify script prerequisites before attempting file creation."""
print("--- Checking Prerequisites ---")
passed = True
# 1. Check root privileges
if os.geteuid() != 0:
print("ERROR: Script must be run as root (UID 0).")
passed = False
else:
print("OK: Running as root.")
# 2. Check base mount points exist and are writable
base_dirs = {ESP_MOUNT: "ESP", DATA_MOUNT: "Data"}
for bdir, name in base_dirs.items():
bdir_path = pathlib.Path(bdir)
print(f"Checking {name} mount point: {bdir}...")
if not bdir_path.is_dir():
print(f"ERROR: Base {name} directory '{bdir}' does not exist or is not a directory.")
print(f" Please ensure the corresponding partition is mounted correctly before running.")
passed = False
elif not os.access(bdir_path, os.W_OK):
print(f"ERROR: Base {name} directory '{bdir}' is not writable by the current user (root). Check mount options or permissions.")
passed = False
else:
print(f"OK: Base {name} directory '{bdir}' exists and is writable.")
# 3. Check for optional but recommended commands needed by generated scripts
print("Checking for optional command (jq)...")
try:
if shutil.which("jq"):
print("OK: 'jq' command found (recommended for robust daemon.json handling).")
else:
print("WARNING: 'jq' command not found. Generated sbnb-cmds.sh will use less robust methods for daemon.json, which might fail or overwrite existing settings.")
# Removed rsync check as it's no longer used/preferred by the generated script
except ImportError:
print("WARNING: Python 'shutil' module not found, cannot check for optional command (jq).")
except Exception as e:
print(f"WARNING: Error checking for optional commands: {e}")
if not passed:
print("----------------------------")
print("ERROR: Prerequisites not met. Aborting script.")
sys.exit(1)
print("--- Prerequisites OK ---")
return True
def create_files():
"""Creates directories and files as defined in FILES_TO_CREATE."""
global warning_count, fail_count # Declare intent to modify globals
print("\n--- Starting File Creation Process ---")
success_count = 0
warning_count = 0 # Reset global counter
fail_count = 0 # Reset global counter
# Ensure the base backup directory exists first with correct permissions
try:
print(f"\nEnsuring base backup directory exists: {BACKUP_BASE_DIR}")
# Create directory with specific permissions (rwxr-x---)
os.makedirs(BACKUP_BASE_DIR, mode=BACKUP_DIR_PERMISSIONS, exist_ok=True)
# Explicitly set permissions in case it already existed with different ones
current_perm = stat.S_IMODE(os.stat(BACKUP_BASE_DIR).st_mode)
if current_perm != BACKUP_DIR_PERMISSIONS:
print(f" Adjusting permissions on {BACKUP_BASE_DIR} to {BACKUP_DIR_PERMISSIONS:o}...") # Use :o format
os.chmod(BACKUP_BASE_DIR, BACKUP_DIR_PERMISSIONS)
print(f"OK: Backup directory ensured: {BACKUP_BASE_DIR} with permissions {BACKUP_DIR_PERMISSIONS:o}") # Use :o format
except OSError as e:
print(f"ERROR: Failed to create or set permissions on {BACKUP_BASE_DIR}: {e}")
sys.exit(f"ERROR: Could not ensure backup directory '{BACKUP_BASE_DIR}'. Exiting.")
except Exception as e:
print(f"ERROR: An unexpected error occurred ensuring backup directory: {e}")
sys.exit(f"ERROR: Could not ensure backup directory '{BACKUP_BASE_DIR}'. Exiting.")
# Process the files dictionary
for file_path_str, details in FILES_TO_CREATE.items():
file_path = pathlib.Path(file_path_str)
write_succeeded = False # Flag to track if write was successful
try:
content = details.get("content") # Use get() as content might be None for dirs
permissions = details.get("permissions") # Use .get() for optional permissions
# Assign default permissions if not specified
if permissions is None:
if content is None: # It's meant to be a directory
permissions = 0o755 # Default rwxr-xr-x for directories
else: # It's a file
permissions = 0o644 # Default rw-r--r-- for files
print(f"INFO: No specific permission set for {file_path}, using default {permissions:o}.") # Use :o format
except KeyError as e:
print(f"\nERROR: Configuration error - Missing '{e}' key for entry {file_path_str}. Skipping.")
fail_count += 1
continue
except Exception as e:
print(f"\nERROR: Configuration error for {file_path_str}: {e}. Skipping.")
fail_count += 1
continue
print(f"\nProcessing: {file_path}")
# 1. Create parent directories robustly
try:
parent_dir = file_path.parent
# Check if parent needs creation (avoid os.makedirs on existing dirs if possible)
if not parent_dir.is_dir():
print(f" Creating parent directory: {parent_dir}")
# mode=0o755 sets default permissions for newly created dirs (rwxr-xr-x)
os.makedirs(parent_dir, mode=0o755, exist_ok=True)
# Explicitly set permissions on parent in case it was just created or exist_ok=True skipped it
print(f" Setting parent directory permissions to 755...") # 755 doesn't need 0o prefix
os.chmod(parent_dir, 0o755)
else:
# Parent exists, ensure it's writable and has correct permissions
print(f" Parent directory exists: {parent_dir}")
if not os.access(parent_dir, os.W_OK):
print(f" WARNING: Parent directory {parent_dir} is not writable! File write may fail.")
warning_count += 1
# Ensure existing parent has standard 755 permissions
try:
current_parent_perm = stat.S_IMODE(os.stat(parent_dir).st_mode)
if current_parent_perm != 0o755:
print(f" Ensuring parent directory permissions are 755 (currently {current_parent_perm:o})...") # Use :o format
os.chmod(parent_dir, 0o755)
except OSError as e:
print(f" WARNING: Could not check/set permissions on existing parent {parent_dir}: {e}")
warning_count += 1
except OSError as e:
print(f" ERROR: Failed to create or set permissions on parent directory {parent_dir}: {e}")
print(f" Skipping item: {file_path}")
fail_count += 1
continue # Skip to the next file
except Exception as e:
print(f" ERROR: An unexpected error occurred creating parent directory for {file_path}: {e}")
print(f" Skipping item: {file_path}")
fail_count += 1
continue
# 2. Write the file content (or create directory if content is None)
if content is not None: # It's a file
try:
print(f" Writing content...")
# Use write_text for atomic write where possible and UTF-8 encoding
file_path.write_text(content, encoding='utf-8')
print(f" Successfully wrote: {file_path}")
write_succeeded = True
except IOError as e:
print(f" ERROR: Failed to write file {file_path}: {e}")
fail_count += 1
continue # Skip permissions if write failed
except Exception as e:
print(f" ERROR: An unexpected error occurred writing {file_path}: {e}")
fail_count += 1
continue
else: # It's a directory (content is None)
try:
print(f" Ensuring directory exists: {file_path}")
os.makedirs(file_path, mode=permissions, exist_ok=True)
# Explicitly set permissions in case it already existed
os.chmod(file_path, permissions)
print(f" Successfully ensured directory: {file_path}")
write_succeeded = True # Treat dir success like file write success
except OSError as e:
print(f" ERROR: Failed to create/set permissions on directory {file_path}: {e}")
fail_count += 1
continue
except Exception as e:
print(f" ERROR: An unexpected error occurred ensuring directory {file_path}: {e}")
fail_count += 1
continue
# 3. Set permissions (only if write/dir creation succeeded)
if write_succeeded:
try:
# Check if current permissions match target permissions before attempting chmod
current_perm = stat.S_IMODE(os.stat(file_path).st_mode)
if current_perm != permissions:
print(f" Setting permissions to {permissions:o} (currently {current_perm:o})...") # Use :o format
os.chmod(file_path, permissions)
print(f" Successfully set permissions for: {file_path}")
else:
print(f" Permissions already set correctly ({permissions:o}) for: {file_path}") # Use :o format
success_count += 1 # Count full success (write/dir + chmod)
except OSError as e:
print(f" WARNING: Failed to set permissions on {file_path}: {e}")
warning_count += 1 # Item created/written, but permissions failed/check failed
except Exception as e:
print(f" WARNING: An unexpected error occurred setting permissions for {file_path}: {e}")
warning_count += 1
# --- Summary ---
print("\n--- File Creation Summary ---")
print(f"Successfully processed (created/permissioned): {success_count} items")
print(f"Items processed but with warnings: {warning_count}")
print(f"Failed operations (write/dir/parent): {fail_count}")
print("-------------------------------\n")
total_issues = fail_count + warning_count
if total_issues > 0:
print("NOTE: Some errors or warnings occurred during file creation.")
if fail_count > 0:
print("ERROR: Fatal errors occurred. Deployment incomplete.")
return False # Fatal errors occurred
else:
print("Deployment completed, but with warnings. Please review the output above.")
return True # Only non-fatal warnings
else:
print("SBNB configuration file deployment completed successfully.")
return True
# --- Script Execution ---
if __name__ == "__main__":
print("=====================================================================")
print(" SBNB Unified Configuration Deployment Script (v2.1 - BusyBox cp) ")
print("=====================================================================")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Configuring Docker persistent root: {PERSISTENT_DOCKER_ROOT}")
print("Includes Backup/Purge and Health/Volume monitoring.")
print("Data migration uses 'cp -a -u' (BusyBox friendly).")
print("=====================================================================\n")
# Store counts for final status reporting
final_warning_count = 0
final_fail_count = 0
if check_prerequisites():
# Capture status from create_files
create_files_success = create_files()
# Access the global counters updated by create_files
final_warning_count = warning_count
final_fail_count = fail_count
if create_files_success or (final_fail_count == 0 and final_warning_count > 0) :
# Success or only warnings - print final instructions
print("\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("!!! CRITICAL: You MUST replace the placeholder in !!!")
print(f"!!! '{ESP_MOUNT}/sbnb-tskey.txt' with your actual Tailscale auth key! !!!")
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("\n--- Next Steps ---")
print("1. Review any WARNINGS in the output above.")
print("2. Reboot the system for sbnb-cmds.sh to take effect.")
print("3. After reboot, verify Docker configuration and status:")
print(f" - Check data root: `docker info | grep 'Docker Root Dir'` (should show '{PERSISTENT_DOCKER_ROOT}')")
print(f" - Check status: `systemctl status docker.service`")
print(f" - Check boot script logs: `journalctl -t sbnb-cmds.sh --no-pager` or check `/dev/kmsg` output during boot")
print(f" - Check timers: `systemctl list-timers --all | grep docker`")
print(f" - Check helper script logs periodically: `journalctl -t backup-docker.sh -t purge-docker-backups.sh -t docker-health-check -t docker-volume-check --no-pager`")
if final_warning_count > 0:
print("\nDeployment finished with WARNINGS.")
sys.exit(2) # Exit code 2 for success with warnings
else:
print("\nDeployment finished successfully.")
sys.exit(0) # Exit successfully
else:
# Fatal errors occurred during file creation
print("\n--- Deployment Failed ---")
print("Fatal errors occurred during file creation. System configuration may be incomplete or inconsistent.")
sys.exit(1) # Exit with error code
- Unmount the EFI Partition:
echo "--- Unmounting ESP partition ---" # Ensure buffers are flushed before unmounting sync sudo umount /mnt/sbnb-mount
#Phase 4: Backing Up Data (CRITICAL!)
- Why Essential: High risk of USB drive failure. Backups are mandatory.
- Strategy: Automate regular backups of `/mnt/sbnb-data`.
- File Data Backup (`rsync`): Ensure the backup destination (NAS, cloud, another server) has sufficient free space.
```bash
# Example: From Sbnb to backup-server (requires ssh key auth)
rsync -avz --delete --progress --human-readable /mnt/sbnb-data/ user@backup-server:/path/to/backups/sbnb-usb-data/
```
- Frequency: Daily recommended for active data.
- Automation: Use cron/systemd timers or remote triggers (see the wrapper sketch after this list).
- Testing Restores: Vital! Don't assume backups work.
- Conceptual Restore: Boot a Linux live env -> mount the backup source -> mount the target USB data partition (new/reformatted) to `/mnt/restore` -> `sudo rsync -av --progress /path/to/backup/sbnb-usb-data/ /mnt/restore/` -> verify restored files (count, size, checksums, spot checks).
- Verification: Use tools like `diff -r`, `md5sum`, or `sha256sum` to compare restored files against originals or known good copies.
- Untested backups provide a false sense of security.
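As a concrete illustration of the Automation item above, here is a minimal wrapper sketch (not part of the generated Sbnb configuration): the destination, lock file path, and logger tag are placeholders to adapt. It assumes passwordless SSH key authentication to the backup host, as in the `rsync` example above, and could be triggered by a systemd timer (similar to the Docker timers generated earlier) or remotely from the backup host.
```bash
#!/bin/sh
# Hypothetical offsite-backup wrapper -- adapt DEST and paths to your environment.
DEST="user@backup-server:/path/to/backups/sbnb-usb-data/"
LOCK="/run/sbnb-offsite-backup.lock"

# Skip this run if a previous backup is still in progress
if [ -e "$LOCK" ]; then
    logger -t sbnb-offsite-backup "previous backup still running, skipping"
    exit 0
fi
touch "$LOCK"
trap 'rm -f "$LOCK"' EXIT INT TERM

# Mirror the persistent data partition to the remote backup location
if rsync -az --delete /mnt/sbnb-data/ "$DEST"; then
    logger -t sbnb-offsite-backup "rsync completed successfully"
else
    logger -t sbnb-offsite-backup "rsync FAILED (exit $?)"
    exit 1
fi
```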
#Phase 5: Boot and Verify
- Safely Eject: Eject USB from prep system.
- Configure Server BIOS/UEFI: Enter setup (DEL, F2, F10, F12, etc.). Ensure UEFI Mode ON, CSM/Legacy OFF, Secure Boot OFF. Set “UEFI: USB…” as first boot device. Save & Exit.
- Boot Sbnb Linux.
- Verify Operation:
  - Monitor Boot: Watch console for `sbnb-cmds.sh` logs, errors.
  - SSH into Sbnb.
  - Check Mounts:
```bash
lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL,MOUNTPOINT   # Look for mount at /mnt/sbnb-data
df -hT | grep -E 'Filesystem|/mnt/sbnb-data'      # Check usage/type
mount | grep /mnt/sbnb-data                       # Check mount options (rw, noatime)
findmnt /mnt/sbnb-data                            # Another way to check mount info
```
  - Test Persistence:
```bash
# After SSHing in:
TIMESTAMP=$(date)
echo "Sbnb USB Persistence test - $TIMESTAMP" | sudo tee /mnt/sbnb-data/persistence_test.txt > /dev/null
sync && echo "Synced data to disk."
echo "File created. Content:" && sudo cat /mnt/sbnb-data/persistence_test.txt
echo "Rebooting server now..." && sudo reboot

# --- Wait for reboot and reconnect via SSH ---
echo "Checking for file after reboot..."
if [ -f /mnt/sbnb-data/persistence_test.txt ]; then
  echo "SUCCESS: File found. Content:" && sudo cat /mnt/sbnb-data/persistence_test.txt
  sudo rm /mnt/sbnb-data/persistence_test.txt  # Clean up
else
  echo "FAILURE: File NOT FOUND after reboot! Persistence failed."
fi
```
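Beyond the mount and persistence checks, confirm that Docker actually relocated its data root to the persistent partition and that the helper timers were installed. These checks mirror the "Next Steps" printed by the deployment script; the exact `Docker Root Dir` value depends on your `PERSISTENT_DOCKER_ROOT` setting.
```bash
docker info | grep 'Docker Root Dir'       # Should point at the persistent Docker root under /mnt/sbnb-data
systemctl status docker.service --no-pager
systemctl list-timers --all | grep docker  # Backup/purge/health/volume timers should be listed
journalctl -t sbnb-cmds.sh --no-pager      # Boot script log
journalctl -t backup-docker.sh -t purge-docker-backups.sh -t docker-health-check -t docker-volume-check --no-pager
```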
#Troubleshooting
- Doesn't Boot / No Bootable Device:
  - Re-verify BIOS settings (UEFI, Secure Boot OFF, Boot Order).
  - Re-verify USB Prep: Partitions (`parted print`), ESP flags (`boot`, `esp`), ESP filesystem label (`blkid /dev/sdX1` -> `LABEL="sbnb"`), EFI file path (`/EFI/BOOT/BOOTX64.EFI`).
  - Try different USB ports (check if the port provides sufficient power). Test drive health on the prep machine (`fsck`, `badblocks -nvs /dev/sdX`). Recreate the drive meticulously.
- Data Partition Not Mounted / `/mnt/sbnb-data` Empty:
  - Check boot logs (`journalctl -b`, console) for `sbnb-cmds.sh` errors ("Device... not found", "Failed to mount"). Check `dmesg` for USB errors (`dmesg | grep -iE 'usb|sdX'`) or filesystem errors (`dmesg | grep -i ext4`).
  - SSH in:
    - Verify partition & label: `sudo blkid`, `ls -l /dev/disk/by-label/`. Is `SBNB_DATA` present? Does it point to the correct device?
    - If label wrong/missing: Re-label from prep env (`sudo e2label /dev/sdX2 SBNB_DATA`).
    - If device/label exists, try manual mount: `sudo mkdir -p /mnt/sbnb-data && sudo mount /dev/disk/by-label/SBNB_DATA /mnt/sbnb-data`. Check `dmesg` for errors (e.g., `mount: wrong fs type, bad option, bad superblock`). If manual mount works, debug `sbnb-cmds.sh` (add `set -x`, check paths, loop duration, check script permissions: `ls -l /mnt/sbnb/sbnb-cmds.sh`). A minimal debugging sketch follows at the end of this section.
    - Run filesystem check (unmounted): `sudo e2fsck -f /dev/disk/by-label/SBNB_DATA`.
    - Check kernel modules: `lsmod | grep ext4`. Is the module loaded? Check `dmesg` for errors loading filesystem modules.
- Check boot logs (
- Poor Performance / Drive Failure:
  - Performance: Inherent limitation.
  - Lifespan/Failure: Monitor `dmesg` for I/O errors. Restore from verified backups upon failure. This setup will wear out consumer flash drives with persistent writes.
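If the automatic mount keeps failing while a manual mount works, the usual suspects are the USB device not yet being enumerated when `sbnb-cmds.sh` runs, or a path/permission mistake. The fragment below is a stripped-down, hypothetical version of the wait-and-mount step for interactive debugging only; it is not the generated script itself, and the 30-second wait and `rw,noatime` options are assumptions.
```bash
# Hypothetical debugging fragment -- run interactively; the real logic lives in the generated sbnb-cmds.sh
set -x                                    # trace every command
DATA_DEV=/dev/disk/by-label/SBNB_DATA
MOUNT_POINT=/mnt/sbnb-data

# Wait up to ~30s for slow USB enumeration before giving up
i=0
while [ ! -e "$DATA_DEV" ] && [ "$i" -lt 30 ]; do
    sleep 1
    i=$((i + 1))
done

if [ -e "$DATA_DEV" ]; then
    mkdir -p "$MOUNT_POINT"
    mount -o rw,noatime "$DATA_DEV" "$MOUNT_POINT" || echo "mount failed with exit code $?"
else
    echo "Device $DATA_DEV never appeared; check dmesg for USB errors"
fi
set +x
```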