Portable Linux via the Sbnb distro with persistence



This guide provides comprehensive, step-by-step instructions for configuring a single USB flash drive (or potentially an external USB hard drive) to perform two distinct functions simultaneously:

  1. Booting the Sbnb Linux Operating System: The drive will be prepared with a standard UEFI-compatible structure, specifically an EFI System Partition (ESP) containing the Sbnb EFI bootloader (sbnb.efi) and necessary configuration files. This allows the server’s firmware to locate and start the Sbnb boot process. The sbnb.efi file itself is typically a Unified Kernel Image (UKI), bundling the Linux kernel, initramfs, and kernel command line into a single executable file.
  2. Providing Simple Persistent Storage: Utilizing a separate partition on the same physical USB drive, formatted with a standard Linux filesystem (ext4 is used in this guide). This partition is intended to be automatically mounted at the /mnt/sbnb-data directory path within the running Sbnb Linux system via a custom boot script (sbnb-cmds.sh). This provides a space where data (like container volumes, application data, logs, user files) can persist across reboots of the otherwise ephemeral, RAM-based Sbnb OS.

Why ext4 instead of LVM: Initial analysis suggested LVM might be suitable, but further review of the default Sbnb Linux build configuration indicates the necessary lvm2 user-space tools are likely missing from the base runtime environment. Without these tools, managing LVM volumes during boot via standard scripts is infeasible unless you create a custom Sbnb build that includes the lvm2 package. This revised guide therefore uses a standard ext4 filesystem partition, relying only on basic tools expected to be present in Sbnb.
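The claim that the LVM tools are missing can be checked rather than assumed. The following is a minimal sketch, assuming Python 3 is available on the system being inspected (this guide does not establish whether the Sbnb image ships Python); the tool lists are illustrative assumptions, not an inventory of what Sbnb provides.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    # Hypothetical tool lists; adjust them to what your setup actually needs.
    lvm_tools = ["lvm", "pvcreate", "vgcreate", "lvcreate"]
    ext4_tools = ["mount", "mountpoint", "mkdir"]
    print("missing LVM tools: ", missing_tools(lvm_tools))
    print("missing ext4 tools:", missing_tools(ext4_tools))
```

An empty list for the ext4 tools and a non-empty list for the LVM tools would be consistent with the reasoning above.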

Contrasting with Standard Sbnb Workflow: It’s crucial to understand that this guide describes a highly non-standard setup. The intended Sbnb workflow prioritizes resilience, performance, and statelessness:

  • Boot the minimal Sbnb OS from simple USB/network.
  • Use automation (Ansible) or manual scripts (sbnb-configure-storage.sh) post-boot to configure LVM on internal server drives.
  • Run workloads utilizing this fast, reliable internal storage.

This guide’s method compromises these benefits for single-drive convenience under specific constraints.

#***** EXTREME CAUTION: IRREVERSIBLE DATA DESTRUCTION IMMINENT! *****

This procedure involves low-level disk operations (partitioning, formatting) that will completely and PERMANENTLY ERASE ALL DATA currently residing on the USB drive you select. There is NO UNDO function. Data recovery after accidental formatting is often impossible.

The most critical risk is selecting the wrong target device. Mistakenly choosing your computer’s internal hard drive (e.g., /dev/sda, /dev/nvme0n1) instead of the intended USB drive (e.g., /dev/sdb, /dev/sdc) WILL RESULT IN CATASTROPHIC AND LIKELY IRRECOVERABLE LOSS OF YOUR OPERATING SYSTEM, APPLICATIONS, AND PERSONAL FILES.

You MUST verify the target device name multiple times using different commands (like lsblk, fdisk, parted) and cross-reference with expected drive sizes and models before executing any partitioning or formatting commands. Proceed with extreme vigilance, double-checking each step, entirely at your own sole risk!


#Primary Drawbacks & Warnings (Reiterated & Expanded):

  • Highly Non-Standard & Complex: Deviates significantly from Sbnb’s design. Setup is intricate, runtime behavior depends on precise script execution and timing. Future Sbnb updates might break this.
  • Severe Performance Penalty: USB storage is inherently slow (latency, throughput, IOPS) compared to internal NVMe/SATA drives. Disk I/O to /mnt/sbnb-data will be a major bottleneck.
  • Drastically Reduced Lifespan & Reliability: USB flash drives wear out quickly under persistent write load because of limited write cycles, write amplification, and the typical lack of TRIM support. They are unsuitable for write-intensive workloads or high reliability needs. Expect eventual failure and data loss without robust backups.
  • Potential Instability & Boot Issues: Relies on correct partition detection, udev node creation, filesystem integrity, and sbnb-cmds.sh execution timing. Failures can leave persistent storage unavailable.
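To make the wear concern concrete, here is a rough back-of-envelope sketch. It assumes ideal wear leveling and uses the common approximation "total writable data = capacity × program/erase cycles ÷ write amplification". Every input value in the example is a hypothetical placeholder, since USB flash drives rarely publish endurance ratings.

```python
def estimated_lifetime_days(capacity_gb, pe_cycles, daily_writes_gb, write_amplification):
    """Estimate days until the flash cells are nominally worn out.

    Assumes perfect wear leveling; real drives usually do worse.
    """
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / daily_writes_gb


if __name__ == "__main__":
    # Hypothetical inputs: 32 GB drive, 1000 P/E cycles, 20 GB written per day,
    # write amplification factor of 4.
    print(estimated_lifetime_days(32, 1000, 20, 4))
```

With these placeholder numbers the estimate is 400 days; a container workload that logs heavily could shorten that considerably.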

#When Might This Be Considered? (Limited Scenarios with Full Risk Acceptance)

  • Temporary Testing/Experimentation ONLY: Brief evaluations on hardware lacking internal drives.
  • Specific, Very Low-Intensity, Read-Mostly Use Cases: Infrequent writes, performance irrelevant (e.g., static config kiosk).
  • Absolute Hardware Constraints: Sealed systems where internal drives are impossible, and risks are fully accepted.

Even in these limited scenarios, regular, automated, and verified backups are non-negotiable.
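As one possible shape for a "verified" backup, the sketch below archives a directory and then re-reads the archive, comparing the SHA-256 of every archived regular file with the file on disk. The function names and any paths you pass in are assumptions for illustration; the archive must be written outside the source directory, and files are read fully into memory, which is only reasonable for small data sets.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file on disk in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source_dir, archive_path):
    """Archive source_dir, then check each archived regular file against the source."""
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=".")
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories, symlinks, etc. are not hashed
            archived = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            if archived != sha256_of(source_dir / member.name):
                return False
    return True
```

A call such as backup_and_verify("/mnt/sbnb-data/app", "/some/other/disk/app.tar.gz") would return True only if every archived file matches; store the archive on a different physical device than the USB drive.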

#Prerequisites

  • A Suitable USB Flash Drive:
    • Capacity: Min ~1GB ESP + desired data size (32GB+ recommended).
    • Quality & Speed: Reputable brand, USB 3.0+ advised for marginal speed benefit. Endurance matters more than peak speed.
  • A Working Linux System (Preparation Environment):
    • Necessity: Required for partitioning/formatting the target USB safely. openSUSE Tumbleweed assumed.
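    • Live Environment Benefit: Using a Live USB/CD (e.g., openSUSE Tumbleweed Live) is highly recommended. The preparation system then runs from the live medium rather than from an internal disk, so installing tools and experimenting does not alter an installed OS. It does not protect internal drives from a mistyped device name.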
  • Sbnb Linux Boot File (sbnb.efi):
    • Method 1 (Easier): Run official Sbnb install script on a temporary USB, then copy /EFI/BOOT/BOOTX64.EFI from its ESP.
    • Method 2 (Advanced): Build Sbnb from source, find sbnb.efi in output/images/.
  • Root/Sudo Privileges: Needed on the openSUSE prep system for disk commands.
  • Internet Connection: May be needed for zypper.

#Step-by-Step Instructions

(Reminder: TRIPLE-CHECK your target device name, e.g., /dev/sdX, before every destructive command!)

#Phase 1: Prepare the Linux Environment (openSUSE Tumbleweed)

  1. Boot into openSUSE: Start your preparation environment.
  2. Install Necessary Tools: Open a terminal. zypper refresh updates package lists. zypper install installs tools.
    sudo zypper refresh
    sudo zypper install -y parted lvm2 dosfstools e2fsprogs
  3. Identify Target USB Drive: CRITICAL SAFETY STEP! Unplug other USB storage.
    • Insert the target USB drive.
    • Use multiple commands. Compare SIZE and MODEL. Check dmesg | tail after plugging in for kernel messages like sd 2:0:0:0: [sdc] Attached SCSI removable disk.
      lsblk -d -o NAME,SIZE,MODEL,VENDOR,TYPE | grep 'disk'
      sudo fdisk -l | grep '^Disk /dev/'
      sudo parted -l | grep '^Disk /dev/'
      # Example: If consistently identified as /dev/sdc, use /dev/sdc below.
    • Visually confirm with YaST Partitioner (sudo yast2 partitioner) or GParted (sudo zypper install -y gparted && sudo gparted) if preferred. Look for the drive matching the expected size and vendor/model.
    • Assume /dev/sdX is your verified target drive. Replace it carefully!
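As an additional cross-check, the candidate list can be narrowed programmatically. This is a minimal sketch, assuming a util-linux lsblk recent enough to support --json and the PATH and TRAN columns; the helper name usb_disk_candidates is hypothetical. It only lists whole disks whose transport is reported as USB and is no substitute for comparing size and model by eye.

```python
import json
import shutil
import subprocess


def usb_disk_candidates(lsblk_json):
    """Return (path, size, model) for whole disks whose transport is USB."""
    devices = json.loads(lsblk_json).get("blockdevices", [])
    return [
        (d.get("path"), d.get("size"), (d.get("model") or "").strip())
        for d in devices
        if d.get("type") == "disk" and d.get("tran") == "usb"
    ]


if __name__ == "__main__" and shutil.which("lsblk"):
    # -d lists whole devices only, without their partitions.
    out = subprocess.run(
        ["lsblk", "--json", "-d", "-o", "PATH,SIZE,MODEL,TRAN,TYPE"],
        check=True, capture_output=True, text=True,
    ).stdout
    for path, size, model in usb_disk_candidates(out):
        print(path, size, model)
```

If more than one device is printed, unplug the others before continuing, as advised above.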

#Phase 2: Partition the USB Drive

(Warning: The following parted commands are DESTRUCTIVE to /dev/sdX. Double-check the device name!)

This script automates the partitioning and formatting process. Save it as prepare_usb.sh, make it executable (chmod +x prepare_usb.sh), and run it with sudo ./prepare_usb.sh /dev/sdX (replacing /dev/sdX with your verified target device).

#!/bin/bash

# --- Configuration ---
# Exit immediately if a command exits with a non-zero status.
# Treat unset variables as an error when substituting.
# Pipelines return the exit status of the last command to exit non-zero.
set -euo pipefail

# --- Variables ---
# EFI System Partition (ESP) Label (CRITICAL - must match bootloader config)
ESP_LABEL="sbnb"
# Data Partition Label (Recommended for identification)
DATA_LABEL="SBNB_DATA"
# End offset of the ESP. The ESP starts at 1MiB, so this yields ~1GiB.
ESP_SIZE="1025MiB"

# List of required commands for the script to function
REQUIRED_CMDS=(
    "parted" "mkfs.vfat" "mkfs.ext4" "wipefs" "findmnt" "lsblk" "blkid"
    "fsck.vfat" "e2fsck" "sync" "id" "grep" "read" "sleep" "xargs"
    "umount" "partprobe" "realpath"
)

# --- Functions ---

# Function to check for required commands
check_dependencies() {
    echo "--- Checking for required commands ---"
    local missing_cmds=()
    for cmd in "${REQUIRED_CMDS[@]}"; do
        if ! command -v "$cmd" &> /dev/null; then
            missing_cmds+=("$cmd")
        fi
    done
    if [ ${#missing_cmds[@]} -ne 0 ]; then
        echo "ERROR: The following required commands are not found:" >&2
        printf " - %s\n" "${missing_cmds[@]}" >&2
        echo "Please install them and try again." >&2
        exit 1
    fi
    echo "All required commands found."
}

# Function to get the base block device name for a given path (handles partitions, links, etc.)
get_base_device() {
    local path="$1"
    local resolved_path parent
    resolved_path=$(realpath "$path") || { echo "ERROR: Cannot resolve path '$path'" >&2; return 1; }
    # lsblk -dno pkname prints the parent kernel name of this one device.
    # It is empty when the path already is a whole disk, so fall back to
    # the device's own name in that case.
    parent=$(lsblk -dno pkname "$resolved_path") || { echo "ERROR: Cannot find base device for '$resolved_path' using lsblk." >&2; return 1; }
    if [ -n "$parent" ]; then
        echo "$parent"
    else
        echo "${resolved_path##*/}"
    fi
}

# --- Script Start ---
echo "-----------------------------------------------------"
echo "--- USB Drive Partitioning and Formatting Script ---"
echo "--- (Version 2 - Enhanced Safety) ---"
echo "-----------------------------------------------------"
echo ""
echo "WARNING: This script is DESTRUCTIVE and will ERASE"
echo " ALL DATA on the target device."
echo ""

# --- Check for Root Privileges ---
if [ "$(id -u)" -ne 0 ]; then
    echo "ERROR: This script must be run as root (e.g., using sudo)." >&2
    exit 1
fi

# --- Check Dependencies ---
check_dependencies

# --- Check for Device Argument ---
if [ -z "${1:-}" ]; then
    echo "Usage: $0 /dev/sdX"
    echo "ERROR: Please provide the target block device (e.g., /dev/sda, /dev/sdb)." >&2
    echo ""
    echo "Available block devices (excluding ROM, loop, and RAM devices):"
    lsblk -d -o NAME,SIZE,TYPE,MODEL | grep -vE 'rom|loop|ram'
    exit 1
fi
DEVICE="$1"

# --- Validate Device ---
if [ ! -b "$DEVICE" ]; then
    echo "ERROR: '$DEVICE' is not a valid block device." >&2
    exit 1
fi

# --- CRITICAL SAFETY CHECK: Prevent targeting the root filesystem device ---
echo "--- Performing safety checks ---"
ROOT_DEV_PATH=$(findmnt -n -o SOURCE /)
# Strip a trailing "[/subvolume]" suffix (shown for btrfs subvolumes and bind mounts).
ROOT_DEV_PATH="${ROOT_DEV_PATH%%\[*}"
if [ -b "$ROOT_DEV_PATH" ]; then
    ROOT_BASE_DEV_NAME=$(get_base_device "$ROOT_DEV_PATH") || exit 1 # Exit if function fails
    TARGET_BASE_DEV_NAME=$(get_base_device "$DEVICE") || exit 1
    # Construct full device paths for comparison
    ROOT_BASE_DEV="/dev/${ROOT_BASE_DEV_NAME}"
    TARGET_BASE_DEV="/dev/${TARGET_BASE_DEV_NAME}"
    if [ "$TARGET_BASE_DEV" == "$ROOT_BASE_DEV" ]; then
        echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" >&2
        echo "FATAL ERROR: Target device '$DEVICE' appears to be the same" >&2
        echo " device ('$ROOT_BASE_DEV') as the running root" >&2
        echo " filesystem ('$ROOT_DEV_PATH')." >&2
        echo " Aborting to prevent data loss." >&2
        echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" >&2
        exit 1
    fi
    echo "Safety check passed: Target device '$DEVICE' is not the root filesystem device ('$ROOT_BASE_DEV')."
else
    # Typical for live systems, where / is an overlay or RAM filesystem.
    echo "WARNING: Root filesystem source '$ROOT_DEV_PATH' is not a block device."
    echo " Skipping the root-device comparison. Verify the target manually!"
fi

# Check if the device looks like an SD card reader often used for the OS drive
if [[ "$DEVICE" == /dev/mmcblk* ]]; then
    echo "WARNING: '$DEVICE' looks like an SD card (e.g., /dev/mmcblk0)."
    echo " Double-check this is not your primary OS drive!"
fi

# --- Confirmation ---
echo ""
echo "Target Device: $DEVICE"
echo "Partitions to be created:"
echo " 1: EFI System Partition (ESP), FAT32, Label: '$ESP_LABEL', Size: $ESP_SIZE, Flags: boot, esp"
echo " 2: Linux Data Partition, ext4, Label: '$DATA_LABEL', Size: Remaining space"
echo ""
read -p "ARE YOU ABSOLUTELY SURE you want to erase '$DEVICE' and proceed? (yes/NO): " CONFIRMATION
CONFIRMATION=${CONFIRMATION:-NO} # Default to NO if user just presses Enter
if [[ "$CONFIRMATION" != "yes" ]]; then
    echo "Operation cancelled by user."
    exit 0
fi
echo ""
echo "--- Proceeding with operations on $DEVICE ---"

# --- Phase 2: Partition the USB Drive ---

# 1. Unmount Existing Partitions
echo ""
echo "--- Unmounting any existing partitions on ${DEVICE}* ---"
# List the mount points of the device and its partitions with lsblk
# (findmnt --source does not expand a trailing '*') and unmount them.
lsblk -nro MOUNTPOINT "$DEVICE" | grep -v '^$' | xargs --no-run-if-empty umount -v -l || echo "Info: No partitions were mounted or umount failed (might be okay)."
umount "$DEVICE" &>/dev/null || true # Attempt to unmount base device, ignore errors
sleep 1 # Give time for umount to settle
lsblk "$DEVICE"

# 2. Wipe Existing Signatures (Recommended)
echo ""
echo "--- Wiping filesystem/partition signatures from $DEVICE ---"
wipefs --all --force "$DEVICE"
sync # Flush kernel buffers to disk to ensure changes are physically written

# 3. Create New GPT Partition Table
echo ""
echo "--- Creating new GPT partition table on $DEVICE ---"
parted "$DEVICE" --script -- mklabel gpt
sync # Flush kernel buffers to disk

# 4. Create EFI System Partition (ESP)
echo ""
echo "--- Creating ESP partition (1) on $DEVICE ---"
parted "$DEVICE" --script -- mkpart "${ESP_LABEL}" fat32 1MiB "${ESP_SIZE}"
parted "$DEVICE" --script -- set 1 boot on
parted "$DEVICE" --script -- set 1 esp on
sync # Flush kernel buffers to disk

# 5. Create Linux Data Partition
echo ""
echo "--- Creating Linux data partition (2) on $DEVICE ---"
# Use the end of the ESP as the start for the data partition
parted "$DEVICE" --script -- mkpart "${DATA_LABEL}" ext4 "${ESP_SIZE}" 100%
sync # Flush kernel buffers to disk
echo "Waiting briefly for kernel to recognize new partitions..."
sleep 2

# Define partition variables (e.g., /dev/sda1, /dev/sda2).
# Devices whose name ends in a digit (e.g., /dev/nvme0n1, /dev/mmcblk0)
# use a 'p' separator before the partition number (e.g., /dev/nvme0n1p1).
if [[ "$DEVICE" =~ [0-9]$ ]]; then
    PART_PREFIX="p"
else
    PART_PREFIX=""
fi
ESP_PARTITION="${DEVICE}${PART_PREFIX}1"
DATA_PARTITION="${DEVICE}${PART_PREFIX}2"

# Check if partition devices exist, retry with partprobe if needed
echo "--- Checking for partition device nodes (${ESP_PARTITION}, ${DATA_PARTITION}) ---"
PARTITIONS_FOUND=false
for i in {1..5}; do
    if [ -b "$ESP_PARTITION" ] && [ -b "$DATA_PARTITION" ]; then
        echo "Partition nodes found."
        PARTITIONS_FOUND=true
        break
    fi
    echo "Partition nodes not yet found. Retrying probe (Attempt $i/5)..."
    partprobe "$DEVICE" || echo "Warning: partprobe command failed, continuing check..."
    sleep 1
done
if [ "$PARTITIONS_FOUND" = false ]; then
    echo "ERROR: Partition devices ($ESP_PARTITION, $DATA_PARTITION) not found after partitioning and retries." >&2
    echo " Please check manually ('lsblk $DEVICE', 'parted $DEVICE print')." >&2
    lsblk "$DEVICE"
    exit 1
fi

# 6. Verify Partitioning
echo ""
echo "--- Verifying partitions on $DEVICE ---"
parted "$DEVICE" --script -- print
echo ""
echo "--- Block device view: ---"
lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTLABEL,MOUNTPOINT,PARTFLAGS "$DEVICE"
echo "----------------------------"
echo "Expected: ${ESP_PARTITION} (~${ESP_SIZE}), Type EFI System, Flags: boot, esp"
echo "Expected: ${DATA_PARTITION} (Remaining size), Type Linux filesystem"
echo "----------------------------"
sleep 2 # Pause for user to review

# --- Phase 3: Format Filesystems ---

# 1. Format EFI Partition
echo ""
echo "--- Formatting ESP partition (${ESP_PARTITION}) as FAT32 with label '${ESP_LABEL}' ---"
mkfs.vfat -F 32 -n "${ESP_LABEL}" "${ESP_PARTITION}"
sync # Flush kernel buffers to disk

# Check filesystem integrity
echo "--- Checking ESP filesystem (fsck.vfat) ---"
FSCK_VFAT_EXIT_CODE=0
fsck.vfat -a "${ESP_PARTITION}" || FSCK_VFAT_EXIT_CODE=$? # Run fsck, capture exit code on failure
if [ $FSCK_VFAT_EXIT_CODE -eq 0 ]; then
    echo "ESP filesystem check passed (or no check performed)."
elif [ $FSCK_VFAT_EXIT_CODE -eq 1 ]; then
    # Exit code 1 usually means errors were found AND corrected.
    echo "WARNING: fsck.vfat found and corrected errors on ESP partition (${ESP_PARTITION}). Check output above."
else
    # Exit codes > 1 typically indicate uncorrected errors.
    echo "ERROR: fsck.vfat reported uncorrectable errors (Exit Code: $FSCK_VFAT_EXIT_CODE) on ESP partition (${ESP_PARTITION})." >&2
    echo " Cannot proceed safely. Please investigate manually." >&2
    exit 1
fi

# Verify label using blkid
echo "--- Verifying ESP label ---"
if blkid -s LABEL -o value "${ESP_PARTITION}" | grep -q "^${ESP_LABEL}$"; then
    echo "ESP Label '${ESP_LABEL}' verified successfully on ${ESP_PARTITION}."
else
    echo "ERROR: Failed to verify ESP Label '${ESP_LABEL}' on ${ESP_PARTITION}." >&2
    blkid "${ESP_PARTITION}" # Show full blkid output for debugging
    exit 1
fi

# 2. Format Data Partition
echo ""
echo "--- Formatting Data partition (${DATA_PARTITION}) as ext4 with label '${DATA_LABEL}' ---"
mkfs.ext4 -m 0 -L "${DATA_LABEL}" "${DATA_PARTITION}"
sync # Flush kernel buffers to disk

# Check the new ext4 filesystem integrity
echo "--- Checking Data partition filesystem (e2fsck) ---"
# -f forces check even if clean, -y assumes yes to all prompts (use with caution)
E2FSCK_EXIT_CODE=0
e2fsck -f -y "${DATA_PARTITION}" || E2FSCK_EXIT_CODE=$? # Capture exit code on failure
if [ $E2FSCK_EXIT_CODE -eq 0 ]; then
    echo "Data partition filesystem check passed."
elif [ $E2FSCK_EXIT_CODE -eq 1 ]; then
    # Exit code 1 means errors were corrected.
    echo "WARNING: e2fsck found and corrected errors on Data partition (${DATA_PARTITION}). Check output above."
else
    # Exit codes > 1 indicate uncorrected errors.
    echo "ERROR: e2fsck reported uncorrectable errors (Exit Code: $E2FSCK_EXIT_CODE) on Data partition (${DATA_PARTITION})." >&2
    echo " Cannot proceed safely. Please investigate manually." >&2
    exit 1
fi

# Verify the label using blkid
echo "--- Verifying Data partition label ---"
if blkid -s LABEL -o value "${DATA_PARTITION}" | grep -q "^${DATA_LABEL}$"; then
    echo "Data Label '${DATA_LABEL}' verified successfully on ${DATA_PARTITION}."
else
    echo "ERROR: Failed to verify Data Label '${DATA_LABEL}' on ${DATA_PARTITION}." >&2
    blkid "${DATA_PARTITION}" # Show full blkid output for debugging
    exit 1
fi

echo ""
echo "-----------------------------------------------------"
echo "--- Script finished successfully! ---"
echo "Device: $DEVICE"
echo "Partitions created and formatted:"
lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL,PARTLABEL,MOUNTPOINT "$DEVICE"
echo "-----------------------------------------------------"
exit 0
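To sanity-check the result against expectations, the layout the script requests can be worked out by hand. The sketch below assumes the script's offsets (ESP from 1MiB to 1025MiB, data partition from 1025MiB to the end of the disk); the function name and the 32 GB example are illustrative assumptions. The data figure is an upper bound, because the backup GPT at the end of the disk and alignment take a little space.

```python
MIB = 1024 * 1024


def expected_layout_mib(disk_bytes, esp_start_mib=1, esp_end_mib=1025):
    """Return (esp_size_mib, approx_data_size_mib) for the two-partition layout."""
    esp_mib = esp_end_mib - esp_start_mib
    # Whole MiB on the disk minus everything up to the end of the ESP.
    data_mib = disk_bytes // MIB - esp_end_mib
    return esp_mib, data_mib


if __name__ == "__main__":
    # Hypothetical drive marketed as 32 GB (32 * 10**9 bytes).
    esp, data = expected_layout_mib(32 * 10**9)
    print(f"ESP: {esp} MiB, data partition: about {data} MiB")
```

For the hypothetical 32 GB drive this gives a 1024 MiB ESP and roughly 29492 MiB for data, which should be close to what lsblk reports after the script finishes.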

#Phase 3: Install Sbnb Boot Files and Configuration

  1. Mount EFI Partition: Access the ESP filesystem. Replace /dev/sdX1 with the actual ESP partition device name identified earlier.
    echo "--- Mounting ESP partition ---"
    sudo mkdir -p /mnt/sbnb-mount
    sudo mount /dev/sdX1 /mnt/sbnb-mount
  2. Create EFI Boot Directory: Standard UEFI fallback path.
    echo "--- Creating EFI boot directories ---"
    sudo mkdir -p /mnt/sbnb-mount/EFI/BOOT
  3. Copy Sbnb EFI Boot File: Place the bootloader (sbnb.efi as BOOTX64.EFI). Replace /path/to/your/sbnb.efi with the actual path to the file you obtained.
    echo "--- Copying Sbnb EFI boot file ---"
    sudo cp /path/to/your/sbnb.efi /mnt/sbnb-mount/EFI/BOOT/BOOTX64.EFI
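  4. Run the Sbnb configuration Python script: The script expects the ESP mounted at /mnt/sbnb and the data partition at /mnt/sbnb-data, which differs from the /mnt/sbnb-mount path used in steps 1 to 3. Unmount /mnt/sbnb-mount, create both directories, then mount /dev/sdX1 at /mnt/sbnb and /dev/sdX2 at /mnt/sbnb-data and run the script as root. Replace tskey-auth-... with your actual Tailscale auth key in this Python script: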
#!/usr/bin/env python3 """ Unified SBNB Configuration Deployment Script (Version 2.1 - BusyBox cp focus). Generates configuration files and scripts to: - Mount a persistent data partition. - Configure Docker to use a persistent data-root on that partition. - Optionally migrate existing Docker data from /var/lib/docker robustly using busybox cp. - Set up backup/purge routines for the persistent Docker data. - Set up health and volume monitoring for Docker (with safer defaults). - Deploy a Tailscale authentication key. - Deploy an optional development environment script. Core components generated: - /mnt/sbnb/sbnb-cmds.sh: Main boot script executed by the system. - /mnt/sbnb/sbnb-tskey.txt: Tailscale authentication key. - /mnt/sbnb-data/scripts/*: Helper scripts for backup, purge, health checks. - /mnt/sbnb-data/systemd/*: Systemd units to automate helper scripts. Prerequisites: - Run as root. - ESP partition mounted at /mnt/sbnb (writable). - Data partition mounted at /mnt/sbnb-data (writable). - Required: Standard Linux utilities (coreutils including 'cp', systemd, grep, sed, etc.). - Recommended: `jq` installed on the target system for robust JSON handling. """ import os import stat import sys import pathlib import json import shutil from datetime import datetime # --- Configuration: File Paths --- # Base mount points - Script will check if these exist and are writable ESP_MOUNT = "/mnt/sbnb" DATA_MOUNT = "/mnt/sbnb-data" # Docker configuration PERSISTENT_DOCKER_ROOT = f"{DATA_MOUNT}/docker-root" DOCKER_CONFIG_DIR = "/etc/docker" DOCKER_CONFIG_FILE = f"{DOCKER_CONFIG_DIR}/daemon.json" DOCKER_CONFIG_BACKUP_SUFFIX = ".sbnb-orig-backup" # Suffix for one-time backup DOCKER_DATA_EPHEMERAL = "/var/lib/docker" # Default path for migration check # Permissions for Docker root dir (rwx--x--x). Owner/Group should be root:root. 
# Use standard integer representation for octal in Python DOCKER_ROOT_PERMISSIONS = 0o711 # Permissions for daemon.json (rw-r--r--) DOCKER_CONFIG_PERMISSIONS = 0o644 # Backup configuration BACKUP_BASE_DIR = f"{DATA_MOUNT}/backups/docker" BACKUP_KEEP_COUNT = 3 # Number of backups to retain STOP_DOCKER_FOR_BACKUP = 1 # 1 = Stop Docker during backup (safer), 0 = Attempt live backup # Permissions for backup base directory (rwxr-x---) BACKUP_DIR_PERMISSIONS = 0o750 # Health Check configuration VOLUME_CHECK_THRESHOLD_PERCENT = 10 # Warn if free space drops below this % # Pruning level in volume check: 0=None, 1=Containers/Dangling Images, 2=All Unused Images+Containers (--volumes still excluded) VOLUME_CHECK_PRUNE_LEVEL = 1 # --- Content Definitions --- # --- sbnb-cmds.sh Content --- # REFACTOR: Removed rsync checks and usage, standardized on cp -a -u for migration. # REFACTOR: Use correct octal format specifier (:o) for mkdir -m and chmod. SBNB_CMDS_SH_CONTENT = f"""#!/bin/sh # Sbnb Custom Commands Script (Unified Persistent Docker Root + Features - v2.1 - BusyBox cp) # Mounts persistent data, configures Docker data-root, migrates data (if needed using cp), # updates optional scripts, enables systemd units for backup & monitoring. # Strict error handling set -e -o pipefail -u # --- Logging Function --- log() {{ # Log to kernel message buffer echo "[sbnb-cmds.sh] $1" > /dev/kmsg }} log "Starting custom boot commands (Unified Persistent Docker Root v2.1 - BusyBox cp)..." # --- Check Core Commands --- # Ensure essential commands for this script are present check_cmds() {{ local missing_cmd=0 log "Checking required commands..." for cmd in "$@"; do if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Required command '$cmd' not found." missing_cmd=1 fi done if [ $missing_cmd -eq 1 ]; then log "ERROR: Missing one or more required commands. Cannot proceed." exit 1 fi log "Required commands found." # Check optional but recommended commands if ! 
command -v jq >/dev/null 2>&1; then log "WARNING: 'jq' command not found. JSON handling for daemon.json will be less robust and may fail on complex existing files." else log "OK: 'jq' command found (recommended)." fi # Note: rsync check removed as cp -a -u is now the standard method }} # Define all commands potentially used in this script # Removed 'rsync' from the list. check_cmds mountpoint readlink mkdir mount echo sleep rm find ln systemctl mktemp cp mv chmod chown dirname basename jq grep cat cmp date sed ls # --- Mount Persistent Data Partition --- DATA_LABEL="SBNB_DATA" DATA_DEVICE_SYMLINK="/dev/disk/by-label/${{DATA_LABEL}}" DATA_MOUNT_POINT="{DATA_MOUNT}" MAX_WAIT_SECONDS=15 WAIT_INTERVAL=1 elapsed_time=0 log "Waiting up to ${{MAX_WAIT_SECONDS}}s for data device (Label: ${{DATA_LABEL}})..." while [ ! -e "${{DATA_DEVICE_SYMLINK}}" ]; do if [ ${{elapsed_time}} -ge ${{MAX_WAIT_SECONDS}} ]; then log "ERROR: Timeout waiting for device ${{DATA_DEVICE_SYMLINK}}. Persistent data cannot be mounted." exit 1 fi sleep ${{WAIT_INTERVAL}} elapsed_time=$((elapsed_time + WAIT_INTERVAL)) done DATA_DEVICE=$(readlink -f "${{DATA_DEVICE_SYMLINK}}") log "Data partition device resolved to ${{DATA_DEVICE}} after ${{elapsed_time}}s." # Ensure mount point directory exists mkdir -p "${{DATA_MOUNT_POINT}}" log "Attempting to mount ${{DATA_DEVICE}} at ${{DATA_MOUNT_POINT}}..." if ! mountpoint -q "${{DATA_MOUNT_POINT}}"; then # Attempt to mount read-write, noatime, nodiratime if mount -o rw,noatime,nodiratime "${{DATA_DEVICE}}" "${{DATA_MOUNT_POINT}}"; then log "Successfully mounted persistent partition at ${{DATA_MOUNT_POINT}}." else log "ERROR: Failed to mount ${{DATA_DEVICE}} at ${{DATA_MOUNT_POINT}}! Check filesystem and device." exit 1 fi else log "Persistent partition already mounted at ${{DATA_MOUNT_POINT}}. Ensuring read-write..." 
# Ensure partition is mounted read-write mount -o remount,rw "${{DATA_MOUNT_POINT}}" || {{ log "ERROR: Failed to remount ${{DATA_MOUNT_POINT}} as read-write! Docker requires write access." exit 1 }} fi # --- Configure Docker to use Persistent Data Directory --- log "Setting up Docker to use persistent data-root..." PERSISTENT_DOCKER_ROOT="{PERSISTENT_DOCKER_ROOT}" DOCKER_CONFIG_DIR="{DOCKER_CONFIG_DIR}" DOCKER_CONFIG_FILE="{DOCKER_CONFIG_FILE}" DOCKER_CONFIG_BACKUP="{DOCKER_CONFIG_FILE}{DOCKER_CONFIG_BACKUP_SUFFIX}" DOCKER_DATA_EPHEMERAL="{DOCKER_DATA_EPHEMERAL}" # For migration check CONFIG_CHANGED=0 # Flag to track if we need to restart docker # 1. Ensure the persistent Docker data-root directory exists with correct owner/permissions log "Ensuring persistent Docker data directory exists: ${{PERSISTENT_DOCKER_ROOT}}" # Create with specific permissions (rwx--x--x) using correct octal format for command line mkdir -p -m {DOCKER_ROOT_PERMISSIONS:o} "${{PERSISTENT_DOCKER_ROOT}}" if [ ! -d "${{PERSISTENT_DOCKER_ROOT}}" ]; then log "ERROR: Failed to create persistent Docker data directory ${{PERSISTENT_DOCKER_ROOT}}!" exit 1 fi # Ensure ownership is root:root (critical for Docker) log "Ensuring ownership of ${{PERSISTENT_DOCKER_ROOT}} is root:root..." chown root:root "${{PERSISTENT_DOCKER_ROOT}}" || log "WARNING: Failed to set ownership on ${{PERSISTENT_DOCKER_ROOT}}. Docker might have issues." # Ensure permissions are correct (mkdir -p doesn't always set mode on existing dirs) using correct octal format for command line log "Ensuring permissions of ${{PERSISTENT_DOCKER_ROOT}} are {DOCKER_ROOT_PERMISSIONS:o}..." chmod {DOCKER_ROOT_PERMISSIONS:o} "${{PERSISTENT_DOCKER_ROOT}}" || log "WARNING: Failed to set permissions on ${{PERSISTENT_DOCKER_ROOT}}." log "Persistent Docker data directory ensured." # 2. 
Create/Update Docker daemon configuration (/etc/docker/daemon.json) log "Configuring Docker daemon (${{DOCKER_CONFIG_FILE}}) to use data-root: ${{PERSISTENT_DOCKER_ROOT}}" mkdir -p "${{DOCKER_CONFIG_DIR}}" # Ensure config directory exists # Backup original config ONCE if it exists and backup doesn't if [ -f "${{DOCKER_CONFIG_FILE}}" ] && [ ! -f "${{DOCKER_CONFIG_BACKUP}}" ]; then log "Backing up original Docker config to ${{DOCKER_CONFIG_BACKUP}}..." cp -a "${{DOCKER_CONFIG_FILE}}" "${{DOCKER_CONFIG_BACKUP}}" || \\ log "WARNING: Failed to create backup of ${{DOCKER_CONFIG_FILE}}." fi # --- Safely update daemon.json --- NEEDS_UPDATE=0 # Use jq if available (preferred method) if command -v jq >/dev/null 2>&1; then log "Using jq to manage daemon.json." # Ensure file exists with at least {{}} for jq processing [ -f "${{DOCKER_CONFIG_FILE}}" ] || echo "{{}}" > "${{DOCKER_CONFIG_FILE}}" # Read current value safely, defaulting to empty string if null or missing current_data_root=$(jq -r '.["data-root"] // ""' "${{DOCKER_CONFIG_FILE}}") if [ "$current_data_root" != "${{PERSISTENT_DOCKER_ROOT}}" ]; then log "Data-root needs update (jq check). Preparing changes..." NEEDS_UPDATE=1 else log "Docker data-root already correctly set in daemon.json (jq check)." fi if [ $NEEDS_UPDATE -eq 1 ]; then TMP_JSON=$(mktemp "${{DOCKER_CONFIG_DIR}}/daemon.json.tmp.XXXXXX") log "Attempting to merge data-root setting using jq..." # Merge the new data-root value, preserving other keys if jq --arg path "${{PERSISTENT_DOCKER_ROOT}}" '. + {{"data-root": $path}}' "${{DOCKER_CONFIG_FILE}}" > "${{TMP_JSON}}"; then # Check if jq produced valid JSON if jq -e . "${{TMP_JSON}}" > /dev/null; then # Check if content actually changed before moving if ! 
cmp -s "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}"; then mv "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}" chmod {DOCKER_CONFIG_PERMISSIONS:o} "${{DOCKER_CONFIG_FILE}}" # Use correct format here too for consistency, though 644 doesn't need '0o' prefix log "Successfully updated daemon.json using jq." CONFIG_CHANGED=1 else log "daemon.json content unchanged after jq merge, removing temp file." rm -f "${{TMP_JSON}}" fi else log "ERROR: jq produced invalid JSON output. Config not updated." rm -f "${{TMP_JSON}}" # Clean up temp file fi else jq_exit_code=$? log "ERROR: jq command failed (exit code $jq_exit_code) while updating config. Config not updated." # Optionally capture and log jq stderr here if needed rm -f "${{TMP_JSON}}" # Clean up temp file fi fi # Fallback logic if jq is NOT available else log "WARNING: jq not found. Using less robust fallback for daemon.json." # Define the minimal target content TARGET_JSON_CONTENT=$(printf '{{%s\\n "data-root": "%s"%s\\n}}%s\\n' "" "${{PERSISTENT_DOCKER_ROOT}}" "" "") if [ ! -f "${{DOCKER_CONFIG_FILE}}" ]; then log "daemon.json does not exist. Creating new file with data-root." NEEDS_UPDATE=1 else # Check if data-root key exists at all if ! grep -q '"data-root"\\s*:' "${{DOCKER_CONFIG_FILE}}"; then log "Existing daemon.json lacks 'data-root' key." # Check if the file is simple (e.g., just {{}} or empty/whitespace) if ! grep -q '[a-zA-Z0-9]' "${{DOCKER_CONFIG_FILE}}" || grep -q '^\\s*{{\\s*}}\\s*$' "${{DOCKER_CONFIG_FILE}}"; then log "Existing file is simple, overwriting with data-root." NEEDS_UPDATE=1 else log "ERROR: Existing daemon.json is complex and lacks 'data-root'. Cannot safely update without jq. Install jq or manually edit." # Do not proceed with overwrite NEEDS_UPDATE=0 # Explicitly prevent update fi # Key exists, check if the value is correct (basic check) elif ! 
grep -q '"data-root"\\s*:\\s*"${{PERSISTENT_DOCKER_ROOT}}"' "${{DOCKER_CONFIG_FILE}}"; then log "ERROR: Existing daemon.json has 'data-root' but points elsewhere. Cannot safely update without jq. Install jq or manually edit." # Do not proceed with overwrite NEEDS_UPDATE=0 # Explicitly prevent update else log "daemon.json exists and data-root seems correct (grep check)." NEEDS_UPDATE=0 fi fi # Perform write only if deemed safe and necessary by the logic above if [ $NEEDS_UPDATE -eq 1 ]; then log "Writing daemon.json (simple method)..." TMP_JSON=$(mktemp "${{DOCKER_CONFIG_DIR}}/daemon.json.tmp.XXXXXX") echo "$TARGET_JSON_CONTENT" > "${{TMP_JSON}}" if [ $? -eq 0 ]; then mv "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}" chmod {DOCKER_CONFIG_PERMISSIONS:o} "${{DOCKER_CONFIG_FILE}}" # Use correct format here too log "Successfully wrote simple daemon.json." CONFIG_CHANGED=1 else log "ERROR: Failed to write temporary simple daemon.json! Config not updated." rm -f "${{TMP_JSON}}" fi fi fi log "Docker daemon configuration check finished." # 3. Data Migration (Optional): Migrate data from ephemeral location if needed log "Checking for existing Docker data in ephemeral location (${{DOCKER_DATA_EPHEMERAL}})..." # Check if the directory exists and contains anything other than 'lost+found' or potential marker files if [ -d "${{DOCKER_DATA_EPHEMERAL}}" ] && [ -n "$(ls -A "${{DOCKER_DATA_EPHEMERAL}}" | grep -v -e '^lost+found$' -e '^\\.sbnb_persistent_redirect$' -e '^README_DO_NOT_USE\\.txt$' 2>/dev/null)" ]; then log "Found potentially significant data in ${{DOCKER_DATA_EPHEMERAL}}." # Check if persistent location is effectively empty (allowing only lost+found) persistent_is_empty=0 if [ ! "$(ls -A "${{PERSISTENT_DOCKER_ROOT}}" | grep -v '^lost+found$' 2>/dev/null)" ]; then persistent_is_empty=1 fi if [ $persistent_is_empty -eq 1 ]; then log "Persistent location ${{PERSISTENT_DOCKER_ROOT}} is empty. Migrating data..." 
        # Ensure Docker is stopped before migration
        if systemctl is-active --quiet docker; then
            log "Stopping Docker service for migration..."
            systemctl stop docker || log "WARNING: Failed to stop Docker. Migration proceeding, but data might be inconsistent!"
            sleep 3  # Give it time to release files
        fi
        log "Starting migration using cp -a -u..."
        MIGRATION_SUCCESS=0
        # Use cp -a -u: archive mode (preserve attrs), update mode (copy only if newer/missing).
        # Source ends with /. to copy contents including hidden files.
        # This is the recommended busybox alternative to rsync for local mirroring.
        if cp -a -u "${{DOCKER_DATA_EPHEMERAL}}/." "${{PERSISTENT_DOCKER_ROOT}}/"; then
            MIGRATION_SUCCESS=1
        else
            log "ERROR: cp -a -u migration failed with exit code $? !"
        fi
        # Handle migration outcome
        if [ $MIGRATION_SUCCESS -eq 1 ]; then
            log "Migration completed successfully."
            # Rename old data directory as backup
            OLD_DATA_BACKUP="${{DOCKER_DATA_EPHEMERAL}}.migrated.$(date +%Y%m%d_%H%M%S).bak"
            log "Attempting to rename old data directory to ${{OLD_DATA_BACKUP}}..."
            # Use mv -T to handle if ephemeral is somehow a symlink
            if mv -T "${{DOCKER_DATA_EPHEMERAL}}" "${{OLD_DATA_BACKUP}}"; then
                log "Successfully renamed old data directory."
            else
                log "WARNING: Could not rename old data directory ${{DOCKER_DATA_EPHEMERAL}}. It may still contain data."
                # Consider rm -rf here ONLY if migration verification was very thorough, otherwise leave it.
            fi
            # Mark that Docker needs restart due to migration
            CONFIG_CHANGED=1
        else
            log "ERROR: Data migration failed! Docker data may be incomplete or inconsistent in ${{PERSISTENT_DOCKER_ROOT}}."
            # Exiting is likely the safest option here to force manual review.
            exit 1
        fi
    else
        log "Persistent location ${{PERSISTENT_DOCKER_ROOT}} already contains data. Skipping migration."
        # Optionally rename the ephemeral data if it still exists and is unwanted
        OLD_DATA_BACKUP="${{DOCKER_DATA_EPHEMERAL}}.ignored.$(date +%Y%m%d_%H%M%S).bak"
        log "Attempting to rename unused ephemeral data directory to ${{OLD_DATA_BACKUP}}..."
        mv -T "${{DOCKER_DATA_EPHEMERAL}}" "${{OLD_DATA_BACKUP}}" || \\
            log "WARNING: Could not rename ephemeral data directory ${{DOCKER_DATA_EPHEMERAL}}."
    fi
else
    log "No significant data found in ephemeral location ${{DOCKER_DATA_EPHEMERAL}}. No migration needed."
fi

# Ensure the original ephemeral directory path exists but is empty, with a marker
log "Ensuring ephemeral path ${{DOCKER_DATA_EPHEMERAL}} exists and is marked as unused."
# Remove original path if it still exists (e.g., if rename failed but we continued)
if [ -d "${{DOCKER_DATA_EPHEMERAL}}" ]; then
    rm -rf "${{DOCKER_DATA_EPHEMERAL}}" || log "WARNING: Failed to remove original ephemeral directory after processing."
fi
mkdir -p "${{DOCKER_DATA_EPHEMERAL}}"
touch "${{DOCKER_DATA_EPHEMERAL}}/.sbnb_persistent_redirect"
echo "Docker data is managed at ${{PERSISTENT_DOCKER_ROOT}}. This directory should remain empty." > "${{DOCKER_DATA_EPHEMERAL}}/README_DO_NOT_USE.txt"
chmod 644 "${{DOCKER_DATA_EPHEMERAL}}/README_DO_NOT_USE.txt"
chmod 600 "${{DOCKER_DATA_EPHEMERAL}}/.sbnb_persistent_redirect"
log "Data migration check finished."

# 4. Restart Docker Service *if* configuration was changed OR migration occurred
if [ $CONFIG_CHANGED -eq 1 ]; then
    log "Configuration or data migration requires Docker restart. Reloading daemon and restarting service..."
    if ! systemctl daemon-reload; then
        log "ERROR: Failed to reload systemd daemon! Docker restart might fail or use old config."
        exit 1  # Critical failure if daemon cannot reload
    fi
    log "Attempting to restart docker.service..."
    if systemctl restart docker.service; then
        log "Docker service restarted successfully."
    else
        log "ERROR: Failed to restart Docker service! Check 'journalctl -u docker.service'."
        exit 1  # Critical failure if Docker doesn't restart after config change/migration
    fi
else
    log "No configuration changes or migration. Docker restart not required by this script."
    # Optional: Ensure Docker is running even if no changes occurred
    # log "Ensuring Docker service is active..."
    # if ! systemctl is-active --quiet docker.service; then
    #     log "Docker service is not active. Attempting to start..."
    #     systemctl start docker.service || log "WARNING: Failed to start inactive Docker service."
    # fi
fi
log "Docker setup finished."

# --- Update Optional Development Environment Script ---
# (Using the robust atomic update logic)
TARGET_DEV_ENV_SCRIPT="/usr/sbin/sbnb-dev-env.sh"
SOURCE_DEV_ENV_SCRIPT="${{DATA_MOUNT_POINT}}/scripts/sbnb-dev-env.sh"  # Assuming it's stored persistently
log "Checking for optional development script update: ${{SOURCE_DEV_ENV_SCRIPT}}"
if [ -f "${{SOURCE_DEV_ENV_SCRIPT}}" ] && [ -r "${{SOURCE_DEV_ENV_SCRIPT}}" ]; then
    log "Source script found. Attempting atomic update of ${{TARGET_DEV_ENV_SCRIPT}}..."
    TARGET_DIR=$(dirname "${{TARGET_DEV_ENV_SCRIPT}}")
    TMP_SCRIPT=""
    # Setup trap for cleanup
    trap 'sbnb_dev_cleanup' EXIT HUP INT QUIT TERM
    sbnb_dev_cleanup() {{
        if [ -n "${{TMP_SCRIPT:-}}" ] && [ -f "${{TMP_SCRIPT}}" ]; then
            rm -f "${{TMP_SCRIPT}}"
            log "Cleaned up temporary file ${{TMP_SCRIPT}}"
        fi
        trap - EXIT HUP INT QUIT TERM  # Reset trap
    }}
    if [ ! -d "${{TARGET_DIR}}" ] || [ ! -w "${{TARGET_DIR}}" ]; then
        log "WARNING: Target directory ${{TARGET_DIR}} does not exist or is not writable. Cannot update script."
    # Check required commands exist (already done by check_cmds, but good practice here too)
    elif ! command -v mktemp >/dev/null 2>&1 || ! command -v cp >/dev/null 2>&1 || ! command -v chmod >/dev/null 2>&1 || ! command -v mv >/dev/null 2>&1; then
        log "WARNING: Required command (mktemp/cp/chmod/mv) not found. Skipping update."
    else
        TMP_SCRIPT=$(mktemp "${{TARGET_DIR}}/sbnb-dev-env.sh.XXXXXX")
        if [ -z "${{TMP_SCRIPT}}" ] || [ ! -f "${{TMP_SCRIPT}}" ]; then
            log "WARNING: Failed to create temporary file in ${{TARGET_DIR}}. Skipping update."
            TMP_SCRIPT=""  # Prevent trap from trying to remove nothing
        else
            # Proceed with copy, chmod, move
            if cp "${{SOURCE_DEV_ENV_SCRIPT}}" "${{TMP_SCRIPT}}"; then
                if chmod +x "${{TMP_SCRIPT}}"; then
                    # Use mv -T to handle target being a symlink correctly
                    if mv -T "${{TMP_SCRIPT}}" "${{TARGET_DEV_ENV_SCRIPT}}"; then
                        log "Successfully updated ${{TARGET_DEV_ENV_SCRIPT}}."
                        TMP_SCRIPT=""  # Clear var so trap doesn't remove the final script
                    else
                        log "WARNING: Failed to move temporary file ${{TMP_SCRIPT}} to ${{TARGET_DEV_ENV_SCRIPT}}. Update failed."
                    fi
                else
                    log "WARNING: Failed to set execute permissions on temporary file ${{TMP_SCRIPT}}. Update failed."
                fi
            else
                log "WARNING: Failed to copy content from ${{SOURCE_DEV_ENV_SCRIPT}} to ${{TMP_SCRIPT}}. Update failed."
            fi
        fi
        # Clean up temp file if it still exists (e.g., on mv failure) and TMP_SCRIPT is set
        if [ -n "${{TMP_SCRIPT:-}}" ] && [ -f "${{TMP_SCRIPT}}" ]; then
            rm -f "${{TMP_SCRIPT}}"
        fi
        TMP_SCRIPT=""  # Ensure trap doesn't run again for this
    fi
    trap - EXIT HUP INT QUIT TERM  # Clear trap explicitly
else
    log "NOTE: Source script ${{SOURCE_DEV_ENV_SCRIPT}} not found or not readable. Skipping update."
fi
log "Update of optional script finished."

# --- Enable Systemd Units (Backup/Purge + Health/Volume Checks) ---
SYSTEMD_SOURCE_DIR="${{DATA_MOUNT_POINT}}/systemd"
SYSTEMD_TARGET_DIR="/etc/systemd/system"
TIMERS_WANTS_DIR="${{SYSTEMD_TARGET_DIR}}/timers.target.wants"
log "Enabling custom systemd units (Source: ${{SYSTEMD_SOURCE_DIR}})..."
if [ -d "${{SYSTEMD_SOURCE_DIR}}" ] && [ -r "${{SYSTEMD_SOURCE_DIR}}" ]; then
    mkdir -p "${{SYSTEMD_TARGET_DIR}}"
    mkdir -p "${{TIMERS_WANTS_DIR}}"
    # Check ln and systemctl exist (already done in check_cmds)
    linked_any=0
    log "Linking systemd unit files..."
    # Iterate with globs instead of piping find into a while loop: a piped loop runs
    # in a subshell, so linked_any would still be 0 afterwards and no unit would ever
    # be enabled. Quoted expansion keeps unusual filenames intact.
    for source_unit in "${{SYSTEMD_SOURCE_DIR}}"/*.service "${{SYSTEMD_SOURCE_DIR}}"/*.timer; do
        # An unmatched glob stays a literal pattern, so skip anything that is not a regular file
        [ -f "${{source_unit}}" ] || continue
        unit_name=$(basename "${{source_unit}}")
        target_link="${{SYSTEMD_TARGET_DIR}}/${{unit_name}}"
        log " Linking ${{unit_name}}..."
        # Use ln -sf: symbolic, force overwrite if link exists
        if ln -sf "${{source_unit}}" "${{target_link}}"; then
            linked_any=1
        else
            log " WARNING: Failed to link ${{unit_name}}."
        fi
    done
    if [ $linked_any -eq 0 ]; then
        log "No unit files found in ${{SYSTEMD_SOURCE_DIR}} to link."
    else
        log "Reloading systemd daemon after linking units..."
        # Reload daemon again (might be redundant if Docker restart already did it, but safe)
        systemctl daemon-reload || log "WARNING: systemctl daemon-reload failed after linking units."
        log "Enabling systemd timers/services..."
        enabled_any=0
        # Define ALL units expected to be enabled by this script
        UNITS_TO_ENABLE="docker-backup.timer docker-purge.timer docker-shutdown-backup.service docker-health-check.timer docker-volume-check.timer"
        final_enabled_list=""
        # Use 'for unit in $UNITS_TO_ENABLE' which relies on word splitting
        # shellcheck disable=SC2086
        for unit in $UNITS_TO_ENABLE; do
            # Check if the link exists and points to a file before enabling
            if [ -L "${{SYSTEMD_TARGET_DIR}}/${{unit}}" ] && [ -f "${{SYSTEMD_TARGET_DIR}}/${{unit}}" ]; then
                log " Enabling ${{unit}}..."
                # Use --now to also start timers immediately if desired, otherwise just enable
                if systemctl enable "${{unit}}"; then
                    enabled_any=1
                    final_enabled_list="${{final_enabled_list}} ${{unit}}"
                else
                    log " WARNING: Failed to enable ${{unit}}."
                fi
            else
                log " Skipping enable for ${{unit}} (link missing or broken)."
            fi
        done
        if [ $enabled_any -eq 1 ]; then
            final_enabled_list=$(echo "${{final_enabled_list}}" | sed 's/^ *//')  # Remove leading space
            log "Systemd units enabled successfully: ${{final_enabled_list}}"
        else
            log "No relevant systemd units were successfully enabled."
        fi
    fi  # end if linked_any
else
    log "WARNING: Systemd source directory ${{SYSTEMD_SOURCE_DIR}} not found or not readable. Cannot enable units."
fi
log "Systemd unit setup finished."

# --- Script Finish Logging ---
log "Finished custom boot commands successfully."
# Clear trap explicitly
trap - EXIT HUP INT QUIT TERM
exit 0
"""

# --- Tailscale Key ---
# !!! REPLACE THIS WITH YOUR ACTUAL KEY !!!
SBNB_TSKEY_TXT_CONTENT = "tskey-auth-..."  # Placeholder

# --- Backup Script ---
BACKUP_DOCKER_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/backup-docker.sh
# Backs up the persistent Docker data-root directory.
set -e -u

# --- Configuration ---
DOCKER_DATA_DIR="{PERSISTENT_DOCKER_ROOT}"  # Source is PERSISTENT root
BACKUP_DIR="{BACKUP_BASE_DIR}"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
BACKUP_FILE="${{BACKUP_DIR}}/docker_backup_${{TIMESTAMP}}.tar.gz"
LATEST_LINK="${{BACKUP_DIR}}/docker_latest.tar.gz"
STOP_DOCKER={STOP_DOCKER_FOR_BACKUP}  # 1=Stop Docker (safer), 0=Live backup

log() {{ echo "[backup-docker.sh] $1" > /dev/kmsg; }}

# --- Check Commands ---
log "Checking required commands..."
check_cmds() {{
    local missing_cmd=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            log "ERROR: Command '$cmd' not found."
            missing_cmd=1
        fi
    done
    # Exit if any command is missing. This is an if-block on purpose: with
    # '[ ... ] && exit 1' as the last command, the function returns non-zero when
    # nothing is missing, and 'set -e' then aborts the script at the call site.
    if [ "$missing_cmd" -eq 1 ]; then
        exit 1
    fi
}}
# Core commands needed
check_cmds date mkdir tar gzip ln mv sleep dirname basename
# Check systemctl only if stopping docker is enabled
if [ "$STOP_DOCKER" -eq 1 ]; then
    check_cmds systemctl
fi
# Check for optional 'nice' command
NICE_CMD=""
if command -v nice >/dev/null 2>&1; then
    NICE_CMD="nice -n 19"
    log "Using nice for lower tar priority."
fi

# --- Main Logic ---
log "Starting Docker backup process..."
log "Source: ${{DOCKER_DATA_DIR}}"
log "Destination: ${{BACKUP_FILE}}"
# Ensure backup directory exists and is writable
log "Ensuring backup directory exists: ${{BACKUP_DIR}}"
mkdir -p "${{BACKUP_DIR}}"
# Check write permissions specifically
if [ ! -w "${{BACKUP_DIR}}" ]; then
    log "ERROR: Backup directory not writable: ${{BACKUP_DIR}}"
    exit 1
fi

# Stop Docker if configured
DOCKER_WAS_RUNNING=0
if [ $STOP_DOCKER -eq 1 ]; then
    log "Attempting to stop Docker service..."
    if systemctl is-active --quiet docker.service; then
        DOCKER_WAS_RUNNING=1
        log "Docker service is active, stopping..."
        if systemctl stop docker.service; then
            log "Docker service stopped. Waiting 5s for files to release..."
            sleep 5
        else
            # A failed stop compromises backup consistency, so abort rather than proceed
            log "ERROR: Failed to stop Docker service gracefully! Backup might be inconsistent or fail. Aborting."
            exit 1
        fi
    else
        log "Docker service already stopped."
    fi
fi

# Create backup
log "Creating backup archive..."
if [ -d "${{DOCKER_DATA_DIR}}" ] && [ -r "${{DOCKER_DATA_DIR}}" ]; then
    PARENT_DIR=$(dirname "${{DOCKER_DATA_DIR}}")
    SOURCE_BASENAME=$(basename "${{DOCKER_DATA_DIR}}")
    log "Archiving '${{SOURCE_BASENAME}}' from parent '${{PARENT_DIR}}'..."
    # Use -C to change directory, archive relative path 'docker-root/...'
    # Add --warning=no-file-changed to suppress warnings about files changing during read
    # shellcheck disable=SC2086 # Allow word splitting for $NICE_CMD
    if ${{NICE_CMD}} tar --warning=no-file-changed -czf "${{BACKUP_FILE}}" -C "${{PARENT_DIR}}" "${{SOURCE_BASENAME}}"; then
        log "Backup archive created successfully."
        # Verify backup file exists and is not empty
        if [ -s "${{BACKUP_FILE}}" ]; then
            log "Updating latest backup link..."
            # Atomic symlink update: create temp link, then rename over old one.
            # Both commands are tested inside the 'if': under 'set -e' a failing mv in a
            # bare 'ln && mv' list would abort the script before '$?' could be checked.
            if ln -sfT "${{BACKUP_FILE}}" "${{LATEST_LINK}}.tmp" && mv -Tf "${{LATEST_LINK}}.tmp" "${{LATEST_LINK}}"; then
                log "Updated latest link to point to ${{BACKUP_FILE}}."
            else
                log "WARNING: Failed to update latest backup link."
                rm -f "${{LATEST_LINK}}.tmp"  # Clean up temp link if mv failed
            fi
        else
            log "WARNING: Backup file seems invalid (empty/missing): ${{BACKUP_FILE}}. Removing."
            rm -f "${{BACKUP_FILE}}"
        fi
    else
        tar_exit_code=$?
        log "ERROR: tar command failed with exit code ${{tar_exit_code}}! Backup failed."
        rm -f "${{BACKUP_FILE}}"  # Clean up partial archive if tar failed
    fi
else
    log "WARNING: Docker data directory not found or not readable: ${{DOCKER_DATA_DIR}}. Skipping backup."
fi

# Restart Docker if it was running and we stopped it successfully
if [ $DOCKER_WAS_RUNNING -eq 1 ]; then
    log "Restarting Docker service..."
    if ! systemctl start docker.service; then
        log "WARNING: Failed to restart Docker service after backup."
    else
        log "Docker service restarted."
    fi
fi
log "Docker backup process finished."
exit 0
"""

# --- Purge Script ---
PURGE_DOCKER_BACKUPS_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/purge-docker-backups.sh
# Removes old Docker backups, keeping the last N.
set -e -u

BACKUP_DIR="{BACKUP_BASE_DIR}"
KEEP_COUNT={BACKUP_KEEP_COUNT}

log() {{ echo "[purge-docker-backups.sh] $1" > /dev/kmsg; }}

# Check commands
check_cmds() {{
    local missing_cmd=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            log "ERROR: Command '$cmd' not found."
            missing_cmd=1
        fi
    done
    # An if-block rather than '[ ... ] && exit 1', so the function returns 0 when
    # nothing is missing and 'set -e' does not abort the caller
    if [ "$missing_cmd" -eq 1 ]; then
        exit 1
    fi
}}
check_cmds find wc sort head cut xargs rm mkdir date

log "Purging old Docker backups in ${{BACKUP_DIR}}, keeping ${{KEEP_COUNT}}..."
# Validate KEEP_COUNT
if ! [ "$KEEP_COUNT" -ge 0 ] 2>/dev/null; then
    log "ERROR: KEEP_COUNT (${{KEEP_COUNT}}) is invalid."
    exit 1
fi
# Ensure backup directory exists and is accessible
if ! mkdir -p "${{BACKUP_DIR}}"; then
    log "ERROR: Failed to create backup directory ${{BACKUP_DIR}}!"
    exit 1
fi
if [ ! -d "${{BACKUP_DIR}}" ] || [ ! -r "${{BACKUP_DIR}}" ] || [ ! -w "${{BACKUP_DIR}}" ]; then
    log "ERROR: Cannot access backup directory ${{BACKUP_DIR}}!"
    exit 1
fi

# Count existing backups. Without pipefail the pipeline status is that of wc, so
# this guards against the count failing as a whole rather than find specifically.
log "Counting existing backup files..."
if ! backup_count=$(find "${{BACKUP_DIR}}" -maxdepth 1 -name 'docker_backup_*.tar.gz' -type f -print 2>/dev/null | wc -l); then
    log "WARNING: Counting backups failed. Skipping purge."
    exit 0
fi
log "Found ${{backup_count}} backup files."

if [ "$backup_count" -gt "$KEEP_COUNT" ]; then
    to_delete_count=$(( backup_count - KEEP_COUNT ))
    log "Need to delete ${{to_delete_count}} oldest backup(s)."
    # Use find -printf with null terminators for safe filename handling
    log "Identifying oldest backups to delete..."
    # '|| rm_exit_code=$?' keeps 'set -e' from aborting before the status is recorded
    rm_exit_code=0
    delete_output=$(find "${{BACKUP_DIR}}" -maxdepth 1 -name 'docker_backup_*.tar.gz' -type f -printf '%T@ %p\\0' 2>/dev/null | \\
        sort -zn | \\
        head -zn "${{to_delete_count}}" | \\
        cut -z -d' ' -f2- | \\
        xargs -0 -r rm -v -- 2>&1) || rm_exit_code=$?
    if [ $rm_exit_code -eq 0 ]; then
        log "Purge completed successfully."
        if [ -n "$delete_output" ]; then
            log "Deleted files:"
            # Log multi-line output safely
            echo "$delete_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
        fi
    else
        log "WARNING: Purge command (rm) failed (exit code ${{rm_exit_code}}). Check output below."
        log "rm output:"
        echo "$delete_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
    fi
else
    log "${{backup_count}} backups found <= ${{KEEP_COUNT}}. No backups purged."
fi
log "Backup purge process finished."
exit 0
"""

# --- Health Check Script ---
DOCKER_HEALTH_CHECK_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/docker-health-check.sh
# Checks Docker daemon health, responsiveness, and data-root configuration.
set -e -u

PERSISTENT_ROOT="{PERSISTENT_DOCKER_ROOT}"
DOCKER_CONFIG_FILE="{DOCKER_CONFIG_FILE}"

log() {{ echo "[docker-health-check] $1" | tee /dev/kmsg; }}  # Log to kmsg and stdout/stderr

log "Starting Docker health check..."
# Check required commands
check_cmds() {{
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            log "ERROR: Command '$cmd' not found."
            exit 1
        fi
    done
}}
check_cmds systemctl docker

# Check if Docker daemon service is running
log "Checking if docker.service is active..."
if ! systemctl is-active --quiet docker.service; then
    log "WARNING: Docker service is not running. Attempting restart..."
    if systemctl restart docker.service; then
        log "Docker service restarted successfully."
        sleep 5  # Give it time to fully start
    else
        log "ERROR: Failed to restart inactive Docker service!"
        exit 1  # Critical failure if it should be running but can't be started
    fi
fi

# Verify Docker daemon is responding to commands
log "Checking Docker daemon responsiveness via 'docker info'..."
if ! docker info > /dev/null 2>&1; then
    log "WARNING: Docker service is running but 'docker info' command failed. Attempting restart..."
    if systemctl restart docker.service; then
        log "Docker service restarted successfully."
        sleep 5  # Give it time
        # Re-check responsiveness after restart
        if ! docker info > /dev/null 2>&1; then
            log "ERROR: Docker daemon still not responding after restart! Requires manual investigation."
            exit 1  # Critical failure
        else
            log "Docker daemon is now responsive after restart."
        fi
    else
        log "ERROR: Failed to restart unresponsive Docker service!"
        exit 1  # Critical failure
    fi
else
    log "Docker daemon is responsive."
fi

# Check if Docker is using the correct data-root directory
log "Checking configured Docker data-root directory..."
# Use docker info with Go template for precise extraction
CURRENT_ROOT=$(docker info --format '{{{{.DockerRootDir}}}}' 2>/dev/null || echo "ERROR_GETTING_INFO")
if [ "$CURRENT_ROOT" = "ERROR_GETTING_INFO" ]; then
    log "ERROR: Could not determine Docker's current data-root using 'docker info'. Health check incomplete."
    exit 1  # Exit as this is a significant issue
elif [ "$CURRENT_ROOT" != "$PERSISTENT_ROOT" ]; then
    log "CRITICAL ERROR: Docker is using incorrect data-root!"
    log " Expected: $PERSISTENT_ROOT"
    log " Actual: $CURRENT_ROOT"
    log "This indicates a configuration problem in $DOCKER_CONFIG_FILE or Docker failed to apply it. Manual intervention required."
    exit 1  # Critical configuration error
else
    log "Docker is correctly using the persistent data-root: $PERSISTENT_ROOT"
fi
log "Docker health check completed successfully."
exit 0
"""

# --- Volume Check Script ---
# Define prune command based on configuration
if VOLUME_CHECK_PRUNE_LEVEL == 0:
    PRUNE_COMMAND = "echo 'Automatic pruning disabled.'"  # No-op
elif VOLUME_CHECK_PRUNE_LEVEL == 1:
    # Prune stopped containers and dangling images only
    PRUNE_COMMAND = "docker container prune -f && docker image prune -f"
elif VOLUME_CHECK_PRUNE_LEVEL >= 2:
    # Prune stopped containers and *all* unused images (more aggressive)
    PRUNE_COMMAND = "docker container prune -f && docker image prune -a -f"
else:
    # Default to level 1 if invalid config
    PRUNE_COMMAND = "docker container prune -f && docker image prune -f"

DOCKER_VOLUME_CHECK_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/docker-volume-check.sh
# Checks free space on the Docker persistent volume and optionally prunes resources.
set -e -u

DOCKER_ROOT="{PERSISTENT_DOCKER_ROOT}"
MIN_FREE_PERCENT={VOLUME_CHECK_THRESHOLD_PERCENT}
# Prune command determined by Python script configuration (Level: {VOLUME_CHECK_PRUNE_LEVEL})
PRUNE_CMD="{PRUNE_COMMAND}"

log() {{ echo "[docker-volume-check] $1" | tee /dev/kmsg; }}

log "Checking Docker volume free space: ${{DOCKER_ROOT}}"
# Check required commands
check_cmds() {{
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            log "ERROR: Command '$cmd' not found."
            exit 1
        fi
    done
}}
check_cmds df awk sed docker  # Need docker if pruning is enabled

# Check if the Docker root directory exists
if [ ! -d "$DOCKER_ROOT" ]; then
    log "ERROR: Docker root directory not found: $DOCKER_ROOT"
    exit 1
fi

# Get free space percentage using df -P for POSIX compatibility
log "Calculating free space..."
# Get Available and Total blocks (in 1K blocks usually)
df_output=$(df -P "$DOCKER_ROOT" | awk 'NR==2 {{print $4, $2}}' 2>/dev/null)
if [ -z "$df_output" ]; then
    log "ERROR: Failed to get disk usage using df for $DOCKER_ROOT"
    exit 1
fi
avail_kb=$(echo "$df_output" | awk '{{print $1}}')
total_kb=$(echo "$df_output" | awk '{{print $2}}')
# Handle edge case where total size is 0 or df failed weirdly
if [ -z "$total_kb" ] || [ "$total_kb" -le 0 ]; then
    log "WARNING: Total disk size reported as zero or invalid for $DOCKER_ROOT. Cannot calculate percentage."
    exit 0
fi
# Calculate free percentage using integer arithmetic
free_percent=$(( (avail_kb * 100) / total_kb ))
# Get human-readable sizes for logging
total_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $2}}')
avail_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $4}}')
log "Volume Stats: Total=${{total_size_hr}}, Available=${{avail_size_hr}}, Free=${{free_percent}}%"

# Check against threshold
if [ "$free_percent" -lt "$MIN_FREE_PERCENT" ]; then
    log "WARNING: Low disk space! Free: ${{free_percent}}% (Threshold: ${{MIN_FREE_PERCENT}}%)"
    # Attempt to prune based on configured level
    if [ {VOLUME_CHECK_PRUNE_LEVEL} -gt 0 ]; then
        log "Attempting automatic prune (Level: {VOLUME_CHECK_PRUNE_LEVEL})..."
        # The braces group the whole '&&' chain so that 2>&1 captures stderr from
        # every prune command, not only the last one
        prune_exit_code=0
        prune_output=$({{ {PRUNE_COMMAND}; }} 2>&1) || prune_exit_code=$?
        # Check exit code, prune can return non-zero even if it works partially
        if [ "$prune_exit_code" -eq 0 ]; then
            log "Docker prune command executed successfully."
        else
            log "WARNING: Docker prune command finished with exit code ${{prune_exit_code}}."
        fi
        log "Prune output:"
        echo "$prune_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
        # Recalculate free space after pruning
        log "Recalculating space after cleanup..."
        df_output=$(df -P "$DOCKER_ROOT" | awk 'NR==2 {{print $4, $2}}' 2>/dev/null)
        avail_kb=$(echo "$df_output" | awk '{{print $1}}')
        total_kb=$(echo "$df_output" | awk '{{print $2}}')
        if [ "$total_kb" -gt 0 ]; then
            free_percent=$(( (avail_kb * 100) / total_kb ))
        else
            free_percent=0
        fi
        avail_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $4}}')
        log "Space after cleanup: Available=${{avail_size_hr}}, Free=${{free_percent}}%"
        if [ "$free_percent" -lt "$MIN_FREE_PERCENT" ]; then
            log "ERROR: Space still critically low after cleanup! Manual intervention likely required."
        else
            log "Space is now above threshold after cleanup."
        fi
    else
        log "Automatic pruning is disabled (Level 0). Manual cleanup needed."
    fi
else
    log "Sufficient free space available (${{free_percent}}%)."
fi
log "Docker volume check completed."
exit 0
"""

# --- Systemd Units ---
# NOTE: systemd unit files do not support trailing comments on a directive line
# (the text would be parsed as part of the value), so each comment below sits on
# its own line above the directive it describes.

# Backup Service
DOCKER_BACKUP_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-backup.service
[Unit]
Description=Backup Docker Data ({PERSISTENT_DOCKER_ROOT})
Documentation=file://{DATA_MOUNT}/scripts/backup-docker.sh
Requires=mnt-sbnb-data.mount
# Ensure mount and docker are up
After=mnt-sbnb-data.mount docker.service

[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/backup-docker.sh
"""

# Backup Timer
DOCKER_BACKUP_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-backup.timer
[Unit]
Description=Daily Docker Backup Timer ({PERSISTENT_DOCKER_ROOT})
Requires=docker-backup.service

[Timer]
OnCalendar=*-*-* 05:00:00
AccuracySec=1h
Persistent=true
# 10 minutes
RandomizedDelaySec=600
Unit=docker-backup.service

[Install]
WantedBy=timers.target
"""

# Purge Service
DOCKER_PURGE_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-purge.service
[Unit]
Description=Purge Old Docker Backups ({BACKUP_BASE_DIR})
Documentation=file://{DATA_MOUNT}/scripts/purge-docker-backups.sh
Requires=mnt-sbnb-data.mount
After=mnt-sbnb-data.mount

[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/purge-docker-backups.sh
"""

# Purge Timer
DOCKER_PURGE_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-purge.timer
[Unit]
Description=Daily Docker Backup Purge Timer
Requires=docker-purge.service

[Timer]
OnCalendar=*-*-* 06:00:00
AccuracySec=1h
Persistent=true
# 5 minutes
RandomizedDelaySec=300
Unit=docker-purge.service

[Install]
WantedBy=timers.target
"""

# Shutdown Backup Service
DOCKER_SHUTDOWN_BACKUP_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-shutdown-backup.service
[Unit]
Description=Backup Docker Data ({PERSISTENT_DOCKER_ROOT}) on Shutdown (Best Effort)
Documentation=file://{DATA_MOUNT}/scripts/backup-docker.sh
# Crucial for shutdown units
DefaultDependencies=no
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service network.target
Before=shutdown.target reboot.target halt.target kexec.target umount.target final.target

[Service]
Type=oneshot
# Important for ExecStop= during shutdown
RemainAfterExit=true
# Give backup reasonable time (3 minutes)
TimeoutStopSec=180
# Run backup on stop
ExecStop=/bin/sh {DATA_MOUNT}/scripts/backup-docker.sh

[Install]
WantedBy=shutdown.target reboot.target halt.target kexec.target
"""

# Health Check Service
DOCKER_HEALTH_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-health-check.service
[Unit]
Description=Docker Health Check Service
Documentation=file://{DATA_MOUNT}/scripts/docker-health-check.sh
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service

[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/docker-health-check.sh
# Optional resource limits
# CPUQuota=10%
# MemoryMax=128M
"""

# Health Check Timer
DOCKER_HEALTH_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-health-check.timer
[Unit]
Description=Regular Docker Health Check Timer
Requires=docker-health-check.service

[Timer]
# Run 5 mins after boot, then every 15 mins
OnBootSec=5min
OnUnitActiveSec=15min
AccuracySec=1min
Unit=docker-health-check.service

[Install]
WantedBy=timers.target
"""

# Volume Check Service
DOCKER_VOLUME_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-volume-check.service
[Unit]
Description=Docker Volume Space Check Service ({PERSISTENT_DOCKER_ROOT})
Documentation=file://{DATA_MOUNT}/scripts/docker-volume-check.sh
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service

[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/docker-volume-check.sh
# Optional resource limits
# CPUQuota=10%
# MemoryMax=64M
"""

# Volume Check Timer
DOCKER_VOLUME_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-volume-check.timer
[Unit]
Description=Regular Docker Volume Check Timer
Requires=docker-volume-check.service

[Timer]
# Run 10 mins after boot, then every hour
OnBootSec=10min
OnUnitActiveSec=1h
AccuracySec=5min
Unit=docker-volume-check.service

[Install]
WantedBy=timers.target
"""

# --- Dictionary of Files to Create ---
# Defines all files to be generated by this script
FILES_TO_CREATE = {
    # --- ESP Files ---
    # rwxr-xr-x
    f"{ESP_MOUNT}/sbnb-cmds.sh": {"content": SBNB_CMDS_SH_CONTENT, "permissions": 0o755},
    # rw------- (Restrict access to key)
    f"{ESP_MOUNT}/sbnb-tskey.txt": {"content": SBNB_TSKEY_TXT_CONTENT, "permissions": 0o600},
    # --- Data Partition Files ---
    # Helper Scripts: rwxr-x--- (Owner exec, group read/exec)
    f"{DATA_MOUNT}/scripts/backup-docker.sh": {"content": BACKUP_DOCKER_SH_CONTENT, "permissions": 0o750},
    f"{DATA_MOUNT}/scripts/purge-docker-backups.sh": {"content": PURGE_DOCKER_BACKUPS_SH_CONTENT, "permissions": 0o750},
    f"{DATA_MOUNT}/scripts/docker-health-check.sh": {"content": DOCKER_HEALTH_CHECK_SH_CONTENT, "permissions": 0o750},
    f"{DATA_MOUNT}/scripts/docker-volume-check.sh": {"content": DOCKER_VOLUME_CHECK_SH_CONTENT, "permissions": 0o750},
    # Systemd Units: rw-r--r-- (Standard systemd unit permissions)
    f"{DATA_MOUNT}/systemd/docker-backup.service": {"content": DOCKER_BACKUP_SERVICE_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-backup.timer": {"content": DOCKER_BACKUP_TIMER_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-purge.service": {"content": DOCKER_PURGE_SERVICE_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-purge.timer": {"content": DOCKER_PURGE_TIMER_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-shutdown-backup.service": {"content": DOCKER_SHUTDOWN_BACKUP_SERVICE_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-health-check.service": {"content": DOCKER_HEALTH_SERVICE_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-health-check.timer": {"content": DOCKER_HEALTH_TIMER_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-volume-check.service": {"content": DOCKER_VOLUME_SERVICE_CONTENT, "permissions": 0o644},
    f"{DATA_MOUNT}/systemd/docker-volume-check.timer": {"content": DOCKER_VOLUME_TIMER_CONTENT, "permissions": 0o644},
}

# --- Global counters for create_files status ---
warning_count = 0
fail_count = 0


# --- Main Script Logic ---
def check_prerequisites():
    """Verify script prerequisites before attempting file creation."""
    print("--- Checking Prerequisites ---")
    passed = True

    # 1. Check root privileges
    if os.geteuid() != 0:
        print("ERROR: Script must be run as root (UID 0).")
        passed = False
    else:
        print("OK: Running as root.")

    # 2. Check base mount points exist and are writable
    base_dirs = {ESP_MOUNT: "ESP", DATA_MOUNT: "Data"}
    for bdir, name in base_dirs.items():
        bdir_path = pathlib.Path(bdir)
        print(f"Checking {name} mount point: {bdir}...")
        if not bdir_path.is_dir():
            print(f"ERROR: Base {name} directory '{bdir}' does not exist or is not a directory.")
            print(f" Please ensure the corresponding partition is mounted correctly before running.")
            passed = False
        elif not os.access(bdir_path, os.W_OK):
            print(f"ERROR: Base {name} directory '{bdir}' is not writable by the current user (root). Check mount options or permissions.")
            passed = False
        else:
            print(f"OK: Base {name} directory '{bdir}' exists and is writable.")

    # 3. Check for optional but recommended commands needed by generated scripts.
    # This looks at the machine running this script; jq must also be present in the
    # Sbnb runtime for the generated sbnb-cmds.sh to use it.
    print("Checking for optional command (jq)...")
    try:
        if shutil.which("jq"):
            print("OK: 'jq' command found (recommended for robust daemon.json handling).")
        else:
            print("WARNING: 'jq' command not found. Generated sbnb-cmds.sh will use less robust methods for daemon.json, which might fail or overwrite existing settings.")
        # Removed rsync check as it's no longer used/preferred by the generated script
    except Exception as e:
        print(f"WARNING: Error checking for optional commands: {e}")

    if not passed:
        print("----------------------------")
        print("ERROR: Prerequisites not met. Aborting script.")
        sys.exit(1)
    print("--- Prerequisites OK ---")
    return True


def create_files():
    """Creates directories and files as defined in FILES_TO_CREATE."""
    global warning_count, fail_count  # Declare intent to modify globals
    print("\n--- Starting File Creation Process ---")
    success_count = 0
    warning_count = 0  # Reset global counter
    fail_count = 0  # Reset global counter

    # Ensure the base backup directory exists first with correct permissions
    try:
        print(f"\nEnsuring base backup directory exists: {BACKUP_BASE_DIR}")
        # Create directory with specific permissions (rwxr-x---)
        os.makedirs(BACKUP_BASE_DIR, mode=BACKUP_DIR_PERMISSIONS, exist_ok=True)
        # Explicitly set permissions in case it already existed with different ones
        current_perm = stat.S_IMODE(os.stat(BACKUP_BASE_DIR).st_mode)
        if current_perm != BACKUP_DIR_PERMISSIONS:
            print(f" Adjusting permissions on {BACKUP_BASE_DIR} to {BACKUP_DIR_PERMISSIONS:o}...")
            os.chmod(BACKUP_BASE_DIR, BACKUP_DIR_PERMISSIONS)
        print(f"OK: Backup directory ensured: {BACKUP_BASE_DIR} with permissions {BACKUP_DIR_PERMISSIONS:o}")
    except OSError as e:
        print(f"ERROR: Failed to create or set permissions on {BACKUP_BASE_DIR}: {e}")
        sys.exit(f"ERROR: Could not ensure backup directory '{BACKUP_BASE_DIR}'. Exiting.")
    except Exception as e:
        print(f"ERROR: An unexpected error occurred ensuring backup directory: {e}")
        sys.exit(f"ERROR: Could not ensure backup directory '{BACKUP_BASE_DIR}'. Exiting.")

    # Process the files dictionary
    for file_path_str, details in FILES_TO_CREATE.items():
        file_path = pathlib.Path(file_path_str)
        write_succeeded = False  # Flag to track if write was successful
        try:
            content = details.get("content")  # May be None for directory entries
            permissions = details.get("permissions")  # Optional
            # Assign default permissions if not specified
            if permissions is None:
                if content is None:  # It's meant to be a directory
                    permissions = 0o755  # Default rwxr-xr-x for directories
                else:  # It's a file
                    permissions = 0o644  # Default rw-r--r-- for files
                print(f"INFO: No specific permission set for {file_path}, using default {permissions:o}.")
        except KeyError as e:
            print(f"\nERROR: Configuration error - Missing '{e}' key for entry {file_path_str}. Skipping.")
            fail_count += 1
            continue
        except Exception as e:
            print(f"\nERROR: Configuration error for {file_path_str}: {e}. Skipping.")
            fail_count += 1
            continue

        print(f"\nProcessing: {file_path}")

        # 1. Create parent directories robustly
        try:
            parent_dir = file_path.parent
            # Check if parent needs creation (avoid os.makedirs on existing dirs if possible)
            if not parent_dir.is_dir():
                print(f" Creating parent directory: {parent_dir}")
                # mode=0o755 sets default permissions for newly created dirs (rwxr-xr-x)
                os.makedirs(parent_dir, mode=0o755, exist_ok=True)
                # Explicitly set permissions, since the mode passed to makedirs is subject to the umask
                print(f" Setting parent directory permissions to 755...")
                os.chmod(parent_dir, 0o755)
            else:
                # Parent exists, ensure it's writable and has correct permissions
                print(f" Parent directory exists: {parent_dir}")
                if not os.access(parent_dir, os.W_OK):
                    print(f" WARNING: Parent directory {parent_dir} is not writable! File write may fail.")
                    warning_count += 1
                # Ensure existing parent has standard 755 permissions
                try:
                    current_parent_perm = stat.S_IMODE(os.stat(parent_dir).st_mode)
                    if current_parent_perm != 0o755:
                        print(f" Ensuring parent directory permissions are 755 (currently {current_parent_perm:o})...")
                        os.chmod(parent_dir, 0o755)
                except OSError as e:
                    print(f" WARNING: Could not check/set permissions on existing parent {parent_dir}: {e}")
                    warning_count += 1
        except OSError as e:
            print(f" ERROR: Failed to create or set permissions on parent directory {parent_dir}: {e}")
            print(f" Skipping item: {file_path}")
            fail_count += 1
            continue  # Skip to the next file
        except Exception as e:
            print(f" ERROR: An unexpected error occurred creating parent directory for {file_path}: {e}")
            print(f" Skipping item: {file_path}")
            fail_count += 1
            continue

        # 2. Write the file content (or create directory if content is None)
        if content is not None:  # It's a file
            try:
                print(f" Writing content...")
                # write_text truncates and rewrites the file in place (UTF-8); it is not an atomic write
                file_path.write_text(content, encoding='utf-8')
                print(f" Successfully wrote: {file_path}")
                write_succeeded = True
            except IOError as e:
                print(f" ERROR: Failed to write file {file_path}: {e}")
                fail_count += 1
                continue  # Skip permissions if write failed
            except Exception as e:
                print(f" ERROR: An unexpected error occurred writing {file_path}: {e}")
                fail_count += 1
                continue
        else:  # It's a directory (content is None)
            try:
                print(f" Ensuring directory exists: {file_path}")
                os.makedirs(file_path, mode=permissions, exist_ok=True)
                # Explicitly set permissions in case it already existed
                os.chmod(file_path, permissions)
                print(f" Successfully ensured directory: {file_path}")
                write_succeeded = True  # Treat dir success like file write success
            except OSError as e:
                print(f" ERROR: Failed to create/set permissions on directory {file_path}: {e}")
                fail_count += 1
                continue
            except Exception as e:
                print(f" ERROR: An unexpected error occurred ensuring 
directory {file_path}: {e}") fail_count += 1 continue # 3. Set permissions (only if write/dir creation succeeded) if write_succeeded: try: # Check if current permissions match target permissions before attempting chmod current_perm = stat.S_IMODE(os.stat(file_path).st_mode) if current_perm != permissions: print(f" Setting permissions to {permissions:o} (currently {current_perm:o})...") # Use :o format os.chmod(file_path, permissions) print(f" Successfully set permissions for: {file_path}") else: print(f" Permissions already set correctly ({permissions:o}) for: {file_path}") # Use :o format success_count += 1 # Count full success (write/dir + chmod) except OSError as e: print(f" WARNING: Failed to set permissions on {file_path}: {e}") warning_count += 1 # Item created/written, but permissions failed/check failed except Exception as e: print(f" WARNING: An unexpected error occurred setting permissions for {file_path}: {e}") warning_count += 1 # --- Summary --- print("\n--- File Creation Summary ---") print(f"Successfully processed (created/permissioned): {success_count} items") print(f"Items processed but with warnings: {warning_count}") print(f"Failed operations (write/dir/parent): {fail_count}") print("-------------------------------\n") total_issues = fail_count + warning_count if total_issues > 0: print("NOTE: Some errors or warnings occurred during file creation.") if fail_count > 0: print("ERROR: Fatal errors occurred. Deployment incomplete.") return False # Fatal errors occurred else: print("Deployment completed, but with warnings. 
Please review the output above.") return True # Only non-fatal warnings else: print("SBNB configuration file deployment completed successfully.") return True # --- Script Execution --- if __name__ == "__main__": print("=====================================================================") print(" SBNB Unified Configuration Deployment Script (v2.1 - BusyBox cp) ") print("=====================================================================") print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}") print(f"Configuring Docker persistent root: {PERSISTENT_DOCKER_ROOT}") print("Includes Backup/Purge and Health/Volume monitoring.") print("Data migration uses 'cp -a -u' (BusyBox friendly).") print("=====================================================================\n") # Store counts for final status reporting final_warning_count = 0 final_fail_count = 0 if check_prerequisites(): # Capture status from create_files create_files_success = create_files() # Access the global counters updated by create_files final_warning_count = warning_count final_fail_count = fail_count if create_files_success or (final_fail_count == 0 and final_warning_count > 0) : # Success or only warnings - print final instructions print("\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!") print("!!! CRITICAL: You MUST replace the placeholder in !!!") print(f"!!! '{ESP_MOUNT}/sbnb-tskey.txt' with your actual Tailscale auth key! !!!") print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!") print("\n--- Next Steps ---") print("1. Review any WARNINGS in the output above.") print("2. Reboot the system for sbnb-cmds.sh to take effect.") print("3. 
After reboot, verify Docker configuration and status:") print(f" - Check data root: `docker info | grep 'Docker Root Dir'` (should show '{PERSISTENT_DOCKER_ROOT}')") print(f" - Check status: `systemctl status docker.service`") print(f" - Check boot script logs: `journalctl -t sbnb-cmds.sh --no-pager` or check `/dev/kmsg` output during boot") print(f" - Check timers: `systemctl list-timers --all | grep docker`") print(f" - Check helper script logs periodically: `journalctl -t backup-docker.sh -t purge-docker-backups.sh -t docker-health-check -t docker-volume-check --no-pager`") if final_warning_count > 0: print("\nDeployment finished with WARNINGS.") sys.exit(2) # Exit code 2 for success with warnings else: print("\nDeployment finished successfully.") sys.exit(0) # Exit successfully else: # Fatal errors occurred during file creation print("\n--- Deployment Failed ---") print("Fatal errors occurred during file creation. System configuration may be incomplete or inconsistent.") sys.exit(1) # Exit with error code
  1. Unmount the EFI Partition:
    echo "--- Unmounting ESP partition ---"
    # Ensure buffers are flushed before unmounting
    sync
    sudo umount /mnt/sbnb-mount

#Phase 4: Backing Up Data (CRITICAL!)

  • Why Essential: High risk of USB drive failure. Backups are mandatory.
  • Strategy: Automate regular backups of /mnt/sbnb-data.
  • File Data Backup (rsync): Ensure the backup destination (NAS, cloud, another server) has sufficient free space.
    # Example: From Sbnb to backup-server (requires ssh key auth)
    rsync -avz --delete --progress --human-readable /mnt/sbnb-data/ user@backup-server:/path/to/backups/sbnb-usb-data/
  • Frequency: Daily recommended for active data.
  • Automation: Use cron/systemd timers or remote triggers.
  • Testing Restores: Vital! Don’t assume backups work.
  • Conceptual Restore: Boot Linux Live env -> Mount backup source -> Mount target USB data partition (new/reformatted) to /mnt/restore -> sudo rsync -av --progress /path/to/backup/sbnb-usb-data/ /mnt/restore/ -> Verify restored files (count, size, checksums, spot checks).
  • Verification: Use tools like diff -r, md5sum, or sha256sum to compare restored files against originals or known good copies.
  • Untested backups provide a false sense of security.
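The verification step above can be sketched in Python as a checksum comparison between the original tree and a restored copy. This is a minimal illustration, not part of the guide's tooling: the function names are hypothetical, and it assumes Python 3 is available on the machine doing the comparison and that both trees are mounted locally.

```python
import hashlib
import pathlib


def sha256_of(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: pathlib.Path) -> dict:
    """Map each regular file (path relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def compare_trees(original: pathlib.Path, restored: pathlib.Path) -> dict:
    """Report files missing from the restore, unexpected extras, and content mismatches."""
    a, b = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(set(a) - set(b)),
        "extra": sorted(set(b) - set(a)),
        "changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

A call such as `compare_trees(pathlib.Path("/mnt/sbnb-data"), pathlib.Path("/mnt/restore"))` returns three lists; all three being empty suggests the restore matches the source. Symlinks and file metadata (ownership, permissions) are deliberately not compared in this sketch.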

#Phase 5: Boot and Verify

  1. Safely Eject: Eject USB from prep system.
  2. Configure Server BIOS/UEFI: Enter setup (DEL, F2, F10, F12, etc.). Ensure UEFI Mode ON, CSM/Legacy OFF, Secure Boot OFF. Set “UEFI: USB…” as first boot device. Save & Exit.
  3. Boot Sbnb Linux.
  4. Verify Operation:
    • Monitor Boot: Watch console for sbnb-cmds.sh logs, errors.
    • SSH into Sbnb.
    • Check Mounts:
      lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL,MOUNTPOINT   # Look for mount at /mnt/sbnb-data
      df -hT | grep -E 'Filesystem|/mnt/sbnb-data'      # Check usage/type
      mount | grep /mnt/sbnb-data                       # Check mount options (rw, noatime)
      findmnt /mnt/sbnb-data                            # Another way to check mount info
    • Test Persistence:
      # After SSHing in:
      TIMESTAMP=$(date)
      echo "Sbnb USB Persistence test - $TIMESTAMP" | sudo tee /mnt/sbnb-data/persistence_test.txt > /dev/null
      sync && echo "Synced data to disk."
      echo "File created. Content:" && sudo cat /mnt/sbnb-data/persistence_test.txt
      echo "Rebooting server now..." && sudo reboot

      # --- Wait for reboot and reconnect via SSH ---

      echo "Checking for file after reboot..."
      if [ -f /mnt/sbnb-data/persistence_test.txt ]; then
          echo "SUCCESS: File found. Content:" && sudo cat /mnt/sbnb-data/persistence_test.txt
          sudo rm /mnt/sbnb-data/persistence_test.txt  # Clean up
      else
          echo "FAILURE: File NOT FOUND after reboot! Persistence failed."
      fi
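The mount checks above can also be scripted. The sketch below parses text in the /proc/mounts format (device, mount point, filesystem type, options) and reports problems with the data mount. The function names are hypothetical, and it assumes Python 3 is available wherever the check runs; the mounts text could equally be captured over SSH and checked elsewhere.

```python
def find_mount(mounts_text, mount_point):
    """Return (device, fstype, options) for mount_point, or None if absent."""
    entry = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: a later mount shadows an earlier one.
            entry = (fields[0], fields[2], fields[3].split(","))
    return entry


def check_data_mount(mounts_text, mount_point="/mnt/sbnb-data",
                     expected_fstype="ext4", required_options=("rw",)):
    """Return a list of human-readable problems; an empty list means OK."""
    entry = find_mount(mounts_text, mount_point)
    if entry is None:
        return [f"{mount_point} is not mounted"]
    device, fstype, options = entry
    problems = []
    if fstype != expected_fstype:
        problems.append(f"{device}: filesystem is {fstype}, expected {expected_fstype}")
    for opt in required_options:
        if opt not in options:
            problems.append(f"{device}: mount option '{opt}' is missing")
    return problems
```

On the running system this might be called as `check_data_mount(open("/proc/mounts").read())`. Passing `required_options=("rw", "noatime")` would additionally flag a mount that lacks noatime.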

#Troubleshooting

  • Doesn’t Boot / No Bootable Device:
    • Re-verify BIOS settings (UEFI, Secure Boot OFF, Boot Order).
    • Re-verify USB Prep: Partitions (parted print), ESP flags (boot,esp), ESP filesystem label (blkid /dev/sdX1 -> LABEL="sbnb"), EFI file path (/EFI/BOOT/BOOTX64.EFI).
    • Try different USB ports (check if port provides sufficient power). Test drive health on prep machine (fsck, badblocks -nvs /dev/sdX). Recreate drive meticulously.
  • Data Partition Not Mounted / /mnt/sbnb-data Empty:
    • Check boot logs (journalctl -b, console) for sbnb-cmds.sh errors (“Device… not found”, “Failed to mount”). Check dmesg for USB errors (dmesg | grep -iE 'usb|sdX') or filesystem errors (dmesg | grep -i ext4).
    • SSH in:
      • Verify partition & label: sudo blkid, ls -l /dev/disk/by-label/. Is SBNB_DATA present? Does it point to the correct device?
      • If label wrong/missing: Re-label from prep env (sudo e2label /dev/sdX2 SBNB_DATA).
      • If device/label exists, try manual mount: sudo mkdir -p /mnt/sbnb-data && sudo mount /dev/disk/by-label/SBNB_DATA /mnt/sbnb-data. Check dmesg for errors (e.g., mount: wrong fs type, bad option, bad superblock). If manual mount works, debug sbnb-cmds.sh (add set -x, check paths, loop duration, check script permissions ls -l /mnt/sbnb/sbnb-cmds.sh).
      • Run filesystem check (unmounted): sudo e2fsck -f /dev/disk/by-label/SBNB_DATA.
      • Check kernel modules: lsmod | grep ext4. Is the module loaded? Check dmesg for errors loading filesystem modules.
  • Poor Performance / Drive Failure:
    • Performance: Slow I/O is an inherent limitation of running persistent storage from a USB flash drive.
    • Lifespan/Failure: Monitor dmesg for I/O errors. Restore from verified backups upon failure. This setup will wear out consumer flash drives with persistent writes.
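Monitoring dmesg for I/O errors, as suggested above, can be approximated with a small filter over captured kernel log text. This is a rough sketch: the pattern list is an assumption covering a few common kernel messages and is not exhaustive, and the function name is hypothetical.

```python
import re

# Assumed, non-exhaustive patterns for storage trouble in kernel log output.
ERROR_PATTERN = re.compile(r"I/O error|EXT4-fs error|USB disconnect", re.IGNORECASE)


def suspicious_lines(log_text: str) -> list:
    """Return the kernel log lines that match any of the assumed error patterns."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]
```

One way to feed it is `suspicious_lines(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)`, run as root after importing `subprocess`. Any returned line is a prompt to check the drive and confirm that recent backups are restorable.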
URL: https://ib.bsb.br/sbnb
Ref. https://github.com/sbnb-io/sbnb