This guide provides comprehensive, step-by-step instructions for configuring a single USB flash drive (or an external USB hard drive) to perform two distinct functions simultaneously:
- Booting the Sbnb Linux Operating System: The drive is prepared with a standard UEFI-compatible structure, specifically an EFI System Partition (ESP) containing the Sbnb EFI bootloader (`sbnb.efi`) and the necessary configuration files, so the server's firmware can locate and start the Sbnb boot process. The `sbnb.efi` file itself is typically a Unified Kernel Image (UKI), bundling the Linux kernel, initramfs, and kernel command line into a single executable file.
- Providing Simple Persistent Storage: A separate partition on the same physical USB drive, formatted with a standard Linux filesystem (`ext4` in this guide), is automatically mounted at `/mnt/sbnb-data` within the running Sbnb Linux system via a custom boot script (`sbnb-cmds.sh`). This provides a space where data (container volumes, application data, logs, user files) can persist across reboots of the otherwise ephemeral, RAM-based Sbnb OS.
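For orientation, the end state this guide produces looks roughly like this (the labels and the ~1 GiB ESP size match the partitioning script used later; the device name is a placeholder):

```
/dev/sdX                      USB drive, GPT partition table
├─ /dev/sdX1  ~1 GiB  FAT32, label "sbnb"       ESP: /EFI/BOOT/BOOTX64.EFI (sbnb.efi) plus sbnb-cmds.sh
└─ /dev/sdX2  rest    ext4,  label "SBNB_DATA"  mounted at /mnt/sbnb-data by sbnb-cmds.sh
```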
Why `ext4` instead of LVM: Initial analysis suggested LVM might be suitable, but further review of the default Sbnb Linux build configuration indicates the necessary `lvm2` user-space tools are likely missing from the base runtime environment. Without these tools, managing LVM volumes during boot via standard scripts is infeasible unless you create a custom Sbnb build that includes the `lvm2` package. This revised guide therefore uses a standard `ext4` filesystem partition, relying only on basic tools expected to be present in Sbnb.
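If you want to confirm this on your own image, a quick check from a shell on a booted Sbnb system is enough (a minimal sketch; it only tests whether the LVM user-space tools are on the PATH):

```sh
# If none of these are found, LVM cannot be managed from boot scripts on this image.
for tool in lvm pvs vgs lvs; do
  command -v "$tool" >/dev/null 2>&1 && echo "found: $tool" || echo "missing: $tool"
done
```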
Contrasting with Standard Sbnb Workflow: It's crucial to understand that this guide describes a highly non-standard setup. The intended Sbnb workflow prioritizes resilience, performance, and statelessness:
- Boot the minimal Sbnb OS from a simple USB drive or the network.
- Use automation (Ansible) or manual scripts (`sbnb-configure-storage.sh`) post-boot to configure LVM on internal server drives.
- Run workloads on this fast, reliable internal storage.

This guide's method compromises these benefits for single-drive convenience under specific constraints.
# ***** EXTREME CAUTION: IRREVERSIBLE DATA DESTRUCTION IMMINENT! *****
This procedure involves low-level disk operations (partitioning, formatting) that will completely and PERMANENTLY ERASE ALL DATA currently residing on the USB drive you select. There is NO UNDO function, and data recovery after accidental formatting is often impossible.

The most critical risk is selecting the wrong target device. Mistakenly choosing your computer's internal hard drive (e.g., `/dev/sda`, `/dev/nvme0n1`) instead of the intended USB drive (e.g., `/dev/sdb`, `/dev/sdc`) WILL RESULT IN CATASTROPHIC AND LIKELY IRRECOVERABLE LOSS OF YOUR OPERATING SYSTEM, APPLICATIONS, AND PERSONAL FILES. You MUST verify the target device name multiple times using different commands (such as `lsblk`, `fdisk`, and `parted`) and cross-reference the expected drive size and model before executing any partitioning or formatting commands. Proceed with extreme vigilance, double-checking each step, entirely at your own sole risk!
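One quick cross-check is whether the kernel actually reports the candidate device on the USB transport (a small sanity check using standard `lsblk` columns):

```sh
# The TRAN column shows the transport: the intended target should report "usb",
# while internal drives typically report "sata" or "nvme".
lsblk -d -o NAME,SIZE,MODEL,TRAN
```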
# Primary Drawbacks & Warnings (Reiterated & Expanded)
- Highly Non-Standard & Complex: Deviates significantly from Sbnb's design. Setup is intricate, and runtime behavior depends on precise script execution and timing. Future Sbnb updates might break this.
- Severe Performance Penalty: USB storage is inherently slow (latency, throughput, IOPS) compared to internal NVMe/SATA drives. Disk I/O to `/mnt/sbnb-data` will be a major bottleneck (a quick way to gauge it is sketched after this list).
- Drastically Reduced Lifespan & Reliability: USB flash drives wear out quickly under persistent write load due to limited write cycles, write amplification, and lack of TRIM support. They are unsuitable for write-intensive workloads or high-reliability needs. Expect eventual failure and data loss without robust backups.
- Potential Instability & Boot Issues: Relies on correct partition detection, udev node creation, filesystem integrity, and `sbnb-cmds.sh` execution timing. Failures can leave persistent storage unavailable.
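To see the penalty concretely, a rough sequential-write test on the mounted data partition gives a ballpark figure (a sketch; `oflag=direct` bypasses the page cache and may not be available in every `dd` build, in which case follow the write with `sync` and treat the number as optimistic):

```sh
# Write 256 MiB to the USB-backed data partition; dd reports the throughput when it finishes.
dd if=/dev/zero of=/mnt/sbnb-data/ddtest bs=1M count=256 oflag=direct
rm /mnt/sbnb-data/ddtest
```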
# When Might This Be Considered? (Limited Scenarios with Full Risk Acceptance)
- Temporary Testing/Experimentation ONLY: Brief evaluations on hardware lacking internal drives.
- Specific, Very Low-Intensity, Read-Mostly Use Cases: Infrequent writes, performance irrelevant (e.g., static config kiosk).
- Absolute Hardware Constraints: Sealed systems where internal drives are impossible, and risks are fully accepted.
Even in these limited scenarios, regular, automated, and verified backups are non-negotiable.
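"Verified" means more than just creating archives: periodically confirm that the most recent archive is actually readable (a minimal sketch, using the backup path and the `docker_latest.tar.gz` link created by the scripts later in this guide):

```sh
# List the archive contents without extracting; a corrupt archive makes tar fail here.
tar -tzf /mnt/sbnb-data/backups/docker/docker_latest.tar.gz > /dev/null && echo "backup archive readable"
```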
# Prerequisites
- A Suitable USB Flash Drive:
  - Capacity: Min ~1GB ESP + desired data size (32GB+ recommended).
  - Quality & Speed: Reputable brand, USB 3.0+ advised for marginal speed benefit. Endurance matters more than peak speed.
- A Working Linux System (Preparation Environment):
  - Necessity: Required for partitioning/formatting the target USB safely. openSUSE Tumbleweed assumed.
  - Live Environment Benefit: Using a Live USB/CD (e.g., openSUSE Tumbleweed Live) is highly recommended as it provides a non-destructive environment.
- Sbnb Linux Boot File (`sbnb.efi`):
  - Method 1 (Easier): Run the official Sbnb install script on a temporary USB, then copy `/EFI/BOOT/BOOTX64.EFI` from its ESP (see the sketch after this list).
  - Method 2 (Advanced): Build Sbnb from source, then find `sbnb.efi` in `output/images/`.
- Root/Sudo Privileges: Needed on the openSUSE prep system for disk commands.
- Internet Connection: May be needed for `zypper`.
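For Method 1, the copy step amounts to mounting the temporary USB's ESP and pulling the bootloader off it (a sketch; `/dev/sdY1` and `/mnt/sbnb-official` are placeholders for the temporary drive's ESP and a scratch mount point):

```sh
sudo mkdir -p /mnt/sbnb-official
sudo mount /dev/sdY1 /mnt/sbnb-official
# The installer places the bootloader at the standard UEFI fallback path.
sudo cp /mnt/sbnb-official/EFI/BOOT/BOOTX64.EFI ~/sbnb.efi
sudo umount /mnt/sbnb-official
```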
# Step-by-Step Instructions
(Reminder: TRIPLE-CHECK your target device name, e.g., `/dev/sdX`, before every destructive command!)
# Phase 1: Prepare the Linux Environment (openSUSE Tumbleweed)
- Boot into openSUSE: Start your preparation environment.
- Install Necessary Tools: Open a terminal. `zypper refresh` updates the package lists; `zypper install` installs the tools.
  sudo zypper refresh
  sudo zypper install -y parted lvm2 dosfstools e2fsprogs
- Identify Target USB Drive: CRITICAL SAFETY STEP! Unplug other USB storage.
  - Insert the target USB drive.
  - Use multiple commands and compare SIZE and MODEL. Check `dmesg | tail` after plugging it in for kernel messages like `sd 2:0:0:0: [sdc] Attached SCSI removable disk`.
    lsblk -d -o NAME,SIZE,MODEL,VENDOR,TYPE | grep 'disk'
    sudo fdisk -l | grep '^Disk /dev/'
    sudo parted -l | grep '^Disk /dev/'
    # Example: If consistently identified as /dev/sdc, use /dev/sdc below.
  - Visually confirm with YaST Partitioner (`sudo yast2 partitioner`) or GParted (`sudo zypper install -y gparted && sudo gparted`) if preferred. Look for the drive matching the expected size and vendor/model.
  - Assume `/dev/sdX` is your verified target drive. Replace it carefully in every command below!
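If you want one more confirmation before the destructive steps, the drive's vendor, model, and serial can be read via udev and compared against the label on the physical stick (a sketch; property names can vary slightly between devices):

```sh
# ID_BUS should report "usb" for the intended target.
udevadm info --query=property --name=/dev/sdX | grep -E 'ID_BUS=|ID_VENDOR=|ID_MODEL=|ID_SERIAL_SHORT='
```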
# Phase 2: Partition the USB Drive
(Warning: The following `parted` commands are DESTRUCTIVE to `/dev/sdX`. Double-check the device name!)
This script automates the partitioning and formatting process. Save it as `prepare_usb.sh`, make it executable (`chmod +x prepare_usb.sh`), and run it with `sudo ./prepare_usb.sh /dev/sdX` (replacing `/dev/sdX` with your verified target device).
#!/bin/bash
# --- Configuration ---
# Exit immediately if a command exits with a non-zero status.
# Treat unset variables as an error when substituting.
# Pipelines return the exit status of the last command to exit non-zero.
set -euo pipefail
# --- Variables ---
# EFI System Partition (ESP) Label (CRITICAL - must match bootloader config)
ESP_LABEL="sbnb"
# Data Partition Label (Recommended for identification)
DATA_LABEL="SBNB_DATA"
# ESP end offset: the ESP spans 1MiB..1025MiB, i.e. a 1024MiB (~1GB) partition,
# which is usually sufficient. Adjust if needed.
ESP_SIZE="1025MiB"
# List of required commands for the script to function
REQUIRED_CMDS=(
"parted" "mkfs.vfat" "mkfs.ext4" "wipefs" "findmnt" "lsblk"
"blkid" "fsck.vfat" "e2fsck" "sync" "id" "grep" "read"
"sleep" "xargs" "umount" "partprobe" "realpath"
)
# --- Functions ---
# Function to check for required commands
check_dependencies() {
echo "--- Checking for required commands ---"
local missing_cmds=()
for cmd in "${REQUIRED_CMDS[@]}"; do
if ! command -v "$cmd" &> /dev/null; then
missing_cmds+=("$cmd")
fi
done
if [ ${#missing_cmds[@]} -ne 0 ]; then
echo "ERROR: The following required commands are not found:" >&2
printf " - %s\n" "${missing_cmds[@]}" >&2
echo "Please install them and try again." >&2
exit 1
fi
echo "All required commands found."
}
# Function to get the base block device for a given path (handles partitions, links, etc.)
get_base_device() {
local path="$1"
local resolved_path
resolved_path=$(realpath "$path") || { echo "ERROR: Cannot resolve path '$path'" >&2; return 1; }
# lsblk -no pkname gets the parent kernel name (base device)
lsblk -no pkname "$resolved_path" || { echo "ERROR: Cannot find base device for '$resolved_path' using lsblk." >&2; return 1; }
}
# --- Script Start ---
echo "-----------------------------------------------------"
echo "--- USB Drive Partitioning and Formatting Script ---"
echo "--- (Version 2 - Enhanced Safety) ---"
echo "-----------------------------------------------------"
echo ""
echo "WARNING: This script is DESTRUCTIVE and will ERASE"
echo " ALL DATA on the target device."
echo ""
# --- Check for Root Privileges ---
if [ "$(id -u)" -ne 0 ]; then
echo "ERROR: This script must be run as root (e.g., using sudo)." >&2
exit 1
fi
# --- Check Dependencies ---
check_dependencies
# --- Check for Device Argument ---
if [ -z "${1:-}" ]; then
echo "Usage: $0 /dev/sdX"
echo "ERROR: Please provide the target block device (e.g., /dev/sda, /dev/sdb)." >&2
echo ""
echo "Available block devices (excluding ROM, loop, and RAM devices):"
lsblk -d -o NAME,SIZE,TYPE,MODEL | grep -vE 'rom|loop|ram'
exit 1
fi
DEVICE="$1"
# --- Validate Device ---
if [ ! -b "$DEVICE" ]; then
echo "ERROR: '$DEVICE' is not a valid block device." >&2
exit 1
fi
# --- CRITICAL SAFETY CHECK: Prevent targeting the root filesystem device ---
echo "--- Performing safety checks ---"
ROOT_DEV_PATH=$(findmnt -n -o SOURCE /)
ROOT_BASE_DEV_NAME=$(get_base_device "$ROOT_DEV_PATH") || exit 1 # Exit if function fails
TARGET_BASE_DEV_NAME=$(get_base_device "$DEVICE") || exit 1
# Construct full device paths for comparison
ROOT_BASE_DEV="/dev/${ROOT_BASE_DEV_NAME}"
TARGET_BASE_DEV="/dev/${TARGET_BASE_DEV_NAME}" # Assumes the input $DEVICE is the base device
if [ "$TARGET_BASE_DEV" == "$ROOT_BASE_DEV" ]; then
echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" >&2
echo "FATAL ERROR: Target device '$DEVICE' appears to be the same" >&2
echo " device ('$ROOT_BASE_DEV') as the running root" >&2
echo " filesystem ('$ROOT_DEV_PATH')." >&2
echo " Aborting to prevent data loss." >&2
echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" >&2
exit 1
fi
echo "Safety check passed: Target device '$DEVICE' is not the root filesystem device ('$ROOT_BASE_DEV')."
# Check if the device looks like an SD card reader often used for the OS drive
if [[ "$DEVICE" == /dev/mmcblk* ]]; then
echo "WARNING: '$DEVICE' looks like an SD card (e.g., /dev/mmcblk0)."
echo " Double-check this is not your primary OS drive!"
fi
# --- Confirmation ---
echo ""
echo "Target Device: $DEVICE"
echo "Partitions to be created:"
echo " 1: EFI System Partition (ESP), FAT32, Label: '$ESP_LABEL', Size: $ESP_SIZE, Flags: boot, esp"
echo " 2: Linux Data Partition, ext4, Label: '$DATA_LABEL', Size: Remaining space"
echo ""
read -p "ARE YOU ABSOLUTELY SURE you want to erase '$DEVICE' and proceed? (yes/NO): " CONFIRMATION
CONFIRMATION=${CONFIRMATION:-NO} # Default to NO if user just presses Enter
if [[ "$CONFIRMATION" != "yes" ]]; then
echo "Operation cancelled by user."
exit 0
fi
echo ""
echo "--- Proceeding with operations on $DEVICE ---"
# --- Phase 2: Partition the USB Drive ---
# 1. Unmount Existing Partitions
echo ""
echo "--- Unmounting any existing partitions on ${DEVICE}* ---"
# List mount points of the device and all its partitions via lsblk and unmount them
# (findmnt --source does not expand a "*" glob, so it would miss mounted partitions).
# Also try to unmount the base device itself in case it's loop-mounted etc.
lsblk -nr -o MOUNTPOINT "$DEVICE" | grep -v '^$' | xargs --no-run-if-empty -n1 umount -v -l || echo "Info: No partitions were mounted or umount failed (might be okay)."
umount "$DEVICE" &>/dev/null || true # Attempt to unmount base device, ignore errors
sleep 1 # Give time for umount to settle
lsblk "$DEVICE"
# 2. Wipe Existing Signatures (Recommended)
echo ""
echo "--- Wiping filesystem/partition signatures from $DEVICE ---"
wipefs --all --force "$DEVICE"
sync # Flush kernel buffers to disk to ensure changes are physically written
# 3. Create New GPT Partition Table
echo ""
echo "--- Creating new GPT partition table on $DEVICE ---"
parted "$DEVICE" --script -- mklabel gpt
sync # Flush kernel buffers to disk
# 4. Create EFI System Partition (ESP)
echo ""
echo "--- Creating ESP partition (1) on $DEVICE ---"
parted "$DEVICE" --script -- mkpart "${ESP_LABEL}" fat32 1MiB "${ESP_SIZE}"
parted "$DEVICE" --script -- set 1 boot on
parted "$DEVICE" --script -- set 1 esp on
sync # Flush kernel buffers to disk
# 5. Create Linux Data Partition
echo ""
echo "--- Creating Linux data partition (2) on $DEVICE ---"
# Use the end of the ESP as the start for the data partition
parted "$DEVICE" --script -- mkpart "${DATA_LABEL}" ext4 "${ESP_SIZE}" 100%
sync # Flush kernel buffers to disk
echo "Waiting briefly for kernel to recognize new partitions..."
sleep 2
# Define partition variables (assuming standard naming, e.g., /dev/sda1, /dev/sda2)
# Adding 'p' for NVMe devices (e.g., /dev/nvme0n1p1) - check if base device name contains 'nvme'
if [[ "$DEVICE" == *nvme* ]]; then
PART_PREFIX="p"
else
PART_PREFIX=""
fi
ESP_PARTITION="${DEVICE}${PART_PREFIX}1"
DATA_PARTITION="${DEVICE}${PART_PREFIX}2"
# Check if partition devices exist, retry with partprobe if needed
echo "--- Checking for partition device nodes (${ESP_PARTITION}, ${DATA_PARTITION}) ---"
PARTITIONS_FOUND=false
for i in {1..5}; do
if [ -b "$ESP_PARTITION" ] && [ -b "$DATA_PARTITION" ]; then
echo "Partition nodes found."
PARTITIONS_FOUND=true
break
fi
echo "Partition nodes not yet found. Retrying probe (Attempt $i/5)..."
partprobe "$DEVICE" || echo "Warning: partprobe command failed, continuing check..."
sleep 1
done
if [ "$PARTITIONS_FOUND" = false ]; then
echo "ERROR: Partition devices ($ESP_PARTITION, $DATA_PARTITION) not found after partitioning and retries." >&2
echo " Please check manually ('lsblk $DEVICE', 'parted $DEVICE print')." >&2
lsblk "$DEVICE"
exit 1
fi
# 6. Verify Partitioning
echo ""
echo "--- Verifying partitions on $DEVICE ---"
parted "$DEVICE" --script -- print
echo ""
echo "--- Block device view: ---"
lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTLABEL,MOUNTPOINT,PARTFLAGS "$DEVICE"
echo "----------------------------"
echo "Expected: ${ESP_PARTITION} (~${ESP_SIZE}), Type EFI System, Flags: boot, esp"
echo "Expected: ${DATA_PARTITION} (Remaining size), Type Linux filesystem"
echo "----------------------------"
sleep 2 # Pause for user to review
# --- Phase 3: Format Filesystems ---
# 1. Format EFI Partition
echo ""
echo "--- Formatting ESP partition (${ESP_PARTITION}) as FAT32 with label '${ESP_LABEL}' ---"
mkfs.vfat -F 32 -n "${ESP_LABEL}" "${ESP_PARTITION}"
sync # Flush kernel buffers to disk
# Check filesystem integrity
echo "--- Checking ESP filesystem (fsck.vfat) ---"
FSCK_VFAT_EXIT_CODE=0
fsck.vfat -a "${ESP_PARTITION}" || FSCK_VFAT_EXIT_CODE=$? # Run fsck, capture exit code on failure
if [ $FSCK_VFAT_EXIT_CODE -eq 0 ]; then
echo "ESP filesystem check passed (or no check performed)."
elif [ $FSCK_VFAT_EXIT_CODE -eq 1 ]; then
# Exit code 1 usually means errors were found AND corrected.
echo "WARNING: fsck.vfat found and corrected errors on ESP partition (${ESP_PARTITION}). Check output above."
else
# Exit codes > 1 typically indicate uncorrected errors.
echo "ERROR: fsck.vfat reported uncorrectable errors (Exit Code: $FSCK_VFAT_EXIT_CODE) on ESP partition (${ESP_PARTITION})." >&2
echo " Cannot proceed safely. Please investigate manually." >&2
exit 1
fi
# Verify label using blkid
echo "--- Verifying ESP label ---"
if blkid -s LABEL -o value "${ESP_PARTITION}" | grep -q "^${ESP_LABEL}$"; then
echo "ESP Label '${ESP_LABEL}' verified successfully on ${ESP_PARTITION}."
else
echo "ERROR: Failed to verify ESP Label '${ESP_LABEL}' on ${ESP_PARTITION}." >&2
blkid "${ESP_PARTITION}" # Show full blkid output for debugging
exit 1
fi
# 2. Format Data Partition
echo ""
echo "--- Formatting Data partition (${DATA_PARTITION}) as ext4 with label '${DATA_LABEL}' ---"
mkfs.ext4 -m 0 -L "${DATA_LABEL}" "${DATA_PARTITION}"
sync # Flush kernel buffers to disk
# Check the new ext4 filesystem integrity
echo "--- Checking Data partition filesystem (e2fsck) ---"
# -f forces check even if clean, -y assumes yes to all prompts (use with caution)
E2FSCK_EXIT_CODE=0
e2fsck -f -y "${DATA_PARTITION}" || E2FSCK_EXIT_CODE=$? # Capture exit code on failure
if [ $E2FSCK_EXIT_CODE -eq 0 ]; then
echo "Data partition filesystem check passed."
elif [ $E2FSCK_EXIT_CODE -eq 1 ]; then
# Exit code 1 means errors were corrected.
echo "WARNING: e2fsck found and corrected errors on Data partition (${DATA_PARTITION}). Check output above."
else
# Exit codes > 1 indicate uncorrected errors.
echo "ERROR: e2fsck reported uncorrectable errors (Exit Code: $E2FSCK_EXIT_CODE) on Data partition (${DATA_PARTITION})." >&2
echo " Cannot proceed safely. Please investigate manually." >&2
exit 1
fi
# Verify the label using blkid
echo "--- Verifying Data partition label ---"
if blkid -s LABEL -o value "${DATA_PARTITION}" | grep -q "^${DATA_LABEL}$"; then
echo "Data Label '${DATA_LABEL}' verified successfully on ${DATA_PARTITION}."
else
echo "ERROR: Failed to verify Data Label '${DATA_LABEL}' on ${DATA_PARTITION}." >&2
blkid "${DATA_PARTITION}" # Show full blkid output for debugging
exit 1
fi
echo ""
echo "-----------------------------------------------------"
echo "--- Script finished successfully! ---"
echo "Device: $DEVICE"
echo "Partitions created and formatted:"
lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL,PARTLABEL,MOUNTPOINT "$DEVICE"
echo "-----------------------------------------------------"
exit 0
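Before moving on, you can optionally confirm that the new data partition mounts and accepts writes (a quick sanity check; `/dev/sdX2` and `/mnt/sbnb-data-test` are placeholders):

```sh
sudo mkdir -p /mnt/sbnb-data-test
sudo mount /dev/sdX2 /mnt/sbnb-data-test
sudo touch /mnt/sbnb-data-test/write-test && echo "data partition is writable"
sudo rm /mnt/sbnb-data-test/write-test
sudo umount /mnt/sbnb-data-test
```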
# Phase 3: Install Sbnb Boot Files and Configuration
- Mount EFI Partition: Access the ESP filesystem. Replace `/dev/sdX1` with the actual ESP partition device name identified earlier.
  echo "--- Mounting ESP partition ---"
  sudo mkdir -p /mnt/sbnb-mount
  sudo mount /dev/sdX1 /mnt/sbnb-mount
- Create EFI Boot Directory: Standard UEFI fallback path.
  echo "--- Creating EFI boot directories ---"
  sudo mkdir -p /mnt/sbnb-mount/EFI/BOOT
- Copy Sbnb EFI Boot File: Place the bootloader (`sbnb.efi` as `BOOTX64.EFI`). Replace `/path/to/your/sbnb.efi` with the actual path to the file you obtained.
  echo "--- Copying Sbnb EFI boot file ---"
  sudo cp /path/to/your/sbnb.efi /mnt/sbnb-mount/EFI/BOOT/BOOTX64.EFI
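Optionally, confirm that the copied file looks like an EFI application before moving on (a quick sanity check; the exact `file` output wording varies between versions):

```sh
# A UKI/EFI bootloader is typically reported as a PE32+ EFI application for x86-64.
file /mnt/sbnb-mount/EFI/BOOT/BOOTX64.EFI
```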
- Run the Sbnb Configuration Python Script: Mount `/dev/sdX1` at `/mnt/sbnb` and `/dev/sdX2` at `/mnt/sbnb-data` (a sketch of these mounts follows), then replace `tskey-auth-...` with your actual Tailscale auth key in the Python script below before running it as root:
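A minimal sketch of the mounts this step assumes (device names are placeholders; if the ESP is still mounted at `/mnt/sbnb-mount` from the previous steps, unmount it first):

```sh
# Release the temporary mount point used above, if still mounted.
sudo umount /mnt/sbnb-mount 2>/dev/null || true
# Mount the ESP and the data partition where the Python script expects them.
sudo mkdir -p /mnt/sbnb /mnt/sbnb-data
sudo mount /dev/sdX1 /mnt/sbnb
sudo mount /dev/sdX2 /mnt/sbnb-data
```

Save the script below (for example as `deploy_sbnb_config.py`, a placeholder name), edit the Tailscale key placeholder, and run it with `sudo python3 deploy_sbnb_config.py`.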
#!/usr/bin/env python3
"""
Unified SBNB Configuration Deployment Script (Version 2.1 - BusyBox cp focus).
Generates configuration files and scripts to:
- Mount a persistent data partition.
- Configure Docker to use a persistent data-root on that partition.
- Optionally migrate existing Docker data from /var/lib/docker robustly using busybox cp.
- Set up backup/purge routines for the persistent Docker data.
- Set up health and volume monitoring for Docker (with safer defaults).
- Deploy a Tailscale authentication key.
- Deploy an optional development environment script.
Core components generated:
- /mnt/sbnb/sbnb-cmds.sh: Main boot script executed by the system.
- /mnt/sbnb/sbnb-tskey.txt: Tailscale authentication key.
- /mnt/sbnb-data/scripts/*: Helper scripts for backup, purge, health checks.
- /mnt/sbnb-data/systemd/*: Systemd units to automate helper scripts.
Prerequisites:
- Run as root.
- ESP partition mounted at /mnt/sbnb (writable).
- Data partition mounted at /mnt/sbnb-data (writable).
- Required: Standard Linux utilities (coreutils including 'cp', systemd, grep, sed, etc.).
- Recommended: `jq` installed on the target system for robust JSON handling.
"""
import os
import stat
import sys
import pathlib
import json
import shutil
from datetime import datetime
# --- Configuration: File Paths ---
# Base mount points - Script will check if these exist and are writable
ESP_MOUNT = "/mnt/sbnb"
DATA_MOUNT = "/mnt/sbnb-data"
# Docker configuration
PERSISTENT_DOCKER_ROOT = f"{DATA_MOUNT}/docker-root"
DOCKER_CONFIG_DIR = "/etc/docker"
DOCKER_CONFIG_FILE = f"{DOCKER_CONFIG_DIR}/daemon.json"
DOCKER_CONFIG_BACKUP_SUFFIX = ".sbnb-orig-backup" # Suffix for one-time backup
DOCKER_DATA_EPHEMERAL = "/var/lib/docker" # Default path for migration check
# Permissions for Docker root dir (rwx--x--x). Owner/Group should be root:root.
# Use standard integer representation for octal in Python
DOCKER_ROOT_PERMISSIONS = 0o711
# Permissions for daemon.json (rw-r--r--)
DOCKER_CONFIG_PERMISSIONS = 0o644
# Backup configuration
BACKUP_BASE_DIR = f"{DATA_MOUNT}/backups/docker"
BACKUP_KEEP_COUNT = 3 # Number of backups to retain
STOP_DOCKER_FOR_BACKUP = 1 # 1 = Stop Docker during backup (safer), 0 = Attempt live backup
# Permissions for backup base directory (rwxr-x---)
BACKUP_DIR_PERMISSIONS = 0o750
# Health Check configuration
VOLUME_CHECK_THRESHOLD_PERCENT = 10 # Warn if free space drops below this %
# Pruning level in volume check: 0=None, 1=Containers/Dangling Images, 2=All Unused Images+Containers (--volumes still excluded)
VOLUME_CHECK_PRUNE_LEVEL = 1
# --- Content Definitions ---
# --- sbnb-cmds.sh Content ---
# REFACTOR: Removed rsync checks and usage, standardized on cp -a -u for migration.
# REFACTOR: Use correct octal format specifier (:o) for mkdir -m and chmod.
SBNB_CMDS_SH_CONTENT = f"""#!/bin/sh
# Sbnb Custom Commands Script (Unified Persistent Docker Root + Features - v2.1 - BusyBox cp)
# Mounts persistent data, configures Docker data-root, migrates data (if needed using cp),
# updates optional scripts, enables systemd units for backup & monitoring.
# Strict error handling
set -e -o pipefail -u
# --- Logging Function ---
log() {{
# Log to kernel message buffer
echo "[sbnb-cmds.sh] $1" > /dev/kmsg
}}
log "Starting custom boot commands (Unified Persistent Docker Root v2.1 - BusyBox cp)..."
# --- Check Core Commands ---
# Ensure essential commands for this script are present
check_cmds() {{
local missing_cmd=0
log "Checking required commands..."
for cmd in "$@"; do
if ! command -v "$cmd" >/dev/null 2>&1; then
log "ERROR: Required command '$cmd' not found."
missing_cmd=1
fi
done
if [ $missing_cmd -eq 1 ]; then
log "ERROR: Missing one or more required commands. Cannot proceed."
exit 1
fi
log "Required commands found."
# Check optional but recommended commands
if ! command -v jq >/dev/null 2>&1; then
log "WARNING: 'jq' command not found. JSON handling for daemon.json will be less robust and may fail on complex existing files."
else
log "OK: 'jq' command found (recommended)."
fi
# Note: rsync check removed as cp -a -u is now the standard method
}}
# Define all commands potentially used in this script
# Removed 'rsync' from the list.
check_cmds mountpoint readlink mkdir mount echo sleep rm find ln systemctl mktemp cp mv chmod chown dirname basename jq grep cat cmp date sed ls
# --- Mount Persistent Data Partition ---
DATA_LABEL="SBNB_DATA"
DATA_DEVICE_SYMLINK="/dev/disk/by-label/${{DATA_LABEL}}"
DATA_MOUNT_POINT="{DATA_MOUNT}"
MAX_WAIT_SECONDS=15
WAIT_INTERVAL=1
elapsed_time=0
log "Waiting up to ${{MAX_WAIT_SECONDS}}s for data device (Label: ${{DATA_LABEL}})..."
while [ ! -e "${{DATA_DEVICE_SYMLINK}}" ]; do
if [ ${{elapsed_time}} -ge ${{MAX_WAIT_SECONDS}} ]; then
log "ERROR: Timeout waiting for device ${{DATA_DEVICE_SYMLINK}}. Persistent data cannot be mounted."
exit 1
fi
sleep ${{WAIT_INTERVAL}}
elapsed_time=$((elapsed_time + WAIT_INTERVAL))
done
DATA_DEVICE=$(readlink -f "${{DATA_DEVICE_SYMLINK}}")
log "Data partition device resolved to ${{DATA_DEVICE}} after ${{elapsed_time}}s."
# Ensure mount point directory exists
mkdir -p "${{DATA_MOUNT_POINT}}"
log "Attempting to mount ${{DATA_DEVICE}} at ${{DATA_MOUNT_POINT}}..."
if ! mountpoint -q "${{DATA_MOUNT_POINT}}"; then
# Attempt to mount read-write, noatime, nodiratime
if mount -o rw,noatime,nodiratime "${{DATA_DEVICE}}" "${{DATA_MOUNT_POINT}}"; then
log "Successfully mounted persistent partition at ${{DATA_MOUNT_POINT}}."
else
log "ERROR: Failed to mount ${{DATA_DEVICE}} at ${{DATA_MOUNT_POINT}}! Check filesystem and device."
exit 1
fi
else
log "Persistent partition already mounted at ${{DATA_MOUNT_POINT}}. Ensuring read-write..."
# Ensure partition is mounted read-write
mount -o remount,rw "${{DATA_MOUNT_POINT}}" || {{
log "ERROR: Failed to remount ${{DATA_MOUNT_POINT}} as read-write! Docker requires write access."
exit 1
}}
fi
# --- Configure Docker to use Persistent Data Directory ---
log "Setting up Docker to use persistent data-root..."
PERSISTENT_DOCKER_ROOT="{PERSISTENT_DOCKER_ROOT}"
DOCKER_CONFIG_DIR="{DOCKER_CONFIG_DIR}"
DOCKER_CONFIG_FILE="{DOCKER_CONFIG_FILE}"
DOCKER_CONFIG_BACKUP="{DOCKER_CONFIG_FILE}{DOCKER_CONFIG_BACKUP_SUFFIX}"
DOCKER_DATA_EPHEMERAL="{DOCKER_DATA_EPHEMERAL}" # For migration check
CONFIG_CHANGED=0 # Flag to track if we need to restart docker
# 1. Ensure the persistent Docker data-root directory exists with correct owner/permissions
log "Ensuring persistent Docker data directory exists: ${{PERSISTENT_DOCKER_ROOT}}"
# Create with specific permissions (rwx--x--x) using correct octal format for command line
mkdir -p -m {DOCKER_ROOT_PERMISSIONS:o} "${{PERSISTENT_DOCKER_ROOT}}"
if [ ! -d "${{PERSISTENT_DOCKER_ROOT}}" ]; then
log "ERROR: Failed to create persistent Docker data directory ${{PERSISTENT_DOCKER_ROOT}}!"
exit 1
fi
# Ensure ownership is root:root (critical for Docker)
log "Ensuring ownership of ${{PERSISTENT_DOCKER_ROOT}} is root:root..."
chown root:root "${{PERSISTENT_DOCKER_ROOT}}" || log "WARNING: Failed to set ownership on ${{PERSISTENT_DOCKER_ROOT}}. Docker might have issues."
# Ensure permissions are correct (mkdir -p doesn't always set mode on existing dirs) using correct octal format for command line
log "Ensuring permissions of ${{PERSISTENT_DOCKER_ROOT}} are {DOCKER_ROOT_PERMISSIONS:o}..."
chmod {DOCKER_ROOT_PERMISSIONS:o} "${{PERSISTENT_DOCKER_ROOT}}" || log "WARNING: Failed to set permissions on ${{PERSISTENT_DOCKER_ROOT}}."
log "Persistent Docker data directory ensured."
# 2. Create/Update Docker daemon configuration (/etc/docker/daemon.json)
log "Configuring Docker daemon (${{DOCKER_CONFIG_FILE}}) to use data-root: ${{PERSISTENT_DOCKER_ROOT}}"
mkdir -p "${{DOCKER_CONFIG_DIR}}" # Ensure config directory exists
# Backup original config ONCE if it exists and backup doesn't
if [ -f "${{DOCKER_CONFIG_FILE}}" ] && [ ! -f "${{DOCKER_CONFIG_BACKUP}}" ]; then
log "Backing up original Docker config to ${{DOCKER_CONFIG_BACKUP}}..."
cp -a "${{DOCKER_CONFIG_FILE}}" "${{DOCKER_CONFIG_BACKUP}}" || \\
log "WARNING: Failed to create backup of ${{DOCKER_CONFIG_FILE}}."
fi
# --- Safely update daemon.json ---
NEEDS_UPDATE=0
# Use jq if available (preferred method)
if command -v jq >/dev/null 2>&1; then
log "Using jq to manage daemon.json."
# Ensure file exists with at least {{}} for jq processing
[ -f "${{DOCKER_CONFIG_FILE}}" ] || echo "{{}}" > "${{DOCKER_CONFIG_FILE}}"
# Read current value safely, defaulting to empty string if null or missing
current_data_root=$(jq -r '.["data-root"] // ""' "${{DOCKER_CONFIG_FILE}}")
if [ "$current_data_root" != "${{PERSISTENT_DOCKER_ROOT}}" ]; then
log "Data-root needs update (jq check). Preparing changes..."
NEEDS_UPDATE=1
else
log "Docker data-root already correctly set in daemon.json (jq check)."
fi
if [ $NEEDS_UPDATE -eq 1 ]; then
TMP_JSON=$(mktemp "${{DOCKER_CONFIG_DIR}}/daemon.json.tmp.XXXXXX")
log "Attempting to merge data-root setting using jq..."
# Merge the new data-root value, preserving other keys
if jq --arg path "${{PERSISTENT_DOCKER_ROOT}}" '. + {{"data-root": $path}}' "${{DOCKER_CONFIG_FILE}}" > "${{TMP_JSON}}"; then
# Check if jq produced valid JSON
if jq -e . "${{TMP_JSON}}" > /dev/null; then
# Check if content actually changed before moving
if ! cmp -s "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}"; then
mv "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}"
chmod {DOCKER_CONFIG_PERMISSIONS:o} "${{DOCKER_CONFIG_FILE}}" # Use correct format here too for consistency, though 644 doesn't need '0o' prefix
log "Successfully updated daemon.json using jq."
CONFIG_CHANGED=1
else
log "daemon.json content unchanged after jq merge, removing temp file."
rm -f "${{TMP_JSON}}"
fi
else
log "ERROR: jq produced invalid JSON output. Config not updated."
rm -f "${{TMP_JSON}}" # Clean up temp file
fi
else
jq_exit_code=$?
log "ERROR: jq command failed (exit code $jq_exit_code) while updating config. Config not updated."
# Optionally capture and log jq stderr here if needed
rm -f "${{TMP_JSON}}" # Clean up temp file
fi
fi
# Fallback logic if jq is NOT available
else
log "WARNING: jq not found. Using less robust fallback for daemon.json."
# Define the minimal target content
TARGET_JSON_CONTENT=$(printf '{{%s\\n "data-root": "%s"%s\\n}}%s\\n' "" "${{PERSISTENT_DOCKER_ROOT}}" "" "")
if [ ! -f "${{DOCKER_CONFIG_FILE}}" ]; then
log "daemon.json does not exist. Creating new file with data-root."
NEEDS_UPDATE=1
else
# Check if data-root key exists at all
if ! grep -q '"data-root"\\s*:' "${{DOCKER_CONFIG_FILE}}"; then
log "Existing daemon.json lacks 'data-root' key."
# Check if the file is simple (e.g., just {{}} or empty/whitespace)
if ! grep -q '[a-zA-Z0-9]' "${{DOCKER_CONFIG_FILE}}" || grep -q '^\\s*{{\\s*}}\\s*$' "${{DOCKER_CONFIG_FILE}}"; then
log "Existing file is simple, overwriting with data-root."
NEEDS_UPDATE=1
else
log "ERROR: Existing daemon.json is complex and lacks 'data-root'. Cannot safely update without jq. Install jq or manually edit."
# Do not proceed with overwrite
NEEDS_UPDATE=0 # Explicitly prevent update
fi
# Key exists, check if the value is correct (basic check).
# Use double quotes so the shell expands ${{PERSISTENT_DOCKER_ROOT}} inside the grep pattern.
elif ! grep -q "\\"data-root\\"[[:space:]]*:[[:space:]]*\\"${{PERSISTENT_DOCKER_ROOT}}\\"" "${{DOCKER_CONFIG_FILE}}"; then
log "ERROR: Existing daemon.json has 'data-root' but points elsewhere. Cannot safely update without jq. Install jq or manually edit."
# Do not proceed with overwrite
NEEDS_UPDATE=0 # Explicitly prevent update
else
log "daemon.json exists and data-root seems correct (grep check)."
NEEDS_UPDATE=0
fi
fi
# Perform write only if deemed safe and necessary by the logic above
if [ $NEEDS_UPDATE -eq 1 ]; then
log "Writing daemon.json (simple method)..."
TMP_JSON=$(mktemp "${{DOCKER_CONFIG_DIR}}/daemon.json.tmp.XXXXXX")
echo "$TARGET_JSON_CONTENT" > "${{TMP_JSON}}"
if [ $? -eq 0 ]; then
mv "${{TMP_JSON}}" "${{DOCKER_CONFIG_FILE}}"
chmod {DOCKER_CONFIG_PERMISSIONS:o} "${{DOCKER_CONFIG_FILE}}" # Use correct format here too
log "Successfully wrote simple daemon.json."
CONFIG_CHANGED=1
else
log "ERROR: Failed to write temporary simple daemon.json! Config not updated."
rm -f "${{TMP_JSON}}"
fi
fi
fi
log "Docker daemon configuration check finished."
# 3. Data Migration (Optional): Migrate data from ephemeral location if needed
log "Checking for existing Docker data in ephemeral location (${{DOCKER_DATA_EPHEMERAL}})..."
# Check if the directory exists and contains anything other than 'lost+found' or potential marker files
if [ -d "${{DOCKER_DATA_EPHEMERAL}}" ] && [ -n "$(ls -A "${{DOCKER_DATA_EPHEMERAL}}" | grep -v -e '^lost+found$' -e '^\\.sbnb_persistent_redirect$' -e '^README_DO_NOT_USE\\.txt$' 2>/dev/null)" ]; then
log "Found potentially significant data in ${{DOCKER_DATA_EPHEMERAL}}."
# Check if persistent location is effectively empty (allowing only lost+found)
persistent_is_empty=0
if [ ! "$(ls -A "${{PERSISTENT_DOCKER_ROOT}}" | grep -v '^lost+found$' 2>/dev/null)" ]; then
persistent_is_empty=1
fi
if [ $persistent_is_empty -eq 1 ]; then
log "Persistent location ${{PERSISTENT_DOCKER_ROOT}} is empty. Migrating data..."
# Ensure Docker is stopped before migration
if systemctl is-active --quiet docker; then
log "Stopping Docker service for migration..."
systemctl stop docker || log "WARNING: Failed to stop Docker. Migration proceeding, but data might be inconsistent!"
sleep 3 # Give it time to release files
fi
log "Starting migration using cp -a -u..."
MIGRATION_SUCCESS=0
# Use cp -a -u: archive mode (preserve attrs), update mode (copy only if newer/missing).
# Source ends with /. to copy contents including hidden files.
# This is the recommended busybox alternative to rsync for local mirroring.
if cp -a -u "${{DOCKER_DATA_EPHEMERAL}}/." "${{PERSISTENT_DOCKER_ROOT}}/"; then
MIGRATION_SUCCESS=1
else
log "ERROR: cp -a -u migration failed with exit code $? !"
fi
# Handle migration outcome
if [ $MIGRATION_SUCCESS -eq 1 ]; then
log "Migration completed successfully."
# Rename old data directory as backup
OLD_DATA_BACKUP="${{DOCKER_DATA_EPHEMERAL}}.migrated.$(date +%Y%m%d_%H%M%S).bak"
log "Attempting to rename old data directory to ${{OLD_DATA_BACKUP}}..."
# Use mv -T to handle if ephemeral is somehow a symlink
if mv -T "${{DOCKER_DATA_EPHEMERAL}}" "${{OLD_DATA_BACKUP}}"; then
log "Successfully renamed old data directory."
else
log "WARNING: Could not rename old data directory ${{DOCKER_DATA_EPHEMERAL}}. It may still contain data."
# Consider rm -rf here ONLY if migration verification was very thorough, otherwise leave it.
fi
# Mark that Docker needs restart due to migration
CONFIG_CHANGED=1
else
log "ERROR: Data migration failed! Docker data may be incomplete or inconsistent in ${{PERSISTENT_DOCKER_ROOT}}."
# Exiting is likely the safest option here to force manual review.
exit 1
fi
else
log "Persistent location ${{PERSISTENT_DOCKER_ROOT}} already contains data. Skipping migration."
# Optionally rename the ephemeral data if it still exists and is unwanted
OLD_DATA_BACKUP="${{DOCKER_DATA_EPHEMERAL}}.ignored.$(date +%Y%m%d_%H%M%S).bak"
log "Attempting to rename unused ephemeral data directory to ${{OLD_DATA_BACKUP}}..."
mv -T "${{DOCKER_DATA_EPHEMERAL}}" "${{OLD_DATA_BACKUP}}" || \\
log "WARNING: Could not rename ephemeral data directory ${{DOCKER_DATA_EPHEMERAL}}."
fi
else
log "No significant data found in ephemeral location ${{DOCKER_DATA_EPHEMERAL}}. No migration needed."
fi
# Ensure the original ephemeral directory path exists but is empty, with a marker
log "Ensuring ephemeral path ${{DOCKER_DATA_EPHEMERAL}} exists and is marked as unused."
# Remove original path if it still exists (e.g., if rename failed but we continued)
if [ -d "${{DOCKER_DATA_EPHEMERAL}}" ]; then
rm -rf "${{DOCKER_DATA_EPHEMERAL}}" || log "WARNING: Failed to remove original ephemeral directory after processing."
fi
mkdir -p "${{DOCKER_DATA_EPHEMERAL}}"
touch "${{DOCKER_DATA_EPHEMERAL}}/.sbnb_persistent_redirect"
echo "Docker data is managed at ${{PERSISTENT_DOCKER_ROOT}}. This directory should remain empty." > "${{DOCKER_DATA_EPHEMERAL}}/README_DO_NOT_USE.txt"
chmod 644 "${{DOCKER_DATA_EPHEMERAL}}/README_DO_NOT_USE.txt" # 644 doesn't need :o format
chmod 600 "${{DOCKER_DATA_EPHEMERAL}}/.sbnb_persistent_redirect" # 600 doesn't need :o format
log "Data migration check finished."
# 4. Restart Docker Service *if* configuration was changed OR migration occurred
if [ $CONFIG_CHANGED -eq 1 ]; then
log "Configuration or data migration requires Docker restart. Reloading daemon and restarting service..."
if ! systemctl daemon-reload; then
log "ERROR: Failed to reload systemd daemon! Docker restart might fail or use old config."
exit 1 # Critical failure if daemon cannot reload
fi
log "Attempting to restart docker.service..."
if systemctl restart docker.service; then
log "Docker service restarted successfully."
else
log "ERROR: Failed to restart Docker service! Check 'journalctl -u docker.service'."
exit 1 # Critical failure if Docker doesn't restart after config change/migration
fi
else
log "No configuration changes or migration. Docker restart not required by this script."
# Optional: Ensure Docker is running even if no changes occurred
# log "Ensuring Docker service is active..."
# if ! systemctl is-active --quiet docker.service; then
# log "Docker service is not active. Attempting to start..."
# systemctl start docker.service || log "WARNING: Failed to start inactive Docker service."
# fi
fi
log "Docker setup finished."
# --- Update Optional Development Environment Script ---
# (Using the robust atomic update logic)
TARGET_DEV_ENV_SCRIPT="/usr/sbin/sbnb-dev-env.sh"
SOURCE_DEV_ENV_SCRIPT="${{DATA_MOUNT_POINT}}/scripts/sbnb-dev-env.sh" # Assuming it's stored persistently
log "Checking for optional development script update: ${{SOURCE_DEV_ENV_SCRIPT}}"
if [ -f "${{SOURCE_DEV_ENV_SCRIPT}}" ] && [ -r "${{SOURCE_DEV_ENV_SCRIPT}}" ]; then
log "Source script found. Attempting atomic update of ${{TARGET_DEV_ENV_SCRIPT}}..."
TARGET_DIR=$(dirname "${{TARGET_DEV_ENV_SCRIPT}}")
TMP_SCRIPT=""
# Setup trap for cleanup
trap 'sbnb_dev_cleanup' EXIT HUP INT QUIT TERM
sbnb_dev_cleanup() {{
if [ -n "${{TMP_SCRIPT:-}}" ] && [ -f "${{TMP_SCRIPT}}" ]; then
rm -f "${{TMP_SCRIPT}}"
log "Cleaned up temporary file ${{TMP_SCRIPT}}"
fi
trap - EXIT HUP INT QUIT TERM # Reset trap
}}
if [ ! -d "${{TARGET_DIR}}" ] || [ ! -w "${{TARGET_DIR}}" ]; then
log "WARNING: Target directory ${{TARGET_DIR}} does not exist or is not writable. Cannot update script."
# Check required commands exist (already done by check_cmds, but good practice here too)
elif ! command -v mktemp >/dev/null 2>&1 || ! command -v cp >/dev/null 2>&1 || ! command -v chmod >/dev/null 2>&1 || ! command -v mv >/dev/null 2>&1; then
log "WARNING: Required command (mktemp/cp/chmod/mv) not found. Skipping update."
else
TMP_SCRIPT=$(mktemp "${{TARGET_DIR}}/sbnb-dev-env.sh.XXXXXX")
if [ -z "${{TMP_SCRIPT}}" ] || [ ! -f "${{TMP_SCRIPT}}" ]; then
log "WARNING: Failed to create temporary file in ${{TARGET_DIR}}. Skipping update."
TMP_SCRIPT="" # Prevent trap from trying to remove nothing
else
# Proceed with copy, chmod, move
if cp "${{SOURCE_DEV_ENV_SCRIPT}}" "${{TMP_SCRIPT}}"; then
if chmod +x "${{TMP_SCRIPT}}"; then
# Use mv -T to handle target being a symlink correctly
if mv -T "${{TMP_SCRIPT}}" "${{TARGET_DEV_ENV_SCRIPT}}"; then
log "Successfully updated ${{TARGET_DEV_ENV_SCRIPT}}."
TMP_SCRIPT="" # Clear var so trap doesn't remove the final script
else log "WARNING: Failed to move temporary file ${{TMP_SCRIPT}} to ${{TARGET_DEV_ENV_SCRIPT}}. Update failed."; fi
else log "WARNING: Failed to set execute permissions on temporary file ${{TMP_SCRIPT}}. Update failed."; fi
else log "WARNING: Failed to copy content from ${{SOURCE_DEV_ENV_SCRIPT}} to ${{TMP_SCRIPT}}. Update failed."; fi
fi
# Clean up temp file if it still exists (e.g., on mv failure) and TMP_SCRIPT is set
if [ -n "${{TMP_SCRIPT:-}}" ] && [ -f "${{TMP_SCRIPT}}" ]; then rm -f "${{TMP_SCRIPT}}"; fi
TMP_SCRIPT="" # Ensure trap doesn't run again for this
fi
trap - EXIT HUP INT QUIT TERM # Clear trap explicitly
else
log "NOTE: Source script ${{SOURCE_DEV_ENV_SCRIPT}} not found or not readable. Skipping update."
fi
log "Update of optional script finished."
# --- Enable Systemd Units (Backup/Purge + Health/Volume Checks) ---
SYSTEMD_SOURCE_DIR="${{DATA_MOUNT_POINT}}/systemd"
SYSTEMD_TARGET_DIR="/etc/systemd/system"
TIMERS_WANTS_DIR="${{SYSTEMD_TARGET_DIR}}/timers.target.wants"
log "Enabling custom systemd units (Source: ${{SYSTEMD_SOURCE_DIR}})..."
if [ -d "${{SYSTEMD_SOURCE_DIR}}" ] && [ -r "${{SYSTEMD_SOURCE_DIR}}" ]; then
mkdir -p "${{SYSTEMD_TARGET_DIR}}"
mkdir -p "${{TIMERS_WANTS_DIR}}"
# Check ln and systemctl exist (already done in check_cmds)
linked_any=0
log "Linking systemd unit files..."
# Use find with -print0 and read -d '' for safe filename handling
find "${{SYSTEMD_SOURCE_DIR}}" -maxdepth 1 -type f \\( -name '*.service' -o -name '*.timer' \\) -print0 | while IFS= read -r -d '' source_unit; do
unit_name=$(basename "${{source_unit}}")
target_link="${{SYSTEMD_TARGET_DIR}}/${{unit_name}}"
log " Linking ${{unit_name}}..."
# Use ln -sf: symbolic, force overwrite if link exists
if ln -sf "${{source_unit}}" "${{target_link}}"; then
linked_any=1
else
log " WARNING: Failed to link ${{unit_name}}."
fi
done
if [ $linked_any -eq 0 ]; then
log "No unit files found in ${{SYSTEMD_SOURCE_DIR}} to link."
else
log "Reloading systemd daemon after linking units..."
# Reload daemon again (might be redundant if Docker restart already did it, but safe)
systemctl daemon-reload || log "WARNING: systemctl daemon-reload failed after linking units."
log "Enabling systemd timers/services..."
enabled_any=0
# Define ALL units expected to be enabled by this script
UNITS_TO_ENABLE="docker-backup.timer docker-purge.timer docker-shutdown-backup.service docker-health-check.timer docker-volume-check.timer"
final_enabled_list=""
# Use 'for unit in $UNITS_TO_ENABLE' which relies on word splitting
# shellcheck disable=SC2086
for unit in $UNITS_TO_ENABLE; do
# Check if the link exists and points to a file before enabling
if [ -L "${{SYSTEMD_TARGET_DIR}}/${{unit}}" ] && [ -f "${{SYSTEMD_TARGET_DIR}}/${{unit}}" ]; then
log " Enabling ${{unit}}..."
# Use --now to also start timers immediately if desired, otherwise just enable
if systemctl enable "${{unit}}"; then
enabled_any=1
final_enabled_list="${{final_enabled_list}} ${{unit}}"
else
log " WARNING: Failed to enable ${{unit}}."
fi
else
log " Skipping enable for ${{unit}} (link missing or broken)."
fi
done
if [ $enabled_any -eq 1 ]; then
final_enabled_list=$(echo "${{final_enabled_list}}" | sed 's/^ *//') # Remove leading space
log "Systemd units enabled successfully: ${{final_enabled_list}}"
else
log "No relevant systemd units were successfully enabled."
fi
fi # end if linked_any
else
log "WARNING: Systemd source directory ${{SYSTEMD_SOURCE_DIR}} not found or not readable. Cannot enable units."
fi
log "Systemd unit setup finished."
# --- Script Finish Logging ---
log "Finished custom boot commands successfully."
# Clear trap explicitly
trap - EXIT HUP INT QUIT TERM
exit 0
"""
# --- Tailscale Key ---
# !!! REPLACE THIS WITH YOUR ACTUAL KEY !!!
SBNB_TSKEY_TXT_CONTENT = "tskey-auth-..." # Placeholder
# --- Backup Script ---
BACKUP_DOCKER_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/backup-docker.sh
# Backs up the persistent Docker data-root directory.
set -e -u
# --- Configuration ---
DOCKER_DATA_DIR="{PERSISTENT_DOCKER_ROOT}" # Source is PERSISTENT root
BACKUP_DIR="{BACKUP_BASE_DIR}"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
BACKUP_FILE="${{BACKUP_DIR}}/docker_backup_${{TIMESTAMP}}.tar.gz"
LATEST_LINK="${{BACKUP_DIR}}/docker_latest.tar.gz"
STOP_DOCKER={STOP_DOCKER_FOR_BACKUP} # 1=Stop Docker (safer), 0=Live backup
log() {{ echo "[backup-docker.sh] $1" > /dev/kmsg; }}
# --- Check Commands ---
log "Checking required commands..."
check_cmds() {{
local missing_cmd=0
for cmd in "$@"; do
if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; missing_cmd=1; fi
done
# Exit if any command is missing. Use an explicit 'if' so that when nothing is
# missing the function returns 0 rather than the failed test's status (which
# would abort the script via 'set -e' at the call site).
if [ $missing_cmd -eq 1 ]; then exit 1; fi
}}
# Core commands needed
check_cmds date mkdir tar gzip ln mv sleep dirname basename
# Check systemctl only if stopping docker is enabled
[ $STOP_DOCKER -eq 1 ] && check_cmds systemctl
# Check for optional 'nice' command
NICE_CMD=""
if command -v nice >/dev/null 2>&1; then NICE_CMD="nice -n 19"; log "Using nice for lower tar priority."; fi
# --- Main Logic ---
log "Starting Docker backup process..."
log "Source: ${{DOCKER_DATA_DIR}}"
log "Destination: ${{BACKUP_FILE}}"
# Ensure backup directory exists and is writable
log "Ensuring backup directory exists: ${{BACKUP_DIR}}"
mkdir -p "${{BACKUP_DIR}}"
# Check write permissions specifically
if [ ! -w "${{BACKUP_DIR}}" ]; then log "ERROR: Backup directory not writable: ${{BACKUP_DIR}}"; exit 1; fi
# Stop Docker if configured
DOCKER_WAS_RUNNING=0
if [ $STOP_DOCKER -eq 1 ]; then
log "Attempting to stop Docker service..."
if systemctl is-active --quiet docker.service; then
DOCKER_WAS_RUNNING=1
log "Docker service is active, stopping..."
if systemctl stop docker.service; then
log "Docker service stopped. Waiting 5s for files to release..."; sleep 5
else
# If stop fails, warn but maybe proceed? Or exit? Exiting might be safer.
log "ERROR: Failed to stop Docker service gracefully! Backup might be inconsistent or fail. Aborting."
exit 1 # Exit if stop fails, as backup consistency is compromised
fi
else
log "Docker service already stopped."
fi
fi
# Create backup
log "Creating backup archive..."
if [ -d "${{DOCKER_DATA_DIR}}" ] && [ -r "${{DOCKER_DATA_DIR}}" ]; then
PARENT_DIR=$(dirname "${{DOCKER_DATA_DIR}}")
SOURCE_BASENAME=$(basename "${{DOCKER_DATA_DIR}}")
log "Archiving '${{SOURCE_BASENAME}}' from parent '${{PARENT_DIR}}'..."
# Use -C to change directory, archive relative path 'docker-root/...'
# Add --warning=no-file-changed to suppress warnings about files changing during read
# shellcheck disable=SC2086 # Allow word splitting for $NICE_CMD
if ${{NICE_CMD}} tar --warning=no-file-changed -czf "${{BACKUP_FILE}}" -C "${{PARENT_DIR}}" "${{SOURCE_BASENAME}}"; then
log "Backup archive created successfully."
# Verify backup file exists and is not empty
if [ -s "${{BACKUP_FILE}}" ]; then
log "Updating latest backup link..."
# Atomic symlink update: create temp link, then rename over old one.
# Wrap the pair in 'if' so a failure is handled below instead of aborting via 'set -e'.
if ln -sfT "${{BACKUP_FILE}}" "${{LATEST_LINK}}.tmp" && mv -Tf "${{LATEST_LINK}}.tmp" "${{LATEST_LINK}}"; then
log "Updated latest link to point to ${{BACKUP_FILE}}."
else
log "WARNING: Failed to update latest backup link."
rm -f "${{LATEST_LINK}}.tmp" # Clean up temp link if mv failed
fi
else
log "WARNING: Backup file seems invalid (empty/missing): ${{BACKUP_FILE}}. Removing."
rm -f "${{BACKUP_FILE}}"
fi
else
tar_exit_code=$?
log "ERROR: tar command failed with exit code ${{tar_exit_code}}! Backup failed."
rm -f "${{BACKUP_FILE}}" # Clean up partial archive if tar failed
fi
else
log "WARNING: Docker data directory not found or not readable: ${{DOCKER_DATA_DIR}}. Skipping backup."
fi
# Restart Docker if it was running and we stopped it successfully
if [ $DOCKER_WAS_RUNNING -eq 1 ]; then
log "Restarting Docker service..."
if ! systemctl start docker.service; then
log "WARNING: Failed to restart Docker service after backup."
else
log "Docker service restarted."
fi
fi
log "Docker backup process finished."
exit 0
"""
# --- Purge Script ---
PURGE_DOCKER_BACKUPS_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/purge-docker-backups.sh
# Removes old Docker backups, keeping the last N.
set -e -u
BACKUP_DIR="{BACKUP_BASE_DIR}"
KEEP_COUNT={BACKUP_KEEP_COUNT}
log() {{ echo "[purge-docker-backups.sh] $1" > /dev/kmsg; }}
# Check commands
check_cmds() {{
local missing_cmd=0
for cmd in "$@"; do if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; missing_cmd=1; fi; done
# Explicit 'if' so the function returns 0 when nothing is missing (avoids tripping 'set -e').
if [ $missing_cmd -eq 1 ]; then exit 1; fi
}}
check_cmds find wc sort head cut xargs rm mkdir date
log "Purging old Docker backups in ${{BACKUP_DIR}}, keeping ${{KEEP_COUNT}}..."
# Validate KEEP_COUNT
if ! [ "$KEEP_COUNT" -ge 0 ] 2>/dev/null; then log "ERROR: KEEP_COUNT (${{KEEP_COUNT}}) is invalid."; exit 1; fi
# Ensure backup directory exists and is accessible
if ! mkdir -p "${{BACKUP_DIR}}"; then log "ERROR: Failed to create backup directory ${{BACKUP_DIR}}!"; exit 1; fi
if [ ! -d "${{BACKUP_DIR}}" ] || [ ! -r "${{BACKUP_DIR}}" ] || [ ! -w "${{BACKUP_DIR}}" ]; then log "ERROR: Cannot access backup directory ${{BACKUP_DIR}}!"; exit 1; fi
# Count existing backups safely
log "Counting existing backup files..."
backup_count=$(find "${{BACKUP_DIR}}" -maxdepth 1 -name 'docker_backup_*.tar.gz' -type f -print 2>/dev/null | wc -l)
find_exit_code=$?
if [ $find_exit_code -ne 0 ]; then log "WARNING: find command failed (${{find_exit_code}}) while counting backups. Skipping purge."; exit 0; fi
log "Found ${{backup_count}} backup files."
if [ "$backup_count" -gt "$KEEP_COUNT" ]; then
to_delete_count=$(( backup_count - KEEP_COUNT ))
log "Need to delete ${{to_delete_count}} oldest backup(s)."
# Use find -printf with null terminators for safe filename handling
log "Identifying oldest backups to delete..."
delete_output=$(find "${{BACKUP_DIR}}" -maxdepth 1 -name 'docker_backup_*.tar.gz' -type f -printf '%T@ %p\\0' 2>/dev/null | \\
sort -zn | \\
head -zn "${{to_delete_count}}" | \\
cut -z -d' ' -f2- | \\
xargs -0 -r rm -v -- 2>&1) || rm_exit_code=$? # Capture rm output (stdout+stderr); '||' keeps 'set -e' from aborting on failure
rm_exit_code=${{rm_exit_code:-0}}
if [ $rm_exit_code -eq 0 ]; then
log "Purge completed successfully."
if [ -n "$delete_output" ]; then
log "Deleted files:"
# Log multi-line output safely
echo "$delete_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
fi
else
log "WARNING: Purge command (rm) failed (exit code ${{rm_exit_code}}). Check output below."
log "rm output:"
echo "$delete_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
fi
else
log "${{backup_count}} backups found <= ${{KEEP_COUNT}}. No backups purged."
fi
log "Backup purge process finished."
exit 0
"""
# --- Health Check Script ---
DOCKER_HEALTH_CHECK_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/docker-health-check.sh
# Checks Docker daemon health, responsiveness, and data-root configuration.
set -e -u
PERSISTENT_ROOT="{PERSISTENT_DOCKER_ROOT}"
DOCKER_CONFIG_FILE="{DOCKER_CONFIG_FILE}"
log() {{ echo "[docker-health-check] $1" | tee /dev/kmsg; }} # Log to kmsg and stdout/stderr
log "Starting Docker health check..."
# Check required commands
check_cmds() {{ for cmd in "$@"; do if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; exit 1; fi; done }}
check_cmds systemctl docker
# Check if Docker daemon service is running
log "Checking if docker.service is active..."
if ! systemctl is-active --quiet docker.service; then
log "WARNING: Docker service is not running. Attempting restart..."
if systemctl restart docker.service; then
log "Docker service restarted successfully."
sleep 5 # Give it time to fully start
else
log "ERROR: Failed to restart inactive Docker service!"
exit 1 # Critical failure if it should be running but can't be started
fi
fi
# Verify Docker daemon is responding to commands
log "Checking Docker daemon responsiveness via 'docker info'..."
if ! docker info > /dev/null 2>&1; then
log "WARNING: Docker service is running but 'docker info' command failed. Attempting restart..."
if systemctl restart docker.service; then
log "Docker service restarted successfully."
sleep 5 # Give it time
# Re-check responsiveness after restart
if ! docker info > /dev/null 2>&1; then
log "ERROR: Docker daemon still not responding after restart! Requires manual investigation."
exit 1 # Critical failure
else
log "Docker daemon is now responsive after restart."
fi
else
log "ERROR: Failed to restart unresponsive Docker service!"
exit 1 # Critical failure
fi
else
log "Docker daemon is responsive."
fi
# Check if Docker is using the correct data-root directory
log "Checking configured Docker data-root directory..."
# Use docker info with Go template for precise extraction
CURRENT_ROOT=$(docker info --format '{{{{.DockerRootDir}}}}' 2>/dev/null || echo "ERROR_GETTING_INFO")
if [ "$CURRENT_ROOT" = "ERROR_GETTING_INFO" ]; then
log "ERROR: Could not determine Docker's current data-root using 'docker info'. Health check incomplete."
exit 1 # Exit as this is a significant issue
elif [ "$CURRENT_ROOT" != "$PERSISTENT_ROOT" ]; then
log "CRITICAL ERROR: Docker is using incorrect data-root!"
log " Expected: $PERSISTENT_ROOT"
log " Actual: $CURRENT_ROOT"
log "This indicates a configuration problem in $DOCKER_CONFIG_FILE or Docker failed to apply it. Manual intervention required."
exit 1 # Critical configuration error
else
log "Docker is correctly using the persistent data-root: $PERSISTENT_ROOT"
fi
log "Docker health check completed successfully."
exit 0
"""
# --- Volume Check Script ---
# Define prune command based on configuration
if VOLUME_CHECK_PRUNE_LEVEL == 0:
PRUNE_COMMAND = "echo 'Automatic pruning disabled.'" # No-op
elif VOLUME_CHECK_PRUNE_LEVEL == 1:
# Prune stopped containers and dangling images only
PRUNE_COMMAND = "docker container prune -f && docker image prune -f"
elif VOLUME_CHECK_PRUNE_LEVEL >= 2:
# Prune stopped containers and *all* unused images (more aggressive)
PRUNE_COMMAND = "docker container prune -f && docker image prune -a -f"
else: # Default to level 1 if invalid config
PRUNE_COMMAND = "docker container prune -f && docker image prune -f"
DOCKER_VOLUME_CHECK_SH_CONTENT = f"""#!/bin/sh
# File: {DATA_MOUNT}/scripts/docker-volume-check.sh
# Checks free space on the Docker persistent volume and optionally prunes resources.
set -e -u
DOCKER_ROOT="{PERSISTENT_DOCKER_ROOT}"
MIN_FREE_PERCENT={VOLUME_CHECK_THRESHOLD_PERCENT}
# Prune command determined by Python script configuration (Level: {VOLUME_CHECK_PRUNE_LEVEL})
PRUNE_CMD="{PRUNE_COMMAND}"
log() {{ echo "[docker-volume-check] $1" | tee /dev/kmsg; }}
log "Checking Docker volume free space: ${{DOCKER_ROOT}}"
# Check required commands
check_cmds() {{ for cmd in "$@"; do if ! command -v "$cmd" >/dev/null 2>&1; then log "ERROR: Command '$cmd' not found."; exit 1; fi; done }}
check_cmds df awk sed docker # Need docker if pruning is enabled
# Check if the Docker root directory exists
if [ ! -d "$DOCKER_ROOT" ]; then log "ERROR: Docker root directory not found: $DOCKER_ROOT"; exit 1; fi
# Get free space percentage using df -P for POSIX compatibility
log "Calculating free space..."
# Get Available and Total blocks (in 1K blocks usually)
df_output=$(df -P "$DOCKER_ROOT" | awk 'NR==2 {{print $4, $2}}' 2>/dev/null)
if [ -z "$df_output" ]; then log "ERROR: Failed to get disk usage using df for $DOCKER_ROOT"; exit 1; fi
avail_kb=$(echo "$df_output" | awk '{{print $1}}')
total_kb=$(echo "$df_output" | awk '{{print $2}}')
# Handle edge case where total size is 0 or df failed weirdly
if [ -z "$total_kb" ] || [ "$total_kb" -le 0 ]; then
log "WARNING: Total disk size reported as zero or invalid for $DOCKER_ROOT. Cannot calculate percentage."
exit 0
fi
# Calculate free percentage using integer arithmetic
free_percent=$(( (avail_kb * 100) / total_kb ))
# Get human-readable sizes for logging
total_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $2}}')
avail_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $4}}')
log "Volume Stats: Total=${{total_size_hr}}, Available=${{avail_size_hr}}, Free=${{free_percent}}%"
# Check against threshold
if [ "$free_percent" -lt "$MIN_FREE_PERCENT" ]; then
log "WARNING: Low disk space! Free: ${{free_percent}}% (Threshold: ${{MIN_FREE_PERCENT}}%)"
# Attempt to prune based on configured level
if [ {VOLUME_CHECK_PRUNE_LEVEL} -gt 0 ]; then
log "Attempting automatic prune (Level: {VOLUME_CHECK_PRUNE_LEVEL})..."
prune_output=$(eval "${{PRUNE_CMD}}" 2>&1) || prune_exit_code=$? # eval the configured prune command string, capturing all output
# Check exit code, prune can return non-zero even if it works partially
if [ "${{prune_exit_code:-0}}" -eq 0 ]; then
log "Docker prune command executed successfully."
else
log "WARNING: Docker prune command finished with exit code ${{prune_exit_code}}."
fi
log "Prune output:"
echo "$prune_output" | while IFS= read -r line || [ -n "$line" ]; do log " $line"; done
# Recalculate free space after pruning
log "Recalculating space after cleanup..."
df_output=$(df -P "$DOCKER_ROOT" | awk 'NR==2 {{print $4, $2}}' 2>/dev/null)
avail_kb=$(echo "$df_output" | awk '{{print $1}}')
total_kb=$(echo "$df_output" | awk '{{print $2}}')
if [ "$total_kb" -gt 0 ]; then free_percent=$(( (avail_kb * 100) / total_kb )); else free_percent=0; fi
avail_size_hr=$(df -h "$DOCKER_ROOT" | awk 'NR==2 {{print $4}}')
log "Space after cleanup: Available=${{avail_size_hr}}, Free=${{free_percent}}%"
if [ "$free_percent" -lt "$MIN_FREE_PERCENT" ]; then
log "ERROR: Space still critically low after cleanup! Manual intervention likely required."
else
log "Space is now above threshold after cleanup."
fi
else
log "Automatic pruning is disabled (Level 0). Manual cleanup needed."
fi
else
log "Sufficient free space available (${{free_percent}}%)."
fi
log "Docker volume check completed."
exit 0
"""
# --- Systemd Units (Content definitions remain the same as previous version) ---
# Backup Service
DOCKER_BACKUP_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-backup.service
[Unit]
Description=Backup Docker Data ({PERSISTENT_DOCKER_ROOT})
Documentation=file://{DATA_MOUNT}/scripts/backup-docker.sh
Requires=mnt-sbnb-data.mount
After=mnt-sbnb-data.mount docker.service # Ensure mount and docker are up
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/backup-docker.sh
"""
# Backup Timer
DOCKER_BACKUP_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-backup.timer
[Unit]
Description=Daily Docker Backup Timer ({PERSISTENT_DOCKER_ROOT})
Requires=docker-backup.service
[Timer]
OnCalendar=*-*-* 05:00:00
AccuracySec=1h
Persistent=true
RandomizedDelaySec=600 # 10 minutes
Unit=docker-backup.service
[Install]
WantedBy=timers.target
"""
# Purge Service
DOCKER_PURGE_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-purge.service
[Unit]
Description=Purge Old Docker Backups ({BACKUP_BASE_DIR})
Documentation=file://{DATA_MOUNT}/scripts/purge-docker-backups.sh
Requires=mnt-sbnb-data.mount
After=mnt-sbnb-data.mount
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/purge-docker-backups.sh
"""
# Purge Timer
DOCKER_PURGE_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-purge.timer
[Unit]
Description=Daily Docker Backup Purge Timer
Requires=docker-purge.service
[Timer]
OnCalendar=*-*-* 06:00:00
AccuracySec=1h
Persistent=true
# Spread the start time by up to 5 minutes
RandomizedDelaySec=300
Unit=docker-purge.service
[Install]
WantedBy=timers.target
"""
# Shutdown Backup Service
DOCKER_SHUTDOWN_BACKUP_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-shutdown-backup.service
[Unit]
Description=Backup Docker Data ({PERSISTENT_DOCKER_ROOT}) on Shutdown (Best Effort)
Documentation=file://{DATA_MOUNT}/scripts/backup-docker.sh
# DefaultDependencies=no is crucial for shutdown units
DefaultDependencies=no
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service network.target
Before=shutdown.target reboot.target halt.target kexec.target umount.target final.target
[Service]
Type=oneshot
# RemainAfterExit keeps the unit active so ExecStop= runs during shutdown
RemainAfterExit=true
# Give the backup reasonable time (3 minutes)
TimeoutStopSec=180
# Run the backup when the unit is stopped (i.e. at shutdown/reboot)
ExecStop=/bin/sh {DATA_MOUNT}/scripts/backup-docker.sh
[Install]
WantedBy=shutdown.target reboot.target halt.target kexec.target
"""
# Health Check Service
DOCKER_HEALTH_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-health-check.service
[Unit]
Description=Docker Health Check Service
Documentation=file://{DATA_MOUNT}/scripts/docker-health-check.sh
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/docker-health-check.sh
# Optional resource limits
# CPUQuota=10%
# MemoryMax=128M
"""
# Health Check Timer
DOCKER_HEALTH_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-health-check.timer
[Unit]
Description=Regular Docker Health Check Timer
Requires=docker-health-check.service
[Timer]
# Run 5 mins after boot, then every 15 mins
OnBootSec=5min
OnUnitActiveSec=15min
AccuracySec=1min
Unit=docker-health-check.service
[Install]
WantedBy=timers.target
"""
# Volume Check Service
DOCKER_VOLUME_SERVICE_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-volume-check.service
[Unit]
Description=Docker Volume Space Check Service ({PERSISTENT_DOCKER_ROOT})
Documentation=file://{DATA_MOUNT}/scripts/docker-volume-check.sh
Requires=mnt-sbnb-data.mount docker.service
After=mnt-sbnb-data.mount docker.service
[Service]
Type=oneshot
ExecStart=/bin/sh {DATA_MOUNT}/scripts/docker-volume-check.sh
# Optional resource limits
# CPUQuota=10%
# MemoryMax=64M
"""
# Volume Check Timer
DOCKER_VOLUME_TIMER_CONTENT = f"""# File: {DATA_MOUNT}/systemd/docker-volume-check.timer
[Unit]
Description=Regular Docker Volume Check Timer
Requires=docker-volume-check.service
[Timer]
# Run 10 mins after boot, then every hour
OnBootSec=10min
OnUnitActiveSec=1h
AccuracySec=5min
Unit=docker-volume-check.service
[Install]
WantedBy=timers.target
"""
# --- Dictionary of Files to Create ---
# Defines all files to be generated by this script
FILES_TO_CREATE = {
# --- ESP Files ---
f"{ESP_MOUNT}/sbnb-cmds.sh": {
"content": SBNB_CMDS_SH_CONTENT,
"permissions": 0o755 # rwxr-xr-x
},
f"{ESP_MOUNT}/sbnb-tskey.txt": {
"content": SBNB_TSKEY_TXT_CONTENT,
"permissions": 0o600 # rw------- (Restrict access to key)
},
# --- Data Partition Files ---
# Helper Scripts
f"{DATA_MOUNT}/scripts/backup-docker.sh": {
"content": BACKUP_DOCKER_SH_CONTENT,
"permissions": 0o750 # rwxr-x--- (Owner exec, group read/exec)
},
f"{DATA_MOUNT}/scripts/purge-docker-backups.sh": {
"content": PURGE_DOCKER_BACKUPS_SH_CONTENT,
"permissions": 0o750
},
f"{DATA_MOUNT}/scripts/docker-health-check.sh": {
"content": DOCKER_HEALTH_CHECK_SH_CONTENT,
"permissions": 0o750
},
f"{DATA_MOUNT}/scripts/docker-volume-check.sh": {
"content": DOCKER_VOLUME_CHECK_SH_CONTENT,
"permissions": 0o750
},
# Systemd Units
f"{DATA_MOUNT}/systemd/docker-backup.service": {
"content": DOCKER_BACKUP_SERVICE_CONTENT,
"permissions": 0o644 # rw-r--r-- (Standard systemd unit permissions)
},
f"{DATA_MOUNT}/systemd/docker-backup.timer": {
"content": DOCKER_BACKUP_TIMER_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-purge.service": {
"content": DOCKER_PURGE_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-purge.timer": {
"content": DOCKER_PURGE_TIMER_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-shutdown-backup.service": {
"content": DOCKER_SHUTDOWN_BACKUP_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-health-check.service": {
"content": DOCKER_HEALTH_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-health-check.timer": {
"content": DOCKER_HEALTH_TIMER_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-volume-check.service": {
"content": DOCKER_VOLUME_SERVICE_CONTENT,
"permissions": 0o644
},
f"{DATA_MOUNT}/systemd/docker-volume-check.timer": {
"content": DOCKER_VOLUME_TIMER_CONTENT,
"permissions": 0o644
},
}
# --- Global counters for create_files status ---
warning_count = 0
fail_count = 0
# --- Main Script Logic ---
def check_prerequisites():
"""Verify script prerequisites before attempting file creation."""
print("--- Checking Prerequisites ---")
passed = True
# 1. Check root privileges
if os.geteuid() != 0:
print("ERROR: Script must be run as root (UID 0).")
passed = False
else:
print("OK: Running as root.")
# 2. Check base mount points exist and are writable
base_dirs = {ESP_MOUNT: "ESP", DATA_MOUNT: "Data"}
for bdir, name in base_dirs.items():
bdir_path = pathlib.Path(bdir)
print(f"Checking {name} mount point: {bdir}...")
if not bdir_path.is_dir():
print(f"ERROR: Base {name} directory '{bdir}' does not exist or is not a directory.")
print(f" Please ensure the corresponding partition is mounted correctly before running.")
passed = False
elif not os.access(bdir_path, os.W_OK):
print(f"ERROR: Base {name} directory '{bdir}' is not writable by the current user (root). Check mount options or permissions.")
passed = False
else:
print(f"OK: Base {name} directory '{bdir}' exists and is writable.")
# 3. Check for optional but recommended commands needed by generated scripts
print("Checking for optional command (jq)...")
try:
if shutil.which("jq"):
print("OK: 'jq' command found (recommended for robust daemon.json handling).")
else:
print("WARNING: 'jq' command not found. Generated sbnb-cmds.sh will use less robust methods for daemon.json, which might fail or overwrite existing settings.")
# Removed rsync check as it's no longer used/preferred by the generated script
except ImportError:
print("WARNING: Python 'shutil' module not found, cannot check for optional command (jq).")
except Exception as e:
print(f"WARNING: Error checking for optional commands: {e}")
if not passed:
print("----------------------------")
print("ERROR: Prerequisites not met. Aborting script.")
sys.exit(1)
print("--- Prerequisites OK ---")
return True
def create_files():
"""Creates directories and files as defined in FILES_TO_CREATE."""
global warning_count, fail_count # Declare intent to modify globals
print("\n--- Starting File Creation Process ---")
success_count = 0
warning_count = 0 # Reset global counter
fail_count = 0 # Reset global counter
# Ensure the base backup directory exists first with correct permissions
try:
print(f"\nEnsuring base backup directory exists: {BACKUP_BASE_DIR}")
# Create directory with specific permissions (rwxr-x---)
os.makedirs(BACKUP_BASE_DIR, mode=BACKUP_DIR_PERMISSIONS, exist_ok=True)
# Explicitly set permissions in case it already existed with different ones
current_perm = stat.S_IMODE(os.stat(BACKUP_BASE_DIR).st_mode)
if current_perm != BACKUP_DIR_PERMISSIONS:
print(f" Adjusting permissions on {BACKUP_BASE_DIR} to {BACKUP_DIR_PERMISSIONS:o}...") # Use :o format
os.chmod(BACKUP_BASE_DIR, BACKUP_DIR_PERMISSIONS)
print(f"OK: Backup directory ensured: {BACKUP_BASE_DIR} with permissions {BACKUP_DIR_PERMISSIONS:o}") # Use :o format
except OSError as e:
print(f"ERROR: Failed to create or set permissions on {BACKUP_BASE_DIR}: {e}")
sys.exit(f"ERROR: Could not ensure backup directory '{BACKUP_BASE_DIR}'. Exiting.")
except Exception as e:
print(f"ERROR: An unexpected error occurred ensuring backup directory: {e}")
sys.exit(f"ERROR: Could not ensure backup directory '{BACKUP_BASE_DIR}'. Exiting.")
# Process the files dictionary
for file_path_str, details in FILES_TO_CREATE.items():
file_path = pathlib.Path(file_path_str)
write_succeeded = False # Flag to track if write was successful
try:
content = details.get("content") # Use get() as content might be None for dirs
permissions = details.get("permissions") # Use .get() for optional permissions
# Assign default permissions if not specified
if permissions is None:
if content is None: # It's meant to be a directory
permissions = 0o755 # Default rwxr-xr-x for directories
else: # It's a file
permissions = 0o644 # Default rw-r--r-- for files
print(f"INFO: No specific permission set for {file_path}, using default {permissions:o}.") # Use :o format
except KeyError as e:
print(f"\nERROR: Configuration error - Missing '{e}' key for entry {file_path_str}. Skipping.")
fail_count += 1
continue
except Exception as e:
print(f"\nERROR: Configuration error for {file_path_str}: {e}. Skipping.")
fail_count += 1
continue
print(f"\nProcessing: {file_path}")
# 1. Create parent directories robustly
try:
parent_dir = file_path.parent
# Check if parent needs creation (avoid os.makedirs on existing dirs if possible)
if not parent_dir.is_dir():
print(f" Creating parent directory: {parent_dir}")
# mode=0o755 sets default permissions for newly created dirs (rwxr-xr-x)
os.makedirs(parent_dir, mode=0o755, exist_ok=True)
# Explicitly set permissions on parent in case it was just created or exist_ok=True skipped it
print(f" Setting parent directory permissions to 755...") # 755 doesn't need 0o prefix
os.chmod(parent_dir, 0o755)
else:
# Parent exists, ensure it's writable and has correct permissions
print(f" Parent directory exists: {parent_dir}")
if not os.access(parent_dir, os.W_OK):
print(f" WARNING: Parent directory {parent_dir} is not writable! File write may fail.")
warning_count += 1
# Ensure existing parent has standard 755 permissions
try:
current_parent_perm = stat.S_IMODE(os.stat(parent_dir).st_mode)
if current_parent_perm != 0o755:
print(f" Ensuring parent directory permissions are 755 (currently {current_parent_perm:o})...") # Use :o format
os.chmod(parent_dir, 0o755)
except OSError as e:
print(f" WARNING: Could not check/set permissions on existing parent {parent_dir}: {e}")
warning_count += 1
except OSError as e:
print(f" ERROR: Failed to create or set permissions on parent directory {parent_dir}: {e}")
print(f" Skipping item: {file_path}")
fail_count += 1
continue # Skip to the next file
except Exception as e:
print(f" ERROR: An unexpected error occurred creating parent directory for {file_path}: {e}")
print(f" Skipping item: {file_path}")
fail_count += 1
continue
# 2. Write the file content (or create directory if content is None)
if content is not None: # It's a file
try:
print(f" Writing content...")
# Use write_text for atomic write where possible and UTF-8 encoding
file_path.write_text(content, encoding='utf-8')
print(f" Successfully wrote: {file_path}")
write_succeeded = True
except IOError as e:
print(f" ERROR: Failed to write file {file_path}: {e}")
fail_count += 1
continue # Skip permissions if write failed
except Exception as e:
print(f" ERROR: An unexpected error occurred writing {file_path}: {e}")
fail_count += 1
continue
else: # It's a directory (content is None)
try:
print(f" Ensuring directory exists: {file_path}")
os.makedirs(file_path, mode=permissions, exist_ok=True)
# Explicitly set permissions in case it already existed
os.chmod(file_path, permissions)
print(f" Successfully ensured directory: {file_path}")
write_succeeded = True # Treat dir success like file write success
except OSError as e:
print(f" ERROR: Failed to create/set permissions on directory {file_path}: {e}")
fail_count += 1
continue
except Exception as e:
print(f" ERROR: An unexpected error occurred ensuring directory {file_path}: {e}")
fail_count += 1
continue
# 3. Set permissions (only if write/dir creation succeeded)
if write_succeeded:
try:
# Check if current permissions match target permissions before attempting chmod
current_perm = stat.S_IMODE(os.stat(file_path).st_mode)
if current_perm != permissions:
print(f" Setting permissions to {permissions:o} (currently {current_perm:o})...") # Use :o format
os.chmod(file_path, permissions)
print(f" Successfully set permissions for: {file_path}")
else:
print(f" Permissions already set correctly ({permissions:o}) for: {file_path}") # Use :o format
success_count += 1 # Count full success (write/dir + chmod)
except OSError as e:
print(f" WARNING: Failed to set permissions on {file_path}: {e}")
warning_count += 1 # Item created/written, but permissions failed/check failed
except Exception as e:
print(f" WARNING: An unexpected error occurred setting permissions for {file_path}: {e}")
warning_count += 1
# --- Summary ---
print("\n--- File Creation Summary ---")
print(f"Successfully processed (created/permissioned): {success_count} items")
print(f"Items processed but with warnings: {warning_count}")
print(f"Failed operations (write/dir/parent): {fail_count}")
print("-------------------------------\n")
total_issues = fail_count + warning_count
if total_issues > 0:
print("NOTE: Some errors or warnings occurred during file creation.")
if fail_count > 0:
print("ERROR: Fatal errors occurred. Deployment incomplete.")
return False # Fatal errors occurred
else:
print("Deployment completed, but with warnings. Please review the output above.")
return True # Only non-fatal warnings
else:
print("SBNB configuration file deployment completed successfully.")
return True
# --- Script Execution ---
if __name__ == "__main__":
print("=====================================================================")
print(" SBNB Unified Configuration Deployment Script (v2.1 - BusyBox cp) ")
print("=====================================================================")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Configuring Docker persistent root: {PERSISTENT_DOCKER_ROOT}")
print("Includes Backup/Purge and Health/Volume monitoring.")
print("Data migration uses 'cp -a -u' (BusyBox friendly).")
print("=====================================================================\n")
# Store counts for final status reporting
final_warning_count = 0
final_fail_count = 0
if check_prerequisites():
# Capture status from create_files
create_files_success = create_files()
# Access the global counters updated by create_files
final_warning_count = warning_count
final_fail_count = fail_count
if create_files_success or (final_fail_count == 0 and final_warning_count > 0) :
# Success or only warnings - print final instructions
print("\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("!!! CRITICAL: You MUST replace the placeholder in !!!")
print(f"!!! '{ESP_MOUNT}/sbnb-tskey.txt' with your actual Tailscale auth key! !!!")
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("\n--- Next Steps ---")
print("1. Review any WARNINGS in the output above.")
print("2. Reboot the system for sbnb-cmds.sh to take effect.")
print("3. After reboot, verify Docker configuration and status:")
print(f" - Check data root: `docker info | grep 'Docker Root Dir'` (should show '{PERSISTENT_DOCKER_ROOT}')")
print(f" - Check status: `systemctl status docker.service`")
print(f" - Check boot script logs: `journalctl -t sbnb-cmds.sh --no-pager` or check `/dev/kmsg` output during boot")
print(f" - Check timers: `systemctl list-timers --all | grep docker`")
print(f" - Check helper script logs periodically: `journalctl -t backup-docker.sh -t purge-docker-backups.sh -t docker-health-check -t docker-volume-check --no-pager`")
if final_warning_count > 0:
print("\nDeployment finished with WARNINGS.")
sys.exit(2) # Exit code 2 for success with warnings
else:
print("\nDeployment finished successfully.")
sys.exit(0) # Exit successfully
else:
# Fatal errors occurred during file creation
print("\n--- Deployment Failed ---")
print("Fatal errors occurred during file creation. System configuration may be incomplete or inconsistent.")
sys.exit(1) # Exit with error code
- Unmount the EFI Partition:
echo "--- Unmounting ESP partition ---" # Ensure buffers are flushed before unmounting sync sudo umount /mnt/sbnb-mount
#Phase 4: Backing Up Data (CRITICAL!)
- Why Essential: High risk of USB drive failure. Backups are mandatory.
- Strategy: Automate regular backups of `/mnt/sbnb-data`.
- File Data Backup (`rsync`): Ensure the backup destination (NAS, cloud, another server) has sufficient free space.
```bash
# Example: From Sbnb to backup-server (requires ssh key auth)
rsync -avz --delete --progress --human-readable /mnt/sbnb-data/ user@backup-server:/path/to/backups/sbnb-usb-data/
```
- Frequency: Daily recommended for active data.
- Automation: Use cron/systemd timers or remote triggers (see the wrapper sketch after this list).
- Testing Restores: Vital! Don't assume backups work.
- Conceptual Restore: Boot a Linux live env -> mount the backup source -> mount the target USB data partition (new/reformatted) to `/mnt/restore` -> `sudo rsync -av --progress /path/to/backup/sbnb-usb-data/ /mnt/restore/` -> verify restored files (count, size, checksums, spot checks).
- Verification: Use tools like `diff -r`, `md5sum`, or `sha256sum` to compare restored files against originals or known good copies.
- Untested backups provide a false sense of security.
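As a concrete illustration of the Automation item above, here is a minimal wrapper sketch (not part of the generated Sbnb configuration): the destination, lock file path, and logger tag are placeholders to adapt. It assumes passwordless SSH key authentication to the backup host, as in the `rsync` example above, and could be triggered by a systemd timer (similar to the Docker timers generated earlier) or remotely from the backup host.
```bash
#!/bin/sh
# Hypothetical offsite-backup wrapper -- adapt DEST and paths to your environment.
DEST="user@backup-server:/path/to/backups/sbnb-usb-data/"
LOCK="/run/sbnb-offsite-backup.lock"

# Skip this run if a previous backup is still in progress
if [ -e "$LOCK" ]; then
    logger -t sbnb-offsite-backup "previous backup still running, skipping"
    exit 0
fi
touch "$LOCK"
trap 'rm -f "$LOCK"' EXIT INT TERM

# Mirror the persistent data partition to the remote backup location
if rsync -az --delete /mnt/sbnb-data/ "$DEST"; then
    logger -t sbnb-offsite-backup "rsync completed successfully"
else
    logger -t sbnb-offsite-backup "rsync FAILED (exit $?)"
    exit 1
fi
```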
#Phase 5: Boot and Verify
- Safely Eject: Eject USB from prep system.
- Configure Server BIOS/UEFI: Enter setup (DEL, F2, F10, F12, etc.). Ensure UEFI Mode ON, CSM/Legacy OFF, Secure Boot OFF. Set “UEFI: USB…” as first boot device. Save & Exit.
- Boot Sbnb Linux.
- Verify Operation:
  - Monitor Boot: Watch console for `sbnb-cmds.sh` logs, errors.
  - SSH into Sbnb.
  - Check Mounts:
```bash
lsblk -o NAME,SIZE,TYPE,FSTYPE,LABEL,MOUNTPOINT   # Look for mount at /mnt/sbnb-data
df -hT | grep -E 'Filesystem|/mnt/sbnb-data'      # Check usage/type
mount | grep /mnt/sbnb-data                       # Check mount options (rw, noatime)
findmnt /mnt/sbnb-data                            # Another way to check mount info
```
  - Test Persistence:
```bash
# After SSHing in:
TIMESTAMP=$(date)
echo "Sbnb USB Persistence test - $TIMESTAMP" | sudo tee /mnt/sbnb-data/persistence_test.txt > /dev/null
sync && echo "Synced data to disk."
echo "File created. Content:" && sudo cat /mnt/sbnb-data/persistence_test.txt
echo "Rebooting server now..." && sudo reboot

# --- Wait for reboot and reconnect via SSH ---
echo "Checking for file after reboot..."
if [ -f /mnt/sbnb-data/persistence_test.txt ]; then
  echo "SUCCESS: File found. Content:" && sudo cat /mnt/sbnb-data/persistence_test.txt
  sudo rm /mnt/sbnb-data/persistence_test.txt  # Clean up
else
  echo "FAILURE: File NOT FOUND after reboot! Persistence failed."
fi
```
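Beyond the mount and persistence checks, confirm that Docker actually relocated its data root to the persistent partition and that the helper timers were installed. These checks mirror the "Next Steps" printed by the deployment script; the exact `Docker Root Dir` value depends on your `PERSISTENT_DOCKER_ROOT` setting.
```bash
docker info | grep 'Docker Root Dir'       # Should point at the persistent Docker root under /mnt/sbnb-data
systemctl status docker.service --no-pager
systemctl list-timers --all | grep docker  # Backup/purge/health/volume timers should be listed
journalctl -t sbnb-cmds.sh --no-pager      # Boot script log
journalctl -t backup-docker.sh -t purge-docker-backups.sh -t docker-health-check -t docker-volume-check --no-pager
```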
#Troubleshooting
- Doesn't Boot / No Bootable Device:
  - Re-verify BIOS settings (UEFI, Secure Boot OFF, Boot Order).
  - Re-verify USB Prep: Partitions (`parted print`), ESP flags (`boot`, `esp`), ESP filesystem label (`blkid /dev/sdX1` -> `LABEL="sbnb"`), EFI file path (`/EFI/BOOT/BOOTX64.EFI`).
  - Try different USB ports (check if the port provides sufficient power). Test drive health on the prep machine (`fsck`, `badblocks -nvs /dev/sdX`). Recreate the drive meticulously.
- Data Partition Not Mounted / `/mnt/sbnb-data` Empty:
  - Check boot logs (`journalctl -b`, console) for `sbnb-cmds.sh` errors ("Device... not found", "Failed to mount"). Check `dmesg` for USB errors (`dmesg | grep -iE 'usb|sdX'`) or filesystem errors (`dmesg | grep -i ext4`).
  - SSH in:
    - Verify partition & label: `sudo blkid`, `ls -l /dev/disk/by-label/`. Is `SBNB_DATA` present? Does it point to the correct device?
    - If label wrong/missing: Re-label from prep env (`sudo e2label /dev/sdX2 SBNB_DATA`).
    - If device/label exists, try manual mount: `sudo mkdir -p /mnt/sbnb-data && sudo mount /dev/disk/by-label/SBNB_DATA /mnt/sbnb-data`. Check `dmesg` for errors (e.g., `mount: wrong fs type, bad option, bad superblock`). If manual mount works, debug `sbnb-cmds.sh` (add `set -x`, check paths, loop duration, check script permissions: `ls -l /mnt/sbnb/sbnb-cmds.sh`). A minimal debugging sketch follows at the end of this section.
    - Run filesystem check (unmounted): `sudo e2fsck -f /dev/disk/by-label/SBNB_DATA`.
    - Check kernel modules: `lsmod | grep ext4`. Is the module loaded? Check `dmesg` for errors loading filesystem modules.
- Check boot logs (
- Poor Performance / Drive Failure:
  - Performance: Inherent limitation.
  - Lifespan/Failure: Monitor `dmesg` for I/O errors. Restore from verified backups upon failure. This setup will wear out consumer flash drives with persistent writes.
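If the automatic mount keeps failing while a manual mount works, the usual suspects are the USB device not yet being enumerated when `sbnb-cmds.sh` runs, or a path/permission mistake. The fragment below is a stripped-down, hypothetical version of the wait-and-mount step for interactive debugging only; it is not the generated script itself, and the 30-second wait and `rw,noatime` options are assumptions.
```bash
# Hypothetical debugging fragment -- run interactively; the real logic lives in the generated sbnb-cmds.sh
set -x                                    # trace every command
DATA_DEV=/dev/disk/by-label/SBNB_DATA
MOUNT_POINT=/mnt/sbnb-data

# Wait up to ~30s for slow USB enumeration before giving up
i=0
while [ ! -e "$DATA_DEV" ] && [ "$i" -lt 30 ]; do
    sleep 1
    i=$((i + 1))
done

if [ -e "$DATA_DEV" ]; then
    mkdir -p "$MOUNT_POINT"
    mount -o rw,noatime "$DATA_DEV" "$MOUNT_POINT" || echo "mount failed with exit code $?"
else
    echo "Device $DATA_DEV never appeared; check dmesg for USB errors"
fi
set +x
```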