Bacula VTL Integration - Root Cause Analysis & Troubleshooting

Issue Summary

The Bacula Storage Daemon could not read slots from the mhVTL (Virtual Tape Library) autochanger devices. It reported "Device has 0 slots" even though the mtx-changer script worked correctly when run manually.

Environment

  • OS: Ubuntu Linux
  • Bacula Version: 13.0.4
  • VTL: mhVTL (Virtual Tape Library)
  • Autochangers:
    • Quantum Scalar i500 (4 drives, 43 slots)
    • Quantum Scalar i40 (4 drives, 44 slots)
  • Tape Drives: 8x QUANTUM ULTRIUM-HH8 (LTO-8)

Root Cause Analysis

Primary Issues Identified

1. Incorrect Tape Device Type

Problem: Using rewinding tape devices (/dev/st*) instead of non-rewinding devices (/dev/nst*)

Impact: A rewinding device rewinds the tape each time it is closed, so later writes can overwrite earlier data, causing data loss and operational failures

Solution: Changed all Archive Device directives from /dev/st* to /dev/nst*

Device {
  Name = Drive-0
- Archive Device = /dev/st0
+ Archive Device = /dev/nst0
}
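To catch any Archive Device lines that still point at a rewinding device, the config can be scanned with grep. This is a minimal sketch: the function name find_rewinding_devices is hypothetical, and it reads config text from stdin so it can be tried on a sample before running it against the real file.

```shell
#!/bin/sh
# Hypothetical helper: print every "Archive Device" line that still points
# at a rewinding device (/dev/stN). Reads config text from stdin.
find_rewinding_devices() {
  grep -E '^[[:space:]]*Archive Device[[:space:]]*=[[:space:]]*/dev/st[0-9]+[[:space:]]*$'
}

# Illustrative sample: Drive-0 uses /dev/st0 (flagged), Drive-1 uses /dev/nst1 (not flagged).
sample_conf='Device {
  Name = Drive-0
  Archive Device = /dev/st0
}
Device {
  Name = Drive-1
  Archive Device = /dev/nst1
}'
printf '%s\n' "$sample_conf" | find_rewinding_devices
```

Against a live system this would be run as `find_rewinding_devices < /etc/bacula/bacula-sd.conf`; no output means no rewinding devices remain.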

2. Missing Drive Index Parameter

Problem: Device resources lacked the Drive Index directive

Impact: Bacula couldn't properly identify which physical drive in the autochanger to use

Solution: Added Drive Index (0-3) to each Device resource

Device {
  Name = Drive-0
+ Drive Index = 0
  Archive Device = /dev/nst0
}
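A small awk pass can list Device resources that still lack a Drive Index. The sketch below assumes the one-directive-per-line layout shown in this document; the function name devices_missing_drive_index is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Name of every Device resource that has no
# Drive Index directive. Reads config text from stdin and assumes each
# directive sits on its own line, as in the examples in this document.
devices_missing_drive_index() {
  awk '
    /^[ \t]*Device[ \t]*[{]/        { in_dev = 1; name = ""; has_idx = 0; next }
    in_dev && /^[ \t]*Name[ \t]*=/  { sub(/^[^=]*=[ \t]*/, ""); name = $0 }
    in_dev && /^[ \t]*Drive Index[ \t]*=/ { has_idx = 1 }
    in_dev && /^[ \t]*[}]/          { if (!has_idx) print name; in_dev = 0 }
  '
}

# Illustrative sample: Drive-1 has no Drive Index and is reported.
sample_devices='Device {
  Name = Drive-0
  Drive Index = 0
  Archive Device = /dev/nst0
}
Device {
  Name = Drive-1
  Archive Device = /dev/nst1
}'
printf '%s\n' "$sample_devices" | devices_missing_drive_index
```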

3. Incorrect AlwaysOpen Setting

Problem: AlwaysOpen was set to no

Impact: Device wouldn't remain open, causing connection issues with VTL

Solution: Changed AlwaysOpen to yes for all tape devices

Device {
  Name = Drive-0
- AlwaysOpen = no
+ AlwaysOpen = yes
}

4. Wrong Changer Device Path

Problem: Using /dev/sch* (medium changer device) instead of /dev/sg* (generic SCSI device)

Impact: The bacula user couldn't open the changer because of a group mismatch: the /dev/sch* nodes belong to the cdrom group, while the sg nodes belong to the tape group

Solution: Changed Changer Device to use sg devices

Autochanger {
  Name = Scalar-i500
- Changer Device = /dev/sch0
+ Changer Device = /dev/sg7
}

Device Mapping:

  • /dev/sch0 → /dev/sg7 (Scalar i500)
  • /dev/sch1 → /dev/sg8 (Scalar i40)
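The sch-to-sg mapping can be read from `lsscsi -g`, which appends the generic device as the last column. The sketch below is one way to extract it; the function name changer_sg_map is hypothetical, and the SCSI addresses and revision strings in the sample are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: from `lsscsi -g` output on stdin, print
# "<sch device> <sg device>" for each medium changer.
# The model name may contain spaces, so the last two fields are used.
changer_sg_map() {
  awk '$2 == "mediumx" { print $(NF-1), $NF }'
}

# Illustrative sample only (addresses and revisions are assumptions).
sample_lsscsi='[3:0:0:0]    mediumx QUANTUM  Scalar i500      600G  /dev/sch0  /dev/sg7
[3:0:1:0]    tape    QUANTUM  ULTRIUM-HH8      0107  /dev/st0   /dev/sg0
[3:0:8:0]    mediumx QUANTUM  Scalar i40       600G  /dev/sch1  /dev/sg8'
printf '%s\n' "$sample_lsscsi" | changer_sg_map
```

On a live host the equivalent call is `lsscsi -g | changer_sg_map`. The sg numbers can change across reboots, so it is worth re-checking the mapping before editing Changer Device.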

5. Missing User Permissions

Problem: bacula user not in required groups for device access

Impact: "Permission denied" errors when accessing tape and changer devices

Solution: Added bacula user to tape and cdrom groups

usermod -a -G tape,cdrom bacula
systemctl restart bacula-sd
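Before and after running usermod, the group membership can be checked with a short helper. This is a sketch; the function name missing_groups is hypothetical, and it takes the group list as a string so it can be tested without a bacula account.

```shell
#!/bin/sh
# Hypothetical helper: print each required group that is missing from a
# space-separated group list (for example the output of `id -nG bacula`).
# Usage: missing_groups "<current groups>" <required group>...
missing_groups() {
  have=" $1 "
  shift
  for g in "$@"; do
    case "$have" in
      *" $g "*) ;;                 # group present, nothing to report
      *) printf '%s\n' "$g" ;;     # group missing
    esac
  done
}

missing_groups "bacula disk" tape cdrom        # prints tape and cdrom, one per line
missing_groups "bacula tape cdrom" tape cdrom  # prints nothing
```

On the real system: `missing_groups "$(id -nG bacula)" tape cdrom`. A running bacula-sd only picks up new groups after it is restarted, which is why the restart follows usermod above.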

6. Incorrect Storage Resource Configuration

Problem: Storage resource in Director config referenced autochanger name instead of individual drives

Impact: Bacula couldn't properly communicate with individual tape drives

Solution: Listed all drives explicitly in Storage resource

Storage {
  Name = Scalar-i500
- Device = Scalar-i500
+ Device = Drive-0
+ Device = Drive-1
+ Device = Drive-2
+ Device = Drive-3
  Autochanger = Scalar-i500
}

7. mtx-changer List Output Format

Problem: Script output format didn't match Bacula's expected format

Impact: "Invalid Slot number" errors, preventing volume labeling

Original Output: 1 Full:VolumeTag=E01001L8

Expected Output: 1:E01001L8

Solution: Fixed sed pattern in list command

# Original (incorrect)
list)
  ${MTX} -f $ctl status | grep "Storage Element" | grep "Full" | awk '{print $3 $4}' | sed 's/:/ /'
  ;;

# Fixed
list)
  ${MTX} -f $ctl status | grep "Storage Element" | grep "Full" | awk '{print $3 $4}' | sed 's/:Full:VolumeTag=/:/'
  ;;
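The fixed pipeline can be tried offline by feeding it a captured status line instead of calling mtx. The sketch below assumes the `mtx status` line layout implied by the outputs above; the function name format_list and the sample lines are illustrative.

```shell
#!/bin/sh
# Same filter chain as the fixed list command, reading `mtx status`
# text from stdin instead of invoking mtx.
format_list() {
  grep "Storage Element" | grep "Full" | awk '{print $3 $4}' | sed 's/:Full:VolumeTag=/:/'
}

# Illustrative sample: slot 1 holds a tape, slot 2 is empty.
sample_status='      Storage Element 1:Full :VolumeTag=E01001L8
      Storage Element 2:Empty'
printf '%s\n' "$sample_status" | format_list
```

awk joins the third and fourth fields into `1:Full:VolumeTag=E01001L8`, and sed collapses the middle part so only `slot:barcode` remains. Empty slots are dropped by the second grep.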

Troubleshooting Steps

Step 1: Verify mtx-changer Script Works Manually

# Test slots command
/usr/lib/bacula/scripts/mtx-changer /dev/sg7 slots
# Expected output: 43

# Test list command
/usr/lib/bacula/scripts/mtx-changer /dev/sg7 list
# Expected output (one entry per line): 1:E01001L8, 2:E01002L8, etc.

Step 2: Test as bacula User

# Test if bacula user can access devices
su -s /bin/bash bacula -c "/usr/lib/bacula/scripts/mtx-changer /dev/sg7 slots"

# If permission denied, check groups
groups bacula
# Should include: bacula tape cdrom

Step 3: Verify Device Permissions

# Check changer devices
ls -l /dev/sch* /dev/sg7 /dev/sg8
# sg devices should be in tape group

# Check tape devices
ls -l /dev/nst*
# Should be in tape group with rw permissions
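The same check can be scripted. The sketch below assumes GNU stat (as shipped on Ubuntu) with the format `%G %a %n`, which prints group, octal mode, and path; the function name check_tape_perms is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "group octal-mode path" lines on stdin and print
# the paths that are not in the tape group or lack group read+write.
check_tape_perms() {
  awk '{
    group_digit = substr($2, length($2) - 1, 1)   # middle digit of e.g. 660
    if ($1 != "tape" || group_digit + 0 < 6) print $3
  }'
}

# Illustrative sample: only /dev/nst0 passes.
sample_perms='tape 660 /dev/nst0
root 600 /dev/nst1
tape 640 /dev/sg7'
printf '%s\n' "$sample_perms" | check_tape_perms
```

On a live host: `stat -c '%G %a %n' /dev/nst* /dev/sg7 /dev/sg8 | check_tape_perms`. No output means every listed device is in the tape group with group read and write.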

Step 4: Test Bacula Storage Daemon Connection

# From bconsole
echo "status storage=Scalar-i500" | bconsole

# Should show autochanger and drives

Step 5: Update Slots

echo -e "update slots storage=Scalar-i500\n0\n" | bconsole

# Should show: Device "Drive-0" has 43 slots
# NOT: Device has 0 slots
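To make this check scriptable, the slot count can be pulled out of the bconsole output and tested for a non-zero value. This is a sketch under the assumption that the message contains "has N slots" as quoted above; slot_count and check_slots are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers: extract the slot count from "update slots" output
# on stdin, and fail when it is missing or zero.
slot_count() {
  sed -n 's/.*has \([0-9][0-9]*\) slots.*/\1/p' | head -n 1
}

check_slots() {
  n=$(slot_count)
  [ -n "$n" ] && [ "$n" -gt 0 ]
}

echo 'Device "Drive-0" has 43 slots' | slot_count
echo 'Device has 0 slots' | check_slots || echo "slot detection failed"
```

On a live system the bconsole pipeline from this step would be piped into check_slots, for example as part of a health-check script.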

Step 6: Label Tapes

echo -e "label barcodes storage=Scalar-i500 pool=Default\n0\nyes\n" | bconsole

# Should successfully label tapes using barcodes

Configuration Files

/etc/bacula/bacula-sd.conf (Storage Daemon)

Autochanger {
  Name = Scalar-i500
  Device = Drive-0, Drive-1, Drive-2, Drive-3
  Changer Command = "/usr/lib/bacula/scripts/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/sg7
}

Device {
  Name = Drive-0
  Drive Index = 0
  Changer Device = /dev/sg7
  Media Type = LTO-8
  Archive Device = /dev/nst0
  AutomaticMount = yes
  AlwaysOpen = yes
  RemovableMedia = yes
  RandomAccess = no
  AutoChanger = yes
  Maximum Concurrent Jobs = 1
}

/etc/bacula/bacula-dir.conf (Director)

Storage {
  Name = Scalar-i500
  Address = localhost
  SDPort = 9103
  Password = "QJQPnZ5Q5p6D73RcvR7ksrOm9UG3mAhvV"
  Device = Drive-0
  Device = Drive-1
  Device = Drive-2
  Device = Drive-3
  Media Type = LTO-8
  Autochanger = Scalar-i500
  Maximum Concurrent Jobs = 4
}

/usr/lib/bacula/scripts/mtx-changer

#!/bin/sh
MTX=/usr/sbin/mtx

ctl=$1
cmd="$2"
slot=$3
device=$4
drive=$5

case "$cmd" in
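   loaded)
      # Match on the drive index ($5): Bacula's stock mtx-changer queries
      # "loaded" per drive, so $slot would report the wrong drive here.
      ${MTX} -f $ctl status | grep "Data Transfer Element $drive:Full" >/dev/null 2>&1
      if [ $? -eq 0 ]; then
         # Field 7 is the source slot number of the loaded tape
         ${MTX} -f $ctl status | grep "Data Transfer Element $drive:Full" | awk '{print $7}' | sed 's/.*=//'
      else
         echo "0"
      fi
      ;;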

   load)
      ${MTX} -f $ctl load $slot $drive
      ;;

   unload)
      ${MTX} -f $ctl unload $slot $drive
      ;;

   list)
      ${MTX} -f $ctl status | grep "Storage Element" | grep "Full" | awk '{print $3 $4}' | sed 's/:Full:VolumeTag=/:/'
      ;;

   slots)
      ${MTX} -f $ctl status | grep "Storage Changer" | awk '{print $5}'
      ;;

   *)
      echo "Invalid command: $cmd"
      exit 1
      ;;
esac

exit 0
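The parsing behind the slots and loaded commands can be exercised without a changer by running the same filters over captured `mtx status` text. The sketch below uses a hand-written sample that follows the usual mtx layout; parse_slots and parse_loaded are hypothetical names, and parse_loaded takes the drive index as its argument.

```shell
#!/bin/sh
# Offline versions of the filters used by "slots" and "loaded".
# Both read `mtx status` text from stdin.
parse_slots() {
  grep "Storage Changer" | awk '{print $5}'
}

parse_loaded() {   # $1 = drive index
  line=$(grep "Data Transfer Element $1:Full")
  if [ -n "$line" ]; then
    printf '%s\n' "$line" | awk '{print $7}' | sed 's/.*=//'
  else
    echo "0"
  fi
}

# Illustrative sample: drive 0 holds the tape from slot 1, drive 1 is empty.
sample_mtx='  Storage Changer /dev/sg7:4 Drives, 43 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 1 Loaded):VolumeTag = E01001L8
Data Transfer Element 1:Empty'
printf '%s\n' "$sample_mtx" | parse_slots
printf '%s\n' "$sample_mtx" | parse_loaded 0
printf '%s\n' "$sample_mtx" | parse_loaded 1
```

If a different mtx build prints the header or drive lines with other spacing, the awk field numbers are the first thing to re-check.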

Verification Commands

Check Device Mapping

lsscsi -g | grep -E "mediumx|tape"

Check VTL Services

systemctl list-units 'vtl*'

Test Manual Tape Load

# Load tape to drive
mtx -f /dev/sg7 load 1 0

# Check drive status
mt -f /dev/nst0 status

# Unload tape
mtx -f /dev/sg7 unload 1 0

List Labeled Volumes

echo "list volumes pool=Default" | bconsole

Common Errors and Solutions

Error: "Device has 0 slots"

Cause: Wrong changer device or permission issues

Solution: Use /dev/sg* devices and verify that the bacula user is in the tape and cdrom groups

Error: "Permission denied" accessing /dev/sch0

Cause: bacula user not in cdrom group

Solution: usermod -a -G cdrom bacula && systemctl restart bacula-sd

Error: "Invalid Slot number"

Cause: mtx-changer list output format incorrect

Solution: Fix sed pattern to output slot:volumetag format

Error: "No medium found" after successful load

Cause: Using rewinding devices (/dev/st*) or AlwaysOpen=no

Solution: Use /dev/nst* and set AlwaysOpen=yes

Error: "READ ELEMENT STATUS Command Failed"

Cause: Permission issue or VTL service problem

Solution: Check user permissions and restart vtllibrary service

Results

Scalar i500 (WORKING)

  • 43 slots detected
  • 20 tapes successfully labeled (E01001L8 - E01020L8)
  • Autochanger operations functional
  • Ready for backup jobs

Scalar i40 (ISSUE)

  • ⚠️ 44 slots detected
  • Hardware Error during tape load operations
  • 0 tapes labeled
  • Status: Requires mhVTL configuration investigation or system restart

Date

Created: 2025-12-31

Author: Warp AI Agent