Migration Script for plusserver S3 Service

This short guide shows how to use a Bash script built on rclone to transfer data between storage destinations. rclone is a command-line tool for transferring files and directories between cloud storage services.

Info

Before proceeding, make sure the rclone configuration has been completed as described in Step 1 and Step 2 of Data Migration with rclone. The migration script checks the profiles named in the CSV file against this configuration and aborts if any are missing.
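
One way to confirm that the required profiles are in place is to compare the names you plan to use against the output of rclone listremotes, which prints one remote per line with a trailing colon. The following is a minimal sketch: the helper name profile_exists is hypothetical, old_s3 and new_s3 are the example profile names from this guide, and the sample list stands in for real rclone output.

```shell
#!/bin/bash

# Hypothetical helper: succeeds only if profile $2 appears as a complete
# line ("name:") in the remote list passed as $1.
profile_exists() {
    printf '%s\n' "$1" | grep -qxF -- "$2:"
}

# Sample data standing in for: remotes=$(rclone listremotes)
remotes=$(printf '%s\n' "old_s3:" "new_s3:")

for profile in old_s3 new_s3 s3; do
    if profile_exists "$remotes" "$profile"; then
        echo "found: $profile"
    else
        echo "missing: $profile"
    fi
done
```

Matching the whole line keeps a short name such as s3 from being accepted merely because it is a substring of old_s3.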

Automated Data Transfer with the Migration Script (using Bash)

In this step, you will learn how to use the migration script on a server to perform automated data transfers between different cloud storage services.

Preparation

Download the Script: Download the migration script to the server where you intend to perform the data transfer.

Make it Executable: Grant executable rights to the script so that it can be executed. Use the command chmod u+x plusserver_transfer_script.sh for this purpose.

Using the Script:

The migration script supports various parameters that allow you to customize the data transfer according to your requirements. Here are the steps to use the script:

Display Help: Execute the command ./plusserver_transfer_script.sh -h or ./plusserver_transfer_script.sh --help to get an overview of the supported parameters and their meanings.
Prepare CSV File: Create a CSV file that lists, per row, the source profile, source bucket, destination profile, and destination bucket. The first row is a header and is skipped by the script. With the default delimiter (a comma), the file should look like this:
source_profile,source_bucket,dest_profile,dest_bucket
old_s3,bucket1,new_s3,bucketA
old_s3,bucket2,new_s3,bucketB
(Example of a CSV file with source and destination combinations)
Ideally, name this file buckets.csv. If you choose a different name, you need to specify it using a parameter when running the script.
Start Data Transfer: Execute the command ./plusserver_transfer_script.sh -f CSV_FILE to start the data transfer. Replace CSV_FILE with the path to the prepared CSV file. The script will perform the transfer between the specified storage targets. If you run the script without the -f parameter, it will default to using buckets.csv.
Optional Parameters:
-d or --delimiter: Sets the delimiter character for the CSV file. The default is a comma (,).
--delete: Uses rclone sync instead of rclone copy. With this option, objects that were deleted in the source are also deleted in the destination bucket.
Example:
Assume you have placed the script on your server at /home/user/plusserver_transfer_script.sh and prepared a CSV file named transfer_data.csv in the same directory, and you want to run the transfer as an rclone sync operation using the --delete option.
From that directory, execute the following command:
```
./plusserver_transfer_script.sh -f transfer_data.csv --delete
```
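
The CSV file described in the steps above can be created directly on the server with a here-document. The sketch below writes buckets.csv with the default comma delimiter and then prints each source and destination pair, splitting rows on the delimiter in the same way the migration script does. The profile and bucket names are the example values from this guide, not real resources.

```shell
#!/bin/bash

csv_file="buckets.csv"
delimiter=","

# Write the header row plus one row per bucket pair
cat > "$csv_file" <<'EOF'
source_profile,source_bucket,dest_profile,dest_bucket
old_s3,bucket1,new_s3,bucketA
old_s3,bucket2,new_s3,bucketB
EOF

# Preview the pairs: skip the header, then split each row on the delimiter
tail -n +2 "$csv_file" | while IFS="$delimiter" read -r source_profile source_bucket dest_profile dest_bucket; do
    echo "$source_profile:$source_bucket -> $dest_profile:$dest_bucket"
done
```

If the preview shows an entire row in the first field, the delimiter in the file does not match the one being parsed.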

Info

When the script is run again, only objects that are new or have changed in the source are transferred. Objects that were deleted in the source bucket remain in the destination bucket unless you use the --delete option.

Migration Script:

#!/bin/bash

# Function to display the help
display_help() {
    echo "Usage: plusserver_transfer_script.sh [OPTIONS]"
    echo "Transfer data from source to destination using rclone."
    echo ""
    echo "Options:"
    echo "  -h, --help              Display this help message and exit."
    echo "  -f, --file FILE         Use the specified CSV file as input."
    echo "  -d, --delimiter 'DELIM' Set the delimiter character for the CSV file."
    echo "      --delete            Use 'rclone sync' instead of 'rclone copy'."
    echo ""
    echo "By default, the script uses ',' as the delimiter and 'buckets.csv' as the CSV file."
}

# Default values for options
csv_file="buckets.csv"
delimiter=","
use_delete=false

# Processing command line options
while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--help)
            display_help
            exit 0
            ;;
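        -f|--file)
            # Require a value: "shift 2" with only one argument left fails and would loop forever
            [[ $# -ge 2 ]] || { echo "Error: $1 requires an argument" >&2; exit 1; }
            csv_file="$2"
            shift 2
            ;;
        -d|--delimiter)
            [[ $# -ge 2 ]] || { echo "Error: $1 requires an argument" >&2; exit 1; }
            delimiter="$2"
            shift 2
            ;;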
        --delete)
            use_delete=true
            shift
            ;;
        *)
            echo "Error: Unknown option $1"
            display_help
            exit 1
            ;;
    esac
done

# Add timestamp to log file
log_file="sync_log_$(date -Iminutes).txt"

# Check if the CSV file exists
if [ -f "$csv_file" ]; then
    # Skip the header line in the CSV file
    read -r header < "$csv_file"

    # Initialize an array for unique profiles
    unique_profiles=()

    # Get output from rclone listremotes and store in a variable
    all_profiles=$(rclone listremotes)

    # Read CSV file and check profiles
    while IFS="$delimiter" read -r source_profile source_bucket dest_profile dest_bucket; do
        # Store unique profiles
        if [[ ! " ${unique_profiles[@]} " =~ " $source_profile " ]]; then
            unique_profiles+=("$source_profile")
        fi
        if [[ ! " ${unique_profiles[@]} " =~ " $dest_profile " ]]; then
            unique_profiles+=("$dest_profile")
        fi
    done < <(tail -n +2 "$csv_file") # Process all lines except the first (header)

    # Check if the unique profiles exist in the rclone configuration
    missing_profiles=()
    for profile in "${unique_profiles[@]}"; do
        # rclone listremotes prints one "name:" per line; match the whole line literally
        if ! grep -qxF -- "$profile:" <<< "$all_profiles"; then
            missing_profiles+=("$profile")
        fi
    done

    # Display error message if profiles are missing
    if [ ${#missing_profiles[@]} -gt 0 ]; then
        echo "Error: The following profiles are missing in rclone configuration:"
        for missing_profile in "${missing_profiles[@]}"; do
            echo "  - $missing_profile"
        done
        exit 1
    fi

    # Query buckets for each profile
    declare -A profile_buckets
    for profile in "${unique_profiles[@]}"; do
        buckets=$(rclone lsd "$profile:" | awk '{print $5}')
        profile_buckets["$profile"]=$buckets
    done

    # Check if the specified buckets in the CSV file are present
    missing_buckets=()
    while IFS="$delimiter" read -r source_profile source_bucket dest_profile dest_bucket; do
        # Match bucket names as whole lines so that, e.g., "bucket1" does not match "bucket10"
        if ! grep -qxF -- "$source_bucket" <<< "${profile_buckets["$source_profile"]}"; then
            missing_buckets+=("$source_profile:$source_bucket")
        fi
        if ! grep -qxF -- "$dest_bucket" <<< "${profile_buckets["$dest_profile"]}"; then
            missing_buckets+=("$dest_profile:$dest_bucket")
        fi
    done < <(tail -n +2 "$csv_file") # Process all lines except the first (header)

    # Display error message if buckets are missing
    if [ ${#missing_buckets[@]} -gt 0 ]; then
        echo "Error: The following buckets are missing or inaccessible:"
        echo "<Profile>:<Bucket>"
        for missing_bucket in "${missing_buckets[@]}"; do
            echo "  - $missing_bucket"
        done
        exit 1
    fi

    # Use rclone copy or rclone sync with --delete
    copy_command="rclone copy"
    if [ "$use_delete" = true ]; then
        copy_command="rclone sync"
    fi

    # Copy/synchronize objects from source bucket to destination bucket
    while IFS="$delimiter" read -r source_profile source_bucket dest_profile dest_bucket; do
        echo "Processing: $source_profile - $source_bucket -> $dest_profile - $dest_bucket"
        $copy_command "$source_profile":"$source_bucket" "$dest_profile":"$dest_bucket" --metadata --checksum --progress --copy-links --no-update-modtime --log-file="$log_file"
        echo "Processing completed:  $source_profile - $source_bucket -> $dest_profile - $dest_bucket"
    done < <(tail -n +2 "$csv_file") # Process all lines except the first (header)

    echo "All bucket transfers completed."
else
    echo "Error: CSV file \"$csv_file\" doesn't exist." >&2
    exit 1
fi
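
Before a real run, it can help to see which rclone commands a CSV file would produce. The sketch below mirrors the script's choice between rclone copy and rclone sync but only prints the resulting commands instead of executing them. The function name print_plan and the sample file plan_example.csv are hypothetical and not part of the migration script.

```shell
#!/bin/bash

# Hypothetical helper: print one rclone command per CSV row without running it.
# $1 = CSV file, $2 = delimiter, $3 = "true" to plan rclone sync instead of rclone copy
print_plan() {
    plan_command="rclone copy"
    if [ "$3" = true ]; then
        plan_command="rclone sync"
    fi
    # Skip the header row, then split each remaining row on the delimiter
    tail -n +2 "$1" | while IFS="$2" read -r source_profile source_bucket dest_profile dest_bucket; do
        echo "$plan_command $source_profile:$source_bucket $dest_profile:$dest_bucket"
    done
}

# Sample input using a semicolon delimiter (as with the -d option)
printf '%s\n' \
    "source_profile;source_bucket;dest_profile;dest_bucket" \
    "old_s3;bucket1;new_s3;bucketA" > plan_example.csv

print_plan plan_example.csv ";" true
```

rclone itself also offers a --dry-run flag, which reports what a copy or sync would transfer or delete without changing anything in either bucket.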

Last modified 06.03.2025: manually fix regressions introduced by reformatting (ceda776)