Practical Scripting - Automated Backups

Overview

I like my homelab. It’s quiet, power-efficient, and handles a bunch of services around my home network that I can no longer envision life without. However, I spend most of my time at my workstation *cough gaming* PC, which has much more powerful hardware than the server itself. So when I finished configuring my NAS (Network-Attached Storage, for those unfamiliar), the next step was backing up all of my important files for redundancy. I do most of my coding and management from this system, and I already back these documents up manually to external (off-site) storage. While off-site hard-copy backups are very secure, no single backup is 100% bulletproof by itself, and until the lab was up and running, that off-site copy was the only other copy I had.

There are two steps involved here:

- Write a script to copy files from my PC to the server *(preferably, only 'new' files)*

- Automate the script on a scheduled basis (based on how often the dataset changes)

Let's start with the script

I run Windows 11 on my PC, so this script is going to be written for PowerShell. There are a few commands available for copying/moving data en masse using PS (and even the "regular" Windows command line), and after some research I landed on robocopy as my tool of choice. There is nothing inherently wrong with any of the other tools; I found the options to exclude old files within robocopy to be the most applicable for my needs.
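For context, the simplest PowerShell-native approach would be a plain Copy-Item (the paths below are made-up placeholders). It works, but it re-copies everything on every run, because Copy-Item has no built-in way to skip files the destination already has:

# Hypothetical paths - Copy-Item recurses happily, but there is no
# equivalent of robocopy's "skip older files" behavior here.
Copy-Item -Path "C:\Users\Me\Documents\*" -Destination "\\NAS\Backup\Documents" -Recurse -Force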

As for the actual copying, that step is very easy. Since robocopy is a built-in Windows command-line utility, we have plenty of documentation to reference for building our copy command on Microsoft's own website. I plan to use a few switches to filter what exactly we plan to copy (and what we plan to log):

- /s (copies any sub-directories inside the selected folder)

- /e (includes any empty directories)

- /b (copies in Backup mode, bypassing Access Control List (ACL) restrictions on the source files, in case I need to recover them with different credentials)**

- /z (runs in 'restart-able' mode - if connection breaks, command will pause and re-start upon connection to network)

- /xo (excludes files that are older than the copy already sitting at the destination)

- /ts (includes timestamp for source file in the output)

- /tee (writes the status output to the console window) <- we will pipe this into a log file for later review

- /np (excludes progress status from log files)

- /log+: (appends the output to the log file at the designated filepath)

** If you pull this script for any sort of production or enterprise environment, I strongly encourage you to discard this switch - it will make any data transferred to the destination "public" from a file-permission standpoint.

That's a lot of switches, but nowhere near all of them. If we look at the outcome of stacking these switches in the frame of "how to process this dataset", it may seem more straightforward. We are going to copy (robocopy) "This Folder" and any sub-directories (/s) or empty sub-directories (/e) inside of it. We will make these copies readable without credential locks (/b) just in case we have to access the data without the pre-existing ACL. Since we're transferring these over a network, we want to make sure the operation pauses during any outages (/z). To avoid redundancy of data, and to save time, we only want to transfer "newer" files than the ones on the server (/xo).
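Stacked together, the copy-related half of the command looks something like this (the paths here are placeholders, not the ones from my script):

# Sub-directories (/s), empty sub-directories (/e), backup mode (/b),
# restartable network transfers (/z), and skip source files that are
# older than the copy already sitting at the destination (/xo).
robocopy "C:\Users\Me\Documents" "\\NAS\Backup\Documents" /s /e /b /z /xo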

Now that we have covered the transfer itself, the rest of our switches pertain to our log files. Without logs, how could we know that all the files copied successfully (besides checking both locations one at a time)? So for our log, we want to include timestamps to see when each file moved over, or when an error occurred (/ts). We need the command to still output text to the "console", even though we will not be reading it at the time the script runs, so we will force it to "print" the output (/tee), which we can redirect from the "console window" to our log file. We don't need all of robocopy's output text, such as percentages and loading bars, so we exclude those from the log (/np). Lastly, we want to designate a specific location for our log and append to it (/log+:).

If you plan to copy this script for your own use, I strongly recommend testing on some non-critical data on the same drive. There are comments in the script itself indicating which variables need to be populated with your data. Play around with adding/removing files, and when you are confident it will meet your needs, you can replace the 'source' and 'destination' folders with your "production" filepaths.
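One low-risk way to do that testing is robocopy's list-only switch, /l, which reports what the command *would* copy without actually moving anything (again, placeholder paths):

# /l = list only: print what would be copied, but don't copy anything
robocopy "C:\Temp\TestSource" "C:\Temp\TestDest" /s /e /xo /l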

On Log Files...

I am nuts about formatting data (especially text data) into as legible a format as I can muster. With that in mind, this script includes some formatting steps (and failsafe checkpoints) to ensure that our log files populate correctly *every time*. The first step is to clear old data from our file. I do this by overwriting the file with a single line of text. From there, the code itself is simple, and can be "copy/pasted" for as many directories as we would like to back up (remember to either use direct filepaths or set variables for your additional entries):

robocopy $Source1 $Dest1 /s /e /b /z /xo /ts /tee /np /log+:$LogLocal

I use variable placeholders for full filepaths here to ease portability. The full script is available in my PowerShell GitHub repository*, which includes helpful comment guides for porting this script to your own environment. The $Source variable contains the filepath for your source folder, and the $Dest variable contains the filepath for your destination folder (either local or remote**). The $LogLocal file contains the full text output from our command for future review (if necessary).
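As a rough sketch, the variable block at the top of the script looks something like this - the paths below are placeholders for illustration, not the actual ones from the repository:

# Placeholder paths - swap in your own source, destination, and log locations
$Source1  = "C:\Users\Me\Documents"
$Dest1    = "\\homelab-nas\backups\Documents"   # UNC path to the SMB share on the server
$LogLocal = "C:\Backups\Logs\robocopy-full.log"
$ShortLog = "C:\Backups\Logs\backup-summary.log"

# Reset the full log with a single overwrite ('>') so each run starts clean,
# as described above - everything after this appends ('>>') instead.
echo "Backup log - $(Get-Date)" > $LogLocal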

**Please note - this is a PowerShell script written to run in a Windows environment, so be careful with your filepath syntax, especially if you bounce between Windows & Linux. It's always important to get your slash directions right! Even though my destination server runs Ubuntu, I've used Windows filepath syntax for this script since it will run on my Windows machine.

Summarizing the Important Stuff

While having access to the full log information is nice, very rarely will we need to read or parse through it all manually. Instead, let's create a second log with more of what we *want to know* and less of what we *don't*. What do we want to know? We’ll start with how many files were added, and a confirmation of what day & time the action completed.

$NewXfers = Get-Content $LogLocal -ReadCount 1000 | foreach {$_ -match "New File"}

$FileCount = $NewXfers.count

$date = Get-Date

To count our files, we'll grab only the text lines from the output (our log file) that include the words "New File" (the -ReadCount option tells the system to check 1,000 lines at a time, instead of line-by-line, to save on compute time for particularly long logs). We'll then set up a variable to hold the number of entries our search command finds, and another for the date.
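If you prefer Select-String over the -match operator, an equivalent sketch (not what the repository script uses) would be:

# Alternative: Select-String returns MatchInfo objects; .Line pulls the matching text back out
$NewXfers  = (Select-String -Path $LogLocal -Pattern "New File").Line
$FileCount = $NewXfers.Count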

From here, we simply write up a quick log file with all the data we want to know:

echo "======================================================================" > $ShortLog

echo '----------------------- S U M M A R Y --------------------------------' >> $ShortLog

echo "======================================================================`r`n" >> $ShortLog

echo "Total files added to archives: $FileCount" >> $ShortLog

echo " " >> $ShortLog

echo "Names of files added to archives: " >> $ShortLog

echo $NewXfers >> $ShortLog

echo "`r`n- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -" >> $ShortLog

echo "`r`nMost Recent Backup completed as of: $date" >> $ShortLog

You may notice that this whole section just prints out information to a text file. That's because it does! The hard work of our script finished a while ago (back up at the robocopy command itself). All we are doing now is summarizing the results into a digestible format any admin can check (or feed into yet another script for further interpretation).
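As a tiny example of that "further interpretation", a follow-up script could pull the count straight back out of the summary (the zero-files check here is just a hypothetical illustration):

# Hypothetical downstream check: re-read the count from the summary log
$match = Select-String -Path $ShortLog -Pattern 'Total files added to archives: (\d+)'
$count = [int]$match.Matches[0].Groups[1].Value
if ($count -eq 0) { Write-Warning "Backup ran, but no new files were copied." }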

Automating The Job

This is actually the easiest step in the procedure - Windows has a built-in tool for scheduling tasks, called the Task Scheduler! Running the script on a recurring basis is fairly straightforward, and there are tons of guides online for how to interact with Task Scheduler. For a basic breakdown, we are going to select "Create New Task", name it something we can identify, and set up the details:

- the "Trigger" will be a time of day, and will occur every day at that same time

- the "Action" will be to run a program (PowerShell), and we will add a command for it to pass through to PowerShell in the "add arguments" section:

-Command "& 'C:\path\to\the\script.ps1'; Start-Sleep -Seconds 10; exit 0"

From here, we hit "save", enter administrator-level credentials, and boom! That's it! On the days/times we indicated in setup, Windows will launch PowerShell and pass it our command to:

- Run our script; Wait 10 seconds after completion; Close PowerShell

If we want to test our script, we can right-click the task and click "run" to force it to run right now. It will still run itself at the scheduled time until we either change the details or delete the task.
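For what it's worth, the same task can be created from PowerShell itself instead of the GUI - a rough sketch, assuming the ScheduledTasks module that ships with modern Windows; the task name, time, and script path below are placeholders:

# Run from an elevated PowerShell prompt; adjust the name, time, and path to suit
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument '-Command "& ''C:\path\to\the\script.ps1''; Start-Sleep -Seconds 10; exit 0"'
$trigger = New-ScheduledTaskTrigger -Daily -At "3:00 AM"
Register-ScheduledTask -TaskName "Nightly Backup" -Action $action -Trigger $trigger -RunLevel Highest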

Summary

There are other small bits & bobs in this code I did not cover (such as the redundant log copy stored on the destination device, which is a combination of both log files in a single list). Had I gone over every line of code for this one, the post would be several miles long. If you would like to see the full script, you can find it here in my PowerShell repository on GitHub. Please reach out via the "contact" section with any questions. If you're still reading this far, thank you for your time and I hope this was helpful to you in some way.

JORT

Tinkerer, Linux enthusiast, data hoarder, dungeon master, cat parent, and learner of things.
