Automating Nightly Local Database Refreshes from Azure Blob Storage with Docker
Background
In cloud-hosted applications, it is common to restrict access to production databases. This is a good practice, but it can make it difficult for various teams to access the data they need for development, testing, reporting, and data analysis. One way to solve this problem is to create a daily process that copies the production database to a location that is accessible to the teams that need it. In this article, we will create a docker container that will download the latest backup of a production database from Azure Blob Storage and restore it to a local SQL Server instance.
Prerequisites
This article assumes that you have already set up an automation to back up your production database to Azure Blob Storage. If you have not done this, you can follow the instructions in this article: Automating Nightly Database Backups to Azure Blob Storage. The script in the article is reaching end of life, and an updated version can be found here.
Docker must be installed and configured to used Linux containers on your local machine. If you do not have Docker installed, you can download it from Docker's website.
Setting up the Docker Container
The first step is to create a Dockerfile that will be used to build the container. The Dockerfile will contain the instructions on how to configure the container and what commands to run when the container is started. For our environment, we will require Microsoft SQL Server to be installed in the container. We will use the official Microsoft SQL Server Docker image as a base.
Base Dockerfile
Create a new directory on your local machine and create a file called Dockerfile
in the directory. Add the following content to the file:
FROM mcr.microsoft.com/mssql/server:2022-CU11-ubuntu-22.04
# Set environment variables for the container
ARG ACCOUNT_NAME
ENV ACCOUNT_NAME=$ACCOUNT_NAME
ARG ACCOUNT_KEY
ENV ACCOUNT_KEY=$ACCOUNT_KEY
ARG CONTAINER_NAME
ENV CONTAINER_NAME=$CONTAINER_NAME
ARG CRON_SCHEDULE="0 4 * * *"
ENV CRON_SCHEDULE=$CRON_SCHEDULE
ARG DATABASE_NAME=MyDatabase
ENV DATABASE_NAME=$DATABASE_NAME
ENV MSSQL_SA_PASSWORD=yourStrong(!)Password
ENV ACCEPT_EULA=Y
ENV MSSQL_PID=Developer
# Create a working directory for our tools and scripts and copy all the files from the host machine to the container
COPY . /sql
WORKDIR /sql
Install Azure CLI
To find and download the latest backup of the production database, we will need to install the Azure CLI in the container. We'll also need wget, cron, unzip, and a few other utilities to help us automate the process. Microsoft does offer a script to install the Azure CLI, but in my testing it did not work as expected in the Docker container. Instead, we will use the manual installation instructions.
Update the Dockerfile
to include the following lines after the WORKDIR
line:
# Must be root to install packages
USER root
# Install Dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
unzip cron wget apt-transport-https \
software-properties-common ca-certificates curl \
apt-transport-https lsb-release gnupg && \
rm -rf /var/lib/apt/lists/*
# Install az cli
RUN mkdir -p /etc/apt/keyrings && \
curl -sLS https://packages.microsoft.com/keys/microsoft.asc | \
gpg --dearmor | \
tee /etc/apt/keyrings/microsoft.gpg > /dev/null && \
chmod go+r /etc/apt/keyrings/microsoft.gpg && \
AZ_DIST=$(lsb_release -cs) && \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/microsoft.gpg] https://packages.microsoft.com/repos/azure-cli/ $AZ_DIST main" | \
tee /etc/apt/sources.list.d/azure-cli.list && \
apt-get update && \
apt-get install azure-cli && \
rm -rf /var/lib/apt/lists/*
Install SqlPackage
To import the backup bacpac file that we're going to download from Azure Blob Storage, we will need to install the sqlpackage
utility.
This utility is used to import and export bacpac files to and from SQL Server. The utility has evergreen links
available, so we can use the link to download the latest version of the utility.
Add the following lines to the end of the Dockerfile
:
# Install SQLPackage for Linux and make it executable
RUN wget -q -O sqlpackage.zip https://aka.ms/sqlpackage-linux \
&& unzip -qq sqlpackage.zip -d /sql/sqlpackage \
&& chmod +x /sql/sqlpackage/sqlpackage \
&& rm sqlpackage.zip
The above lines download the latest version of the sqlpackage
utility from Microsoft's website, unzip it, make it executable, and then remove the zip file.
Entrypoint
Finally, we need to add our entrypoint script to the container. This script will be run when the container starts, and it will perform all the necessary steps to download the latest backup from Azure Blob Storage and restore it to the local SQL Server instance.
Add the following lines to the end of the Dockerfile
:
# Switch back to mssql user
USER mssql
EXPOSE 1433
CMD /bin/bash ./entrypoint.sh
We need to switch back to the mssql
user to run the SQL Server process, so we make use of the USER
command to do this.
The EXPOSE
command tells Docker that the container listens on the specified network ports at runtime.
The CMD
command specifies the command that will be run when the container starts. In this case, we are running the entrypoint.sh
script.
We will make some final changes to our Dockerfile
later, but for now, save the file and close it.
The Scripts
As we alluded to at the end of our Dockerfile
, we need to create an entrypoint.sh
script that will be run when the container starts.
Since this container is based on the official Microsoft SQL Server Docker image, we need to ensure the original entrypoint is also run
alongside our custom entrypoint script. To do this, we need to create an additional script that we will call in our entrypoint.sh
script.
entrypoint.sh
Create two new files in the same directory as your Dockerfile
called entrypoint.sh
and initialize-database-and-jobs.sh
.
Add the following content to entrypoint.sh
:
#!/bin/bash
/sql/initialize-database-and-jobs.sh & /opt/mssql/bin/sqlservr
You'll note that we are running the initialize-database-and-jobs.sh
script in the background and then starting the SQL Server process.
This &
syntax is necessary to ensure that the original entrypoint script is also run without the docker container exiting immediately after
the script completes.
initialize-database-and-jobs.sh
Add the following content to initialize-database-and-jobs.sh
:
#!/bin/bash
# wait 30 seconds for SQL Server to start up
echo "Waiting for SQL Server to start"
sleep 30s
# Download the bacpac file from the Azure Blob Storage
echo "Downloading bacpac file from Azure Blob Storage"
bash /sql/download-latest.sh $ACCOUNT_NAME $ACCOUNT_KEY $CONTAINER_NAME /sql/backup.bacpac
backupJob=$?
if [ "$backupJob" -eq 0 ]
then
echo "Successfully downloaded bacpac file from Azure Blob Storage!"
echo "Enabling SQL Server authentication..."
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $MSSQL_SA_PASSWORD -d master -i /sql/enable-authentication.sql
echo "SQL Server authentication enabled. Waiting for 10 seconds before importing the bacpac file..."
sleep 10s
# Import the bacpac file into the SQL Server
/sql/sqlpackage/sqlpackage /a:import /sf:/sql/backup.bacpac /tsn:localhost,1433 /tdn:$DATABASE_NAME /tu:sa /tp:$MSSQL_SA_PASSWORD /ttsc:True
# Set up 4am CRON job to re-import the database
echo "$CRON_SCHEDULE /bin/bash /sql/reimport-database.sh" | crontab -
echo "CRON job set up successfully"
exit 0
else
echo "Failed to download bacpac file from Azure Blob Storage"
exit 1
fi
This short script does quite a few operations. First, it's recommended to wait for SQL Server to start up before attempting to connect to it.
We set up a 30-second timer to wait for SQL Server to start. We then download the latest backup from Azure Blob Storage using the download-latest.sh
script.
If the download is successful, we use the built-in sqlcmd
utility to enable SQL Server authentication. We then wait for 10 seconds to ensure that the
SQL Server has stabilized. We then use the sqlpackage
utility to import the bacpac file into the SQL Server. Finally, we set up a CRON job to run the
reimport-database.sh
script at a frequency specified by the CRON_SCHEDULE
environment variable. We then exit the script with a success code.
We need to create the download-latest.sh
and reimport-database.sh
scripts that are called in the initialize-database-and-jobs.sh
script.
download-latest.sh
Create a new file called download-latest.sh
and add the following content:
#!/bin/bash
# Description: This script downloads the latest backup from an Azure Storage Account
# Usage: bash DownloadLatest.sh <storageAccountName> <storageAccountKey> <containerName> <localPath>
accountName=$1
accountKey=$2
containerName=$3
localPath=${4:-"./backup.bacpac"}
# Get the name of the latest blob
firstBlob=$(az storage blob list \
--account-key $accountKey \
--account-name $accountName \
-c $containerName \
--query "[?properties.lastModified!=null && ends_with(name, '.bacpac')]|sort_by(@, &properties.lastModified)[-1].name" \
-o tsv)
# Check if $firstBlob is not null (i.e., there are blobs found)
if [ -n "$firstBlob" ]; then
az storage blob download --account-key $accountKey --account-name $accountName -c $containerName --name $firstBlob --file $localPath --output none
exit 0
else
exit 1
fi
The Azure CLI lets us write queries to filter the results of the az storage blob list
command. The queries are written in
JMESPath, which is a query language for JSON. In this case, we are filtering the results to only include blobs that end with the
.bacpac
extension and then selecting the first one as ordered by the lastModified
property. If there are no blobs found, the script exits with a failure code.
If we find a blob, we download it to the local path specified by the localPath
variable.
enable-authentication.sql
Create a new file called enable-authentication.sql
and add the following content:
sp_configure 'contained database authentication', 1;
GO
RECONFIGURE;
GO
This script enables contained database authentication in the SQL Server instance. This is necessary to allow the sa
user to authenticate to the database
when importing the bacpac file.
reimport-database.sh
Create a new file called reimport-database.sh
and add the following content:
#!/bin/bash
echo "Downloading bacpac file from Azure Blob Storage"
bash /sql/download-latest.sh $ACCOUNT_NAME $ACCOUNT_KEY $CONTAINER_NAME /sql/backup.bacpac
backupJob=$?
if [ "$backupJob" -eq 0 ]
then
echo "Successfully downloaded bacpac file from Azure Blob Storage!"
echo "Kill all connections to the database"
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $MSSQL_SA_PASSWORD -d master -i /sql/kill-all-connections.sql
databaseName=$DATABASE_NAME
existingDatabaseName="${databaseName}_$(date +%s)"
echo "Renaming existing database to $existingDatabaseName"
/opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $MSSQL_SA_PASSWORD -d master -Q "ALTER DATABASE $databaseName MODIFY NAME = $existingDatabaseName;"
echo "Renamed existing database to $existingDatabaseName"
echo "Importing bacpac file into the SQL Server"
/sql/sqlpackage/sqlpackage /a:import /sf:/sql/backup.bacpac /tsn:localhost,1433 /tdn:$DATABASE_NAME /tu:sa /tp:$MSSQL_SA_PASSWORD /ttsc:True
else
echo "Failed to download bacpac file from Azure Blob Storage"
exit 1
fi
You'll notice that this script is very similar to the initialize-database-and-jobs.sh
script. The main difference is that we are renaming the existing database
before importing the new bacpac file. This is necessary because the sqlpackage
utility does not support overwriting an existing database.
We also need to kill all connections to the database before renaming it. We do this by running a SQL script that we will create called kill-all-connections.sql
.
kill-all-connections.sql
Create a new file called kill-all-connections.sql
and add the following content:
-- kill all connections to the database
DECLARE @killCommand NVARCHAR(MAX) = '';
SELECT @killCommand = @killCommand + 'KILL ' + CAST(spid AS VARCHAR) + ';'
FROM sys.sysprocesses
WHERE dbid > 4;
EXEC sp_executesql @killCommand;
As its name suggests, this script kills all connections to the database. This is necessary to ensure that we can rename the database without any active connections.
Final Directory Structure and Dockerfile Changes
If everything went to plan, your directory should look like this:
.
├── Dockerfile
├── entrypoint.sh
├── initialize-database-and-jobs.sh
├── download-latest.sh
├── enable-authentication.sql
├── reimport-database.sh
├── kill-all-connections.sql
Now before we can build the Docker container, we need to make a few final changes to our Dockerfile
.
We need to copy all the files from our local machine to the container and set the correct permissions on the scripts.
Add the following lines to your Dockerfile
directly after the USER root
line:
RUN mkdir /home/mssql && chown mssql /home/mssql && \
chmod +x /sql/initialize-database-and-jobs.sh && \
chmod +x /sql/entrypoint.sh && \
chmod +x /sql/download-latest.sh
The mkdir
command creates a new directory in the container and the chown
command changes the owner of the directory to the mssql
user.
We then set the correct permissions on the scripts using the chmod
command. Save the Dockerfile
and close it.
Building the Docker Container
Now that we have all the necessary files, we can build the Docker container. Open a terminal and navigate to the directory where your Dockerfile
is located.
Run the following command to build the container:
docker build -t azure-local-database-refresh .
This command will build the container using the Dockerfile
in the current directory and tag the container with the name azure-local-database-refresh
.
The build process may take a few minutes to complete. Once it's done, you can run the container using the following command:
docker run -e ACCOUNT_NAME=<storageAccountName> -e ACCOUNT_KEY=<storageAccountKey> -e CONTAINER_NAME=<containerName> -e DATABASE_NAME=MyDatabase -p 1433:1433 -it azure-local-database-refresh
Replace <storageAccountName>
, <storageAccountKey>
, and <containerName>
with the appropriate values for your Azure Blob Storage account. The -p
flag maps port 1433 of the container to port 1433 of your local machine.
This allows you to connect to the SQL Server instance running in the container from your local machine. The -it
flag runs the container in interactive mode, which allows you to see the output of the container in your terminal.
You should see the output of the container in your terminal. If everything is working correctly, you should see messages indicating that the bacpac file has been downloaded and imported into the SQL Server instance. The import process can take a several minutes to complete, depending on the size of the database. Once the import process is complete, you should see a message indicating that the CRON job has been set up successfully.
Testing the Container
Now that we have our container up and running, we can test it by connecting to the SQL Server instance and verifying that the database has been restored. Open a new terminal and run the following command to connect to the SQL Server instance running in the container:
docker exec -it <containerId> /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P yourStrong(!)Password
Replace <containerId>
with the ID of your container. This command will open an interactive SQL Server prompt. Run the following command to verify that the database has been restored:
SELECT name FROM sys.databases;
If everything went to plan you should see the name of your database in the output. You can also run queries against the database to verify that the data has been restored correctly.
Conclusion
In this article, we created a Docker container that automates the process of downloading the latest backup of a production database from Azure Blob Storage and restoring it to a local SQL Server instance. We used a combination of bash scripts and Docker to create a portable and easy-to-use solution that can be run on any machine that has Docker installed. This solution can be used to provide teams with access to the latest production data for development, testing, reporting, data analysis, or other internal uses. The container can be run on a schedule using a CRON job to ensure that the data is always up-to-date. This solution can be easily extended to support other databases and cloud storage providers.
The code for this solution is available on GitHub. This docker container is also available on Docker Hub.