DRBD Preload


This is a continuation of VM disk replication.

One of the steps in VM disk replication is pre-loading the replicated volume.

For the project I wrote, we had two mechanisms:

  • Copy from an attached virtual disk image
  • Download from an image server

Copying from an attached virtual disk is a straightforward use of the dd command.
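
For example, with the source image attached as a second virtual disk, something along these lines works (the device names are assumptions and depend on how the disks show up in the guest):

# Copy the attached source disk image onto the DRBD-backed device
dd if=/dev/vdb of=/dev/drbd0 bs=32M status=progress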

Downloading from an image server was slightly trickier, as it needed to:

  • Resume downloads (because of the large image sizes)
  • Store images as compressed qcow2 files
  • Use gzip on the fly to compress the large empty areas of the disk volume

This was accomplished through a script that streams a qcow2 file as a raw image:


#!/usr/bin/env python3
#
# Stream qemu image as a raw image
#
from argparse import ArgumentParser
import nbd  # Requires python3-libnbd
import sys
import subprocess

def mycli():
  '''Create an ArgumentParser

  :returns ArgumentParser:
  '''
  cli = ArgumentParser(
    description = 'Stream a qemu image in raw format'
  )
  cli.add_argument('-c','--compress',dest='gzip',help='Enable gzip compression', action='store_true')
  cli.add_argument('--blocksize',help='Read block size',type=int, default=32*1024*1024)
  cli.add_argument('file',help='Disk image')
  cli.add_argument('offset',help='Byte offset to skip',nargs='?',type=int,default=0)
  cli.add_argument('count',help='Byte count to send',nargs='?',type=int)
  return cli

if __name__ == '__main__':
  cli = mycli()
  args = cli.parse_args()

  # Start qemu-nbd as a transient NBD server for the image and connect
  # to it via systemd-style socket activation
  n = nbd.NBD()
  cmd = ['qemu-nbd', '--read-only', '--persistent', args.file]
  n.connect_systemd_socket_activation(cmd)

  if args.gzip:
    # Compress on the fly; gzip inherits our stdout for its output
    gzip = subprocess.Popen(['gzip'], stdin=subprocess.PIPE)
    fp = gzip.stdin
  else:
    fp = sys.stdout.buffer

  # Clamp the requested range to the size of the image
  offset = args.offset
  size = n.get_size()
  if args.count is not None and offset + args.count < size:
    size = offset + args.count

  # Stream the requested range in blocksize chunks
  while offset < size:
    c = min(args.blocksize, size-offset)
    b = n.pread(c, offset)
    fp.write(b)
    offset += c

  # Close the pipe so gzip can flush its output and exit
  if args.gzip:
    fp.close()
    gzip.wait()

  n.shutdown()
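
Run on its own, the script looks something like this (the script and image names are placeholders):

# Stream the whole image as gzip-compressed raw data
./stream-raw.py -c disk.qcow2 > disk.raw.gz

# Stream a single 32 MiB chunk starting at byte offset 0
./stream-raw.py -c disk.qcow2 0 33554432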

And a bash script, run in the guest, retrieves the data and writes it to the DRBD device:

# Format a number with thousands separators, e.g. 1234567 -> 1,234,567
comma() {
  echo "$1" | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
}

preload_in_guest() {
   [ -z "$preload_url" ] && return
   ( echo "$preload_url" | grep -q '^https\?://') || return 0

   # We do this stupid process because we would time out for large
   # downloads
   cksz=$(expr 32 '*' 1024 '*' 1024) # 32M chunks

   imgsz=$(wget -nv -O- "${preload_url}?chonky=size")
   if ! ( echo "$imgsz" | grep -q '^[0-9][0-9]*$' ) ; then
     echo "Unable to determine image size"
     exit 1
   fi

   # OK, break it into chunks
   count=$(expr $imgsz / $cksz)
   if [ $(expr $imgsz - $(expr $count \* $cksz)) -gt 0 ] ; then
     count=$(expr $count + 1)
   fi
   echo "WILL DOWNLOAD IMAGE $(comma $imgsz)b IN $(comma $count) CHUNKS"

   offset=0
   seek=0
   while [ $seek -lt $count ]
   do
     # ${seek}00 appends two zeros (i.e. seek*100) so expr yields a percentage
     echo "Reading chunk: $(comma $seek) of $(comma $count) ( $(expr ${seek}00 / $count)% )"
     if ! (wget -nv -O- "${preload_url}?chonky=$offset,$cksz" \
        | gunzip \
        | dd of=/dev/drbd0 bs=$cksz seek=$seek) ; then
       echo "Error retrieving chunk: $seek"
       exit 1
     fi
     seek=$(expr $seek + 1)
     offset=$(expr $offset + $cksz)
   done
}

On the server side, a PHP script reads chonky from the query parameters and uses its value, formatted as offset,size, to call the Python script above (chonky=size simply returns the image size).
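
As a rough sketch of what that endpoint does (the real handler is PHP; the image path and script name below are made up for illustration):

# ?chonky=size -> report the raw (virtual) size of the image,
# e.g. via qemu-img info (read the "virtual-size" field)
qemu-img info --output=json /images/disk.qcow2

# ?chonky=OFFSET,SIZE -> stream that byte range, gzip-compressed
./stream-raw.py -c /images/disk.qcow2 "$OFFSET" "$SIZE"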
