DRBD Preload


This is a continuation of VM disk replication.

One of the steps in VM disk replication is pre-loading the replicated volume.

For the project I wrote, we had two mechanisms:

  • Copy from an attached virtual disk image
  • Download from an image server

Copying from an attached virtual disk is a straightforward use of the dd command.
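
For example, with the source image attached as a second virtual disk, something along these lines works (the device names are assumptions and depend on how the disks show up in the guest):

# Copy the attached source disk image onto the DRBD-backed device
dd if=/dev/vdb of=/dev/drbd0 bs=32M status=progress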

Downloading from an image server was slightly trickier, as it needed to:

  • Resume downloads (because of the large image sizes)
  • Store images as compressed qcow2 files
  • Use gzip on the fly to compress the large empty areas of the disk volume

This was accomplished through a script that streams a qcow2 file as a raw image:


#!/usr/bin/env python3
#
# Stream qemu image as a raw image
#
from argparse import ArgumentParser
import nbd  # Requires python3-libnbd
import sys
import subprocess

def mycli():
  '''Create an ArgumentParser

  :returns ArgumentParser:
  '''
  cli = ArgumentParser(
    description = 'Stream a qemu image in raw format'
  )
  cli.add_argument('-c','--compress',dest='gzip',help='Enable gzip compression', action='store_true')
  cli.add_argument('--blocksize',help='Read block size',type=int, default=32*1024*1024)
  cli.add_argument('file',help='Disk image')
  cli.add_argument('offset',help='Byte offset to skip',nargs='?',type=int,default=0)
  cli.add_argument('count',help='Byte count to send',nargs='?',type=int)
  return cli

if __name__ == '__main__':
  cli = mycli()
  args = cli.parse_args()

  # Start qemu-nbd as a transient NBD server for the image and connect
  # to it via systemd-style socket activation
  n = nbd.NBD()
  cmd = ['qemu-nbd', '--read-only', '--persistent', args.file]
  n.connect_systemd_socket_activation(cmd)

  if args.gzip:
    # Compress on the fly; gzip inherits our stdout for its output
    gzip = subprocess.Popen(['gzip'], stdin=subprocess.PIPE)
    fp = gzip.stdin
  else:
    fp = sys.stdout.buffer

  # Clamp the requested range to the size of the image
  offset = args.offset
  size = n.get_size()
  if args.count is not None and offset + args.count < size:
    size = offset + args.count

  # Stream the requested range in blocksize chunks
  while offset < size:
    c = min(args.blocksize, size-offset)
    b = n.pread(c, offset)
    fp.write(b)
    offset += c

  # Close the pipe so gzip can flush its output and exit
  if args.gzip:
    fp.close()
    gzip.wait()

  n.shutdown()
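
Run on its own, the script looks something like this (the script and image names are placeholders):

# Stream the whole image as gzip-compressed raw data
./stream-raw.py -c disk.qcow2 > disk.raw.gz

# Stream a single 32 MiB chunk starting at byte offset 0
./stream-raw.py -c disk.qcow2 0 33554432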

And a bash script, run in the guest, retrieves the data and writes it to the DRBD device:

# Format a number with thousands separators, e.g. 1234567 -> 1,234,567
comma() {
  echo "$1" | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
}

preload_in_guest() {
   [ -z "$preload_url" ] && return
   ( echo "$preload_url" | grep -q '^https\?://') || return 0

   # We do this stupid process because we would time out for large
   # downloads
   cksz=$(expr 32 '*' 1024 '*' 1024) # 32M chunks

   imgsz=$(wget -nv -O- "${preload_url}?chonky=size")
   if ! ( echo "$imgsz" | grep -q '^[0-9][0-9]*$' ) ; then
     echo "Unable to determine image size"
     exit 1
   fi

   # OK, break it into chunks
   count=$(expr $imgsz / $cksz)
   if [ $(expr $imgsz - $(expr $count \* $cksz)) -gt 0 ] ; then
     count=$(expr $count + 1)
   fi
   echo "WILL DOWNLOAD IMAGE $(comma $imgsz)b IN $(comma $count) CHUNKS"

   offset=0
   seek=0
   while [ $seek -lt $count ]
   do
     # ${seek}00 appends two zeros (i.e. seek*100) so expr yields a percentage
     echo "Reading chunk: $(comma $seek) of $(comma $count) ( $(expr ${seek}00 / $count)% )"
     if ! (wget -nv -O- "${preload_url}?chonky=$offset,$cksz" \
        | gunzip \
        | dd of=/dev/drbd0 bs=$cksz seek=$seek) ; then
       echo "Error retrieving chunk: $seek"
       exit 1
     fi
     seek=$(expr $seek + 1)
     offset=$(expr $offset + $cksz)
   done
}

On the server side, a PHP script reads chonky from the query parameters and uses its value, formatted as offset,size, to call the Python script above (chonky=size simply returns the image size).
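
As a rough sketch of what that endpoint does (the real handler is PHP; the image path and script name below are made up for illustration):

# ?chonky=size -> report the raw (virtual) size of the image,
# e.g. via qemu-img info (read the "virtual-size" field)
qemu-img info --output=json /images/disk.qcow2

# ?chonky=OFFSET,SIZE -> stream that byte range, gzip-compressed
./stream-raw.py -c /images/disk.qcow2 "$OFFSET" "$SIZE"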
