Xen

Several virtualisation products are based on the Xen Project hypervisor:

Citrix XenServer
Vates XCP-ng
Oracle VM Server for x86

metrics and debugging tools

xentop – top for Xen, shows CPU and I/O stats for all domains
xl – Xen management tool, based on libxenlight
- xl info – shows a general overview of the hypervisor

Sizing

RAM allocated to the management domain

Oracle¹ recommends calculating the dom0 RAM allocation with the formula dom0_mem = 502 + int(physical_mem * 0.0205), with both values in MB. For 64 GB of physical memory this gives 1845 MB of dom0 RAM, for 128 GB it gives 3188 MB, and for 512 GB it gives 11249 MB.
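The formula can be checked with plain shell arithmetic. This is a minimal sketch, assuming physical memory is given in MB; dom0_mem_mb is a hypothetical helper name and not part of any Xen tool.

```shell
# Dom0 memory sizing after the Oracle formula:
#   dom0_mem = 502 + int(physical_mem * 0.0205), all values in MB.
# dom0_mem_mb is a hypothetical helper name.
dom0_mem_mb() {
    physical_mb=$1
    # 0.0205 is written as 205/10000 so integer arithmetic is enough;
    # integer division truncates just like int() in the formula.
    echo $(( 502 + physical_mb * 205 / 10000 ))
}

# Print the recommendation for a few common host sizes.
for gb in 64 128 512; do
    echo "${gb} GB physical -> $(dom0_mem_mb $(( gb * 1024 ))) MB dom0"
done
```

Writing the factor as a fraction avoids floating point, which POSIX shell arithmetic does not support.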

Troubleshooting

Debugging resources:

Citrix Hypervisor 8.2 API Reference - Error Handling

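can't add Storage Repository on disk where an SR was before due to SR_DEVICE_IN_USE()

The old SR may still be in the device list even though it is not displayed in Xen Orchestra or whichever GUI you are running.

Check with xe pbd-list and xe sr-list. The output could look something like this: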

uuid ( RO)                : a862334e-f51c-feb9-6086-7d42c30293bb 
         name-label ( RW): Local HDD (RAID 10) 
   name-description ( RW): Local HDD Storage 
               host ( RO): <not in database> 
               type ( RO): ext
       content-type ( RO): user

In that case (host <not in database>), just forget the SR with xe sr-forget uuid=a862334e-f51c-feb9-6086-7d42c30293bb and try to add the SR again.
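On a host with many SRs, such records can be picked out by filtering the listing. This is a minimal sketch, assuming the output layout shown in the sample above; orphaned_sr_uuids is a hypothetical helper name, and it only prints candidates without forgetting anything.

```shell
# Print the UUID of every SR record whose host field reads "<not in database>".
# Reads xe sr-list style output on stdin.
# orphaned_sr_uuids is a hypothetical helper name.
orphaned_sr_uuids() {
    awk -F': ' '
        # Remember the UUID of the record currently being read.
        /^uuid/ { uuid = $2; sub(/[ \t]+$/, "", uuid) }
        # Report it when the host field of that record is not in the database.
        /^[ \t]*host/ && $2 ~ /<not in database>/ { print uuid }
    '
}

# Usage on the host:
#   xe sr-list | orphaned_sr_uuids
```

Review each printed UUID before passing it to xe sr-forget.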

VHD coalesce stuck / broken

Find out if coalesce is currently running:

ps axf | grep '[c]oalesce'
 
 6491 ?        R      0:05      \_ /usr/bin/vhd-util coalesce --debug -n /var/run/sr-mount/ed842b8c-49a7-a186-96cd-c18430404bf6/10014af7-27cf-4bb9-b3cd-a081ca470694.vhd

If nothing is running, rescan the SR (xe sr-scan uuid=<SR_UUID>) and repeat the ps check.

Force coalesce

If the lock is active, remove it:

ls /var/lock/sm/<SR_UUID>/gc_active
rm /var/lock/sm/<SR_UUID>/gc_active
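The two commands above can be wrapped in a small guard so that nothing is removed when no lock exists. This is a minimal sketch; remove_gc_lock is a hypothetical helper name, and the optional second argument (the lock root) is an addition for trying the function out against a scratch directory.

```shell
# Remove the gc_active lock for one SR if it exists.
# remove_gc_lock is a hypothetical helper name.
# $1: SR UUID, $2: lock root (optional, defaults to /var/lock/sm)
remove_gc_lock() {
    sr_uuid=$1
    lock_root=${2:-/var/lock/sm}
    lock="$lock_root/$sr_uuid/gc_active"
    if [ -e "$lock" ]; then
        rm -- "$lock" && echo "removed $lock"
    else
        echo "no lock at $lock"
    fi
}

# Usage on the host:
#   remove_gc_lock <SR_UUID>
```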

Rescan the SR and wait up to 10 minutes while tailing SMlog:

tail --follow -n 500 /var/log/SMlog

If coalesce errors

Scan all VDIs on the SR:

vhd-util scan -f -c -p -m /var/run/sr-mount/<SR_UUID>/*.vhd

If the scan reports no error, check SMlog for coalesce exceptions like this one:

Jan 28 15:56:11 xen01 SMGC: [23350]          ***********************
Jan 28 15:56:11 xen01 SMGC: [23350]          *  E X C E P T I O N  *
Jan 28 15:56:11 xen01 SMGC: [23350]          ***********************
Jan 28 15:56:11 xen01 SMGC: [23350] coalesce: EXCEPTION <class 'util.SMException'>, VHD *811e7004(80.000G/2.155G) corrupted
Jan 28 15:56:11 xen01 SMGC: [23350]   File "/opt/xensource/sm/cleanup.py", line 1542, in coalesce
Jan 28 15:56:11 xen01 SMGC: [23350]     self._coalesce(vdi)
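In a long SMlog, the short VDI identifier from such exception lines can be pulled out with a filter. This is a minimal sketch, assuming the exception lines look like the excerpt above; corrupted_vdi_ids is a hypothetical helper name.

```shell
# Extract the short VDI identifiers from coalesce EXCEPTION lines.
# Reads SMlog text on stdin and prints each identifier once.
# corrupted_vdi_ids is a hypothetical helper name.
corrupted_vdi_ids() {
    # Match "... coalesce: EXCEPTION ... VHD *<hex>(<sizes>) corrupted"
    # and keep only the hex identifier after the asterisk.
    sed -n 's/.*coalesce: EXCEPTION.* VHD \*\([0-9a-f]\{1,\}\)(.*corrupted.*/\1/p' | sort -u
}

# Usage on the host:
#   corrupted_vdi_ids < /var/log/SMlog
```

Assuming the identifier is the start of the VDI UUID, the full file name can then be looked up with ls /var/run/sr-mount/<SR_UUID>/811e7004*.vhd.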

Find the offending VDI UUID (here 811e7004…) and check it:

vhd-util check --debug -n /var/run/sr-mount/<SR_UUID>/<VDI_UUID>.vhd

Example output:

primary footer invalid: invalid cookie

Repair it:

vhd-util repair -n /var/run/sr-mount/<SR_UUID>/<VDI_UUID>.vhd

Recheck it; it should be OK now.

Remove the lock and wait a bit; coalesce should start.

If that doesn't help, investigate further and possibly move the corresponding VDI aside (mv, not rm!) to see whether coalesce succeeds after a rescan.

Commands

Find out UUID from inside the VM

dmidecode -s system-serial-number
# or:
xenstore read vm

Further reading

XenServer Administration Handbook: Practical Recipes for Successful Deployments (Mackey and Benedict, 2016)

^[1] Installing Oracle VM Server on x86 – 2.3 Changing the Dom0 Memory Size (Oracle, 2014)
