It looks very similar to this problem but on Server 2012 R2.
Time-out errors occur in Volume Shadow Copy service writers, and shadow copies are lost during backup and during times when there are high levels of input/output
https://support.microsoft.com/en-us/kb/826936
Looking at the event logs of the cluster node the VMs are running on, I see the following events (all from VSS) repeated from around the time the problem VMs were being backed up:
12298 - Volume Shadow Copy Service error: The I/O writes cannot be held during the shadow copy creation period on volume \\?\Volume{GUID}\. The volume index in the shadow copy set is 0. Error details: Open[0x00000000, The operation completed successfully.], Flush[0x00000000, The operation completed successfully.], Release[0x00000000, The operation completed successfully.], OnRun[0x80042314, The shadow copy provider timed out while holding writes to the volume being shadow copied. This is probably due to excessive activity on the volume by an application or a system service. Try again later when activity on the volume is reduced.].
12297 - Volume Shadow Copy Service error: The I/O writes cannot be flushed during the shadow copy creation period on volume \\?\Volume{GUID}\. The volume index in the shadow copy set is 1. Error details: Open[0x00000000, The operation completed successfully.], Flush[0x80042313, The shadow copy provider timed out while flushing data to the volume being shadow copied. This is probably due to excessive activity on the volume. Try again later when the volume is not being used so heavily.], Release[0x00000000, The operation completed successfully.], OnRun[0x00000000, The operation completed successfully.].
12341 - Volume Shadow Copy Warning: VSS spent 0x000000000000003c seconds trying to flush and hold the volume \\?\Volume{GUID}\. This might cause problems when other volumes in the shadow-copy set timeout waiting for the release-writes phase, and it can cause the shadow-copy creation to fail. Trying again when disk activity is lower may solve this problem.
12340 - Volume Shadow Copy Error: VSS waited more than 40 seconds for all volumes to be flushed. This caused volume \\?\Volume{GUID}\ to timeout while waiting for the release-writes phase of shadow copy creation. Trying again when disk activity is lower may solve this problem.
8229 - A VSS writer has rejected an event with error 0x800423f3, The writer experienced a transient error.
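To make the hex values in these messages easier to read, here is a rough Python sketch that pulls every 0x... token out of an event message and converts it to a number. The symbolic names in VSS_CODES are what I believe these HRESULTs are called in the VSS headers, so treat them as an assumption; the helper names are just for illustration.

```python
import re

# Symbolic names for the HRESULTs seen in the events above.
# Assumption: these are the names I believe the VSS headers use.
VSS_CODES = {
    0x80042313: "VSS_E_FLUSH_WRITES_TIMEOUT",
    0x80042314: "VSS_E_HOLD_WRITES_TIMEOUT",
    0x800423F3: "VSS_E_WRITERERROR_RETRYABLE",
}

HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def hex_values(message):
    """Return every 0x... token in an event message as an int."""
    return [int(token, 16) for token in HEX_RE.findall(message)]


def failing_codes(message):
    """Return (code, name) pairs for the values listed in VSS_CODES."""
    return [(v, VSS_CODES[v]) for v in hex_values(message) if v in VSS_CODES]


warning = "VSS spent 0x000000000000003c seconds trying to flush and hold the volume"
print(hex_values(warning))  # [60]
```

So event 12341 is reporting 60 seconds spent in flush and hold, while event 12340 talks about a 40 second wait.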
Interestingly, I have another VM that's larger than the failing ones but doesn't see this issue.
Is there a way to increase the Hyper-V VSS timeout? I've found references to HKLM\Software\Microsoft\Windows NT\CurrentVersion\SPP\CreateTimeout, but the value isn't present on my system and I've only seen it mentioned in relation to Server 2008 R2, not 2012 R2. http://kb.backupassist.com/articles.php?aid=2997
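For anyone wanting to try the same thing, this is a small sketch of how the value could be worked out and created. It assumes CreateTimeout is a REG_DWORD expressed in milliseconds, which is how I understand it from the 2008 R2 references; I have not confirmed that 2012 R2 honours it. The script only prints a reg add command and changes nothing itself.

```python
# Registry location taken from the references above.
KEY = r"HKLM\Software\Microsoft\Windows NT\CurrentVersion\SPP"
VALUE_NAME = "CreateTimeout"


def minutes_to_ms(minutes):
    """Convert minutes to milliseconds (assumed unit of CreateTimeout)."""
    return int(minutes * 60 * 1000)


def reg_add_command(minutes):
    """Build a reg.exe command that would create or overwrite the DWORD."""
    return (
        f'reg add "{KEY}" /v {VALUE_NAME} '
        f"/t REG_DWORD /d {minutes_to_ms(minutes)} /f"
    )


# 20 minutes works out to 1200000 ms.
print(reg_add_command(20))
```

The printed command would need to be run from an elevated prompt on the host.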
Edit: Setting the registry value to the 20 minutes mentioned in the article didn't make any difference. While digging to find where the 40 second timeout was coming from, I found that 40 seconds is the default value for NewPathRecoveryTime in the MPIO settings. Is this significant?