Progressively degrading data transfer speed - losing my mind!

We are attempting to migrate a bloated 1.5TB Win2k3 VM from one SAN to another so that we can do some capital maintenance and reconfiguration work on the SAN that currently holds this VM. It being a long weekend, I thought it would be the perfect time to undertake this project, but I am having major problems with transfer speeds that start out fast and then slowly nose-dive into the ridiculous. I have been reading posts and articles all night and I just can't figure out WTF is going on.

The setup:

Host server is a Dell PowerEdge 2900 attached via a dual iSCSI link to a Dell MD3000i, a high-end SAN packed with 15k RPM SAS drives in RAID (path layout shown below).

VMware version is ESXi 4.0 (Enterprise bundle).

The virtual machine in question is Win2k3 (not that it matters).

One destination is a directly-network-attached 1Gbps Linux NAS with a proper RAID controller and 6G SATA drives.

The other destination I have tried is the host server's internal 15k SAS disks on its onboard array.
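
For completeness, the dual-iSCSI path layout can be dumped from the host side like this (the vmkping target is a placeholder for one of the MD3000i portal IPs):

esxcfg-mpath -l                  # every path to each LUN and its current state
esxcli nmp device list           # path selection policy (Fixed / MRU / Round Robin) per device
vmkping <iSCSI portal IP>        # placeholder address, to confirm the iSCSI vmkernel ports reach the array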

The problem is that no matter what method of moving this VM I try, the data transfer rate starts out AWESOME, at 95MB/sec or higher, but over the course of an hour or so it slows to a crawl: 10MB/sec or even slower. Looking at the data transfer rates on the disk reads/writes or on the NICs, the drop follows a smooth downward arc. The symptom is the same whether we attempt a backup job from within the VM's OS, try to "Clone" from within the vSphere console, or run a backup via the VMware Data Recovery appliance interface. How sharply the transfer rate falls off differs by method, but it happens regardless.
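
For reference, the same rates can be watched live with esxtop from the host's Tech Support Mode console (or resxtop via the vSphere CLI), assuming console access is enabled; nothing exotic, just the stock counters:

esxtop
#  'n' = network view: MbTX/s and MbRX/s per vmnic/vmkernel port (the iSCSI and NAS traffic)
#  'd' = disk adapter view, 'v' = per-VM disk view: READS/s, WRITES/s, MBREAD/s, MBWRTN/s
#  the smooth downward arc is visible in these counters as the copy runs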

When running a backup from within the VM's OS, the speed drops slowly: it starts out at 80MB/sec, is down to 60MB/sec within an hour, down to 30MB/sec an hour after that, and eventually levels out at about 8MB/sec for the rest of the job.

When we try to use the Clone function from within vSphere, the job times out within 2 hours or so, but the curve drops more quickly: it starts out at 95MB/sec and within an hour is down to 10MB/sec or lower. I read a post and found a place where I think I can extend the timeout period, but it seems idiotic to me that I cannot just turn off the timeout altogether and let the job finish.

Trying VMware Data Recovery, the transfer speed drops MUCH faster and much lower: it starts at about 50MB/sec but nose-dives to under 1MB/sec within an hour. I left it running overnight and finally canceled it after it had barely moved 50GB.

I have looked at everything I can think of to explain this, but everything looks normal: I cannot find the bottleneck or explain why the speed drops off like this. I have shut down all the other VMs on the host. I have tried moving the target VM while it is powered off. It seems to make no difference! All systems (the host, the SAN and the NAS) have plenty of resources and none of them is getting hammered. I'm just at a loss to explain what's going on.
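
The next thing on my list is to watch the latency side of esxtop while a copy runs, in case the throughput numbers are hiding a queueing or latency problem (standard counters; correct me if I'm reading them wrong):

esxtop
#  'd' = disk adapter view, 'u' = disk device view
#  DAVG/cmd = latency on the device/array side of the initiator
#  KAVG/cmd = latency added inside the VMkernel, QUED = commands sitting in the queue
#  GAVG/cmd = what the guest sees (roughly DAVG + KAVG)
#  DAVG climbing as the rate falls would point at the array/iSCSI path;
#  KAVG/QUED climbing would point at the host side (queue depths, pathing)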

I read an article about VAAI causing something very similar to this, but after some research it seems that VAAI support didn't make it into ESXi until version 4.1, and I am on 4.0. I looked for the VAAI controls within vSphere and did not see them. However, this MUST be some basic and fundamental communications/buffer issue, and I suspect (given that I have tried moving the VM to two completely separate destinations) that the problem is between the $8,000 host server and the $30,000 SAN. Ironically, my home-built $1,000 host server and $2,000 SAN work perfectly fine!
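
(For what it's worth, from what I've read the VAAI controls on 4.1 and later show up as advanced settings and can be checked like this; they shouldn't exist at all on a 4.0 host, which matches what I see:)

esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove
esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit
#  -g prints the current value; -s 0 would disable the option on a host that has it
#  the same settings appear in the vSphere Client under Configuration > Advanced Settings > DataMover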

Suggestions welcome!

