Channel: VMware Communities : All Content - All Communities

NVIDIA Grid vGPU M10 performance issues with PCOIP protocol (high GPU utilization)


We have been working on implementing M10 vGPUs in our VMware environment and have been experiencing performance issues. We worked with NVIDIA to verify that the environment is set up correctly. Here is a quick bullet-point list for the environment.

  • vSphere 6.7
  • Hosts run VMware ESXi 6.7.0
    • PowerEdge R740
    • Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz
    • 768 GB RAM
    • 2x M10 GPUs
  • Horizon 7.4.0
  • Linked Clones
    • Windows 10 1803 (also tried 1809 & 1709 builds)
    • 4 core
    • 8 GB RAM
    • M10-1B vGPU profile
    • Teradici based zero clients (PCOIP)

After an initial small round of user testing with no issues, we began a larger test and found that once we reached about 15 users per M10 we started getting performance complaints. Users see lag in the interface: a right-click on the desktop can take 5-10 seconds before the context menu appears, and the Start menu behaves the same way. These issues occur only on vGPU VMs; non-vGPU VMs on the same host are not experiencing the slowdowns. In Task Manager we noticed that pcoip_server_win32.exe was using a lot of CPU and GPU time. We tried different versions of the VMware agent and direct connect, various revisions of the ESXi driver, and fresh standalone copies of Windows 10 on several builds. So far no combination we have attempted has resolved the issue for vGPU machines once there are more than a few users per board. The performance problem appears even if we use the Horizon software client with it set to PCoIP.
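For context, 15 users per board is well under the profile's maximum density, so this does not look like simple framebuffer oversubscription. A rough sketch of the math, using NVIDIA's published M10 figures (4 physical GPUs per board, 8 GB framebuffer each, 1 GB per VM for the 1B profile) rather than anything measured in this environment:

```python
# vGPU density math for an M10 board with the M10-1B profile.
# Figures are from NVIDIA's published M10 specs, not from this post.
GPUS_PER_BOARD = 4   # M10 carries four physical GPUs
FB_PER_GPU_GB = 8    # 8 GB framebuffer per physical GPU (32 GB per board)
PROFILE_FB_GB = 1    # M10-1B allocates 1 GB framebuffer per VM

vms_per_gpu = FB_PER_GPU_GB // PROFILE_FB_GB
vms_per_board = vms_per_gpu * GPUS_PER_BOARD
print(vms_per_gpu, vms_per_board)  # 8 per physical GPU, 32 per board
```

So the slowdown reported at ~15 users appears at roughly half of the 32-VM-per-board ceiling for this profile.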

We tried running it with different Horizon Agent versions (6, 7.0.2, 7.2.0, 7.4.0, 7.5.1, 7.6 & 7.7) and using direct connect, bypassing the Horizon Server.

We also tried running VMs with the VMware Blast protocol, which did not show the high GPU usage issue, but unfortunately almost all of our thin clients only support PCoIP.

Attaching a screenshot below; please note the GPU utilization of the PCoIP Server (32 bit) process.

teams.jpg

UPDATE 1:

After a long discussion with NVIDIA, they concluded that it's not on their side.

They pointed us at this KB: https://nvidia.custhelp.com/app/answers/detail/a_id/4156/~/nvidia-smi-shows-high-gpu-utilization-for-vgpu-vms-with-active-horizon-session

It looks like this is a known issue with the Teradici PCoIP protocol that has not been fixed yet.

UPDATE 2:

I tried downloading and installing Teradici's PCoIP agent (PCoIP_agent_release_installer_graphics.exe) directly from Teradici. Then I ran "NvFBCEnable.exe -disable", which disables NvFBC capture and switches back to CPU capture. It works great: no GPU spike when idle and much better performance overall.
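One way to quantify the before/after effect is to sample GPU utilization (e.g. via `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`) while a session sits idle. A minimal parsing sketch; the sample numbers below are illustrative placeholders, not measurements from this environment:

```python
def mean_util(raw: str) -> float:
    """Average GPU utilization (%) from csv,noheader,nounits nvidia-smi output."""
    vals = [int(line) for line in raw.split() if line.strip()]
    return sum(vals) / len(vals)

# Illustrative idle-session samples (placeholder values, not real data):
with_nvfbc = "62\n71\n68\n74\n"     # NvFBC capture active
without_nvfbc = "3\n5\n2\n4\n"      # after NvFBCEnable.exe -disable

print(mean_util(with_nvfbc))     # stays high while the desktop is idle
print(mean_util(without_nvfbc))  # drops to near zero with CPU capture
```

Comparing averaged idle samples like this makes the capture-path overhead visible without relying on Task Manager screenshots.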

However, when I try this with the Horizon Agent's Teradici protocol, NvFBC is disabled only briefly: as soon as I reconnect via PCoIP it is re-enabled, as this log extract shows:

  Svgadevtap:NvFBCFixed capture by enabling NvFBC

Is there a way to permanently disable NvFBC on Horizon Agent's Teradici Protocol?

