Post by vanphong1310 » Sun Sep 08, 2019 11:48 pm

Hello there!
NVIDIA Driver 435 is out.
This feature is very interesting. Do you have any guidelines for using it? And when will Linux Mint support it out of the box?

Re: NVIDIA Driver with PRIME Render Offload

Post by roblm » Mon Sep 09, 2019 11:35 am

This feature is similar to PRIME GPU offloading described in this article, which uses the open source nouveau and radeon drivers:

Many users of Optimus systems have been eagerly looking forward to its implementation, as a replacement for nvidia-prime and Bumblebee. It is discussed in this Nvidia article:
http://download.nvidia.com/XFree86/Linu ... fload.html

Here is a topic in the Nvidia Linux Graphics forum:
https://devtalk.nvidia.com/default/topi ... -optimus/1

The requirements are an X.Org X Server and Nvidia driver that support PRIME Render Offload and a correctly configured xorg.conf file.

According to this Phoronix article, the support will come with X.Org X Server 1.21. Presently, Mint uses version 1.19, so 1.21 won’t be available until Mint 20 at the earliest, but I think more likely closer to 20.3. The newly released Nvidia 435.21 driver provides support.
https://www.phoronix.com/scan.php?page= ... prove-1.21

However, X.Org X Server 1.20 will work once the required PRIME Render Offload patches are added through the aplattner PPA. This PPA can be trusted because it was started by Aaron Plattner, the moderator for the Nvidia Linux Graphics forum. I initially tested this feature several weeks ago on Mint 19.1 and it worked, using the Nvidia 435.17 beta driver downloaded from Nvidia’s website, because the 435 series wasn’t available yet through the graphics-drivers PPA. I checked again today and the 435 driver is now available through that PPA, so I tested that driver as well.

Be sure to create a Timeshift restore point before starting. I personally continue to use Clonezilla.

Preferably, a new installation should be used.
To get X.Org X Server 1.20, install the 18.04 HWE stack first:
https://wiki.ubuntu.com/Kernel/LTSEnabl ... nic_Beaver

apt install --install-recommends linux-generic-hwe-18.04 xserver-xorg-hwe-18.04

This will also install the 5.0.0-25 kernel. Reboot.

EDITED: Use this command to install 18.04 HWE without the kernel upgrade: apt install xserver-xorg-hwe-18.04
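
After the reboot, it may be worth confirming which X server version is actually running before adding the PPA. A minimal sketch, assuming the usual "X.Org X Server x.y.z" banner line; the helper names xorg_version and version_ge are made up for this example, and the check only looks at the version number, not at whether the PPA patches are present:

```shell
# Extract e.g. "1.20.4" from the banner printed by `Xorg -version`.
xorg_version() {
    sed -n 's/^X\.Org X Server \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# True if version $1 is at least version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On the machine itself (the banner goes to stderr, hence 2>&1):
# v=$(Xorg -version 2>&1 | xorg_version)
# version_ge "$v" 1.20 && echo "X server $v is at least 1.20"
```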

Add the aplattner PPA: https://launchpad.net/~aplattner/+archive/ubuntu/ppa/
sudo add-apt-repository ppa:aplattner/ppa
apt update

Open the Update Manager and install xorg-server-hwe-18.04 (version 2:1.20.4-1ubuntu3-18.04-1ppa1)

Add the graphics-drivers PPA:
sudo apt-add-repository ppa:graphics-drivers/ppa
apt update

Open the Driver Manager and install nvidia-driver-435

Create an xorg.conf file: sudo touch /etc/X11/xorg.conf

Open the file for editing: xed admin:///etc/X11/xorg.conf

Add these lines. The BusIDs come from the inxi -Gx output:

Code:

Section "ServerLayout"
   Identifier "layout"
   Screen 0 "iGPU"
   Option "AllowNvidiaGpuScreens"
EndSection

Section "Device"
   Identifier "iGPU"
   Driver "modesetting"
   BusID  "PCI:0:2:0"
EndSection

Section "Device"
   Identifier "nvidia"
   Driver "nvidia"
   BusID  "PCI:1:0:0"
EndSection

Section "Screen"
   Identifier "iGPU"
   Device "iGPU"
EndSection

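
One thing to watch with the BusID lines: inxi and lspci print the bus address in hexadecimal (for example 01:00.0), while xorg.conf expects decimal numbers. For low bus numbers the two look the same, but they differ from 0a upward. A minimal sketch of the conversion; to_xorg_busid is a name made up for this example and is not part of any package:

```shell
# Convert an lspci-style address (hex "bus:device.function", with an
# optional "0000:" domain prefix) into the decimal
# "PCI:bus:device:function" form used in xorg.conf.
# The body runs in a subshell so its variables do not leak.
to_xorg_busid() (
    addr=${1#0000:}      # drop the PCI domain prefix if present
    bus=${addr%%:*}      # text before the first ':'
    rest=${addr#*:}      # "device.function"
    dev=${rest%%.*}
    fn=${rest#*.}
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
)

to_xorg_busid 01:00.0    # prints PCI:1:0:0
```
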
Log out and back in. Use the command xrandr --listproviders to test the setup. There should be 2 providers listed, one of them named NVIDIA-G0.

Code:

Providers: number : 2
Provider 0: id: 0x1bd cap: 0x9, Source Output, Sink Offload crtcs: 3 outputs: 2 associated providers: 0 name:modesetting
Provider 1: id: 0x198 cap: 0x0 crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-G0
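
The same check can be scripted. A minimal sketch, assuming the provider name NVIDIA-G0 shown in the output above; has_nvidia_provider is a name made up for this example and reads the xrandr output from standard input:

```shell
# Succeeds if the text on stdin lists a provider named NVIDIA-G0.
has_nvidia_provider() {
    grep -q 'name:NVIDIA-G0'
}

# Typical use on the laptop itself (commented out, needs a running X session):
# xrandr --listproviders | has_nvidia_provider && echo "offload provider found"
```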

To have an application offloaded to the Nvidia GPU, put these two variables in front of its command: __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia <app name>

For example:

Code:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"
OpenGL renderer string: GeForce GTX 1050/PCIe/SSE2

Without PRIME Render Offload, the Intel GPU does all the rendering:

Code:

glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Mesa DRI Intel(R) UHD Graphics 620 (Kabylake GT2)

For an app that supports Vulkan, the command is slightly different; see the Nvidia article linked above.

The command used for PRIME GPU offloading by the open source drivers is a lot simpler: DRI_PRIME=1 <app name>
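
The two Nvidia variables are long to type, but they can be wrapped in a small shell function to get something close to the DRI_PRIME convenience. A minimal sketch; the name nv_offload is made up for this example and could be added to ~/.bashrc:

```shell
# Run any command with the two PRIME Render Offload variables set
# for that command only. nv_offload is a hypothetical name.
nv_offload() {
    __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}

# Example (commented out, needs the Nvidia driver and glxinfo):
# nv_offload glxinfo | grep "OpenGL renderer"
```

Because the variables are set only for the wrapped command, everything else in the session keeps rendering on the Intel GPU.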

PRIME Render Offload is a great step forward but needs improvement. The Nvidia card will always be powered on, unless your card has the newer Turing architecture, which has a power management feature. The Turing cards include the RTX 20 series: GeForce RTX 2080 Ti, GeForce RTX 2080 SUPER, GeForce RTX 2080, GeForce RTX 2070 SUPER, GeForce RTX 2070, GeForce RTX 2060 SUPER, GeForce RTX 2060

Monitors connected to ports that are internally connected to the Nvidia GPU may not work.
Most of the pages in the NVIDIA X Server Settings utility will be missing.

The performance has been disappointing, although my test has been limited to using glmark2, an older OpenGL 2.0 benchmark.

Code:

glmark2  (using the Intel GPU)
    glmark2 2014.03+git20150611.fa71af2d
    OpenGL Information
    GL_VENDOR:     Intel Open Source Technology Center
    GL_RENDERER:   Mesa DRI Intel(R) UHD Graphics 620 (Kabylake GT2) 
    GL_VERSION:    3.0 Mesa 19.0.8
glmark2 Score: 1441

Code:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glmark2 (using the Nvidia GPU)
    glmark2 2014.03+git20150611.fa71af2d
    OpenGL Information
    GL_VENDOR:     NVIDIA Corporation
    GL_RENDERER:   GeForce GTX 1050/PCIe/SSE2
    GL_VERSION:    4.6.0 NVIDIA 435.17
glmark2 Score: 1198

Compared to other scores from previous testing:

Same Optimus laptop using nvidia-prime:
Intel HD Graphics 620 - glmark2 Score: 1379
Nvidia GTX 1050 using Nvidia-384 driver - glmark2 Score: 7855

Desktop PC with Nvidia GT 730 card:
Using Nvidia driver - glmark2 Score: 1790
Using nouveau driver - glmark2 Score: 417

Desktop PC with AMD Radeon R7 240 card:
Using radeon driver - glmark2 Score: 1954
Using amdgpu driver - glmark2 Score: 2282
Using amdgpu-pro driver - glmark2 Score: 1276
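
To put the glmark2 numbers in proportion, the offloaded score can be expressed as a percentage of the other results. A small sketch using awk; pct is a name made up for this example, and the second comparison is against a score from an earlier test session with a different driver:

```shell
# Print the first score as a percentage of the second, one decimal place.
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", a / b * 100 }'
}

pct 1198 1441   # PRIME Render Offload vs. the Intel GPU: prints 83.1%
pct 1198 7855   # PRIME Render Offload vs. nvidia-prime (384 driver): prints 15.3%
```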

Post by Pickle » Mon Sep 09, 2019 12:15 pm

That's quite disappointing performance. A buffer copied from one card to the other shouldn't be that bad.


Post by roblm » Mon Sep 09, 2019 12:27 pm

Pickle wrote: That's quite disappointing performance. A buffer copied from one card to the other shouldn't be that bad.

The performance does seem unbelievably poor, but the performance using PRIME GPU offloading with the open source drivers has always been even worse. I just hope I didn’t make any mistakes in my procedure.

Post by Pickle » Mon Sep 09, 2019 12:41 pm

Well, I would expect nouveau to always be lower, since they can't always change the GPU clock to the highest.

I wonder if it's something with the synchronization between the two.
Do you think turning off sync to vblank would give different results?


Post by roblm » Mon Sep 09, 2019 2:38 pm

Pickle wrote: Well, I would expect nouveau to always be lower, since they can't always change the GPU clock to the highest.

The poor performance of the nouveau driver was expected, but the poor performance of the radeon driver was really disappointing, considering the significant improvements AMD has made to their open source driver. The problem must be with the open source PRIME technology.

Pickle wrote: I wonder if it's something with the synchronization between the two.
Do you think turning off sync to vblank would give different results?

I seem to remember that the glmark2 scores were not affected by turning VBlank sync off or on, so I tested that again. I first tested a desktop PC, because there sync to VBlank can be turned off in the Nvidia Settings utility on the OpenGL page, or with the command nvidia-settings -a SyncToVBlank=0. Another way to turn it off is the command export __GL_SYNC_TO_VBLANK=0, which applies to every OpenGL program started afterwards from that terminal session.

Code:

export __GL_SYNC_TO_VBLANK=0
14625 frames in 5.0 seconds = 2924.913 FPS
14651 frames in 5.0 seconds = 2929.899 FPS

Code:

export __GL_SYNC_TO_VBLANK=1
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.518 FPS
301 frames in 5.0 seconds = 60.003 FPS

The results with glmark2 were similar with VBlank off or on.

On the Optimus laptop, the Nvidia Settings utility only shows these pages now:

nvidia-settings PRIME Render Offload.jpg

Using the command nvidia-settings -a SyncToVBlank=0 got this output:
ERROR: Error resolving target specification '' (No targets match target specification), specified in assignment 'SyncToVBlank=0'.
So I used the __GL_SYNC_TO_VBLANK=0 command and then ran glmark2. The results were similar.

Post by vanphong1310 » Mon Sep 09, 2019 10:12 pm

Hello @roblm, thank you for your great information and tutorial. I will try this soon.


Post by roblm » Fri Sep 13, 2019 3:35 pm

I tested the Unigine Heaven Benchmark 4.0, a GPU-intensive benchmark, which has support for OpenGL 4.0 (newest version is 4.6). The performance results with PRIME Render Offload were on par with using nvidia-prime. Testing was done in Mint 19.1 Cinnamon with the default 4.15.0-20 kernel. Download Heaven from here:

The program was started and the Preset option was set to Basic. Start the demo and click the Benchmark button. There will be 26 scenes.

Here is the result of using the Nvidia GPU with nvidia-prime:

Unigine Heaven-nvidia-prime.png

Here is the result of using the Nvidia GPU with PRIME Render Offload:
~/Unigine_Heaven-4.0$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./heaven

Unigine Heaven-PRIME Render Offload.png

Here is the result of using the Intel graphics:
~/Unigine_Heaven-4.0$ ./heaven

Unigine Heaven-PRIME-R-Offload-Intel .png


Post by roblm » Sun Sep 15, 2019 4:04 pm

This testing was done using Phoronix Test Suite, the most comprehensive testing and benchmarking tool available for Linux, which supports OpenGL 4.6. Version 5.2.1 is available in the repos using the command apt install phoronix-test-suite, but the latest version 8.8.1, which I used, can be downloaded from here:

Click the Downloads tab and then click Ubuntu/Debian Package. This downloads the file:

Click on the file to start the installation. If the GDebi Package Installer doesn't start, then right click on the file and select: Open with GDebi Package Installer

Open the program from the menu. This version does not use the default command prompt in the Terminal window, which is shown in the first prompt in the picture below.
The command prompt already shows phoronix-test-suite, and its color is not very visible if you use white text on a black background, as shown in the second prompt in the picture. This same color is used for category names.
I tried to improve the color by clicking Edit > Preferences > Colors, but the only setting that produced a noticeably more visible color was XTerm under Palette > Built-in schemes, which is shown in the third prompt.

Phoronix Test Suite-command prompt.png

After the program loads, type list-available-tests to show the available tests. If the program from the repos was installed, it uses the default command prompt, so the full command needs to be typed: phoronix-test-suite list-available-tests

Code:

$ phoronix-test-suite list-available-tests

To view a shortened list of recommended tests, based on the most popular tests downloaded, type: list-recommended-tests

Code:

Recommended OpenBenchmarking.org Test Profiles

Processor Tests

pts/encode-mp3                 - LAME MP3 Encoding                      
pts/x264                       - x264                                   
pts/john-the-ripper            - John The Ripper                        
pts/crafty                     - Crafty                                 
pts/npb                        - NAS Parallel Benchmarks                
pts/hpcc                       - HPC Challenge                          
pts/smallpt                    - Smallpt                                
pts/botan                      - Botan                                  
pts/tachyon                    - Tachyon                                
pts/primesieve                 - Primesieve                             

System Tests

pts/apache                     - Apache Benchmark                       
pts/pgbench                    - PostgreSQL pgbench                     
pts/pybench                    - PyBench                                
pts/nginx                      - NGINX Benchmark                        
pts/caffe                      - Caffe                                  
pts/blender                    - Blender                                
pts/stress-ng                  - Stress-NG                              
pts/mcperf                     - Memcached mcperf                       
pts/tjbench                    - libjpeg-turbo tjbench                  
system/gimp                    - GIMP                                   

Graphics Tests

pts/csgo                       - Counter-Strike: Global Offensive       
pts/cuda-mini-nbody            - CUDA Mini-Nbody                        
pts/deus-exmd                  - Deus Ex: Mankind Divided               
pts/octanebench                - OctaneBench                            
pts/talos-principle            - The Talos Principle                    
pts/mixbench                   - Mixbench                               
pts/unigine-super              - Unigine Superposition                  
pts/hitman                     - HITMAN                                 
pts/riseofthetombraider        - Rise of the Tomb Raider                
pts/batman-origins             - Batman: Arkham Origins                 

Disk Tests

pts/fio                        - Flexible IO Tester                     
pts/iozone                     - IOzone                                 
pts/fs-mark                    - FS-Mark                                
pts/blogbench                  - BlogBench                              
pts/startup-time               - Application Start-up Time              
system/fio                     - Flexible IO Tester                     

Network Tests

pts/iperf                      - iPerf                                  
pts/netperf                    - Netperf                                
pts/ethr                       - Ethr                                   
pts/nuttcp                     - Nuttcp                                 

Memory Tests

pts/ramspeed                   - RAMspeed SMP                           
pts/t-test1                    - t-test1                                
pts/stressapptest              - Stressful Application Test 

Then install a test. I installed pts/gputest, which is a GPU test: install pts/gputest

Code:

$ phoronix-test-suite install pts/gputest
    Installed:     pts/gputest-1.3.2

The Nvidia GPU using nvidia-prime was first tested using the command: run pts/gputest

The following will be seen. Select which of these 8 tests to run by typing its number. Selecting 8 runs all of the other 7 tests but takes a long time, since each test is run at least 3 times.

Code:

GpuTest 0.7.0:
    Graphics Test Configuration
        1: Furmark
        2: TessMark
        3: GiMark
        4: Pixmark Piano
        5: Pixmark Volplosion
        6: Triangle
        7: Plot3D
        8: Test All Options
        ** Multiple items can be selected, delimit by a comma. **

Then type the number for the resolution to be used:

Code:

        1: 800 x 600
        2: 1024 x 768
        3: 1600 x 900
        4: Test All Options
        ** Multiple items can be selected, delimit by a comma. **

Choose if the test is run in Fullscreen or Windowed mode, or both:

Code:

        1: Fullscreen
        2: Windowed
        3: Test All Options
        ** Multiple items can be selected, delimit by a comma. **

Code:

GpuTest 0.7.0:
    pts/gputest-1.3.2 [Test: Furmark - Resolution: 1600 x 900 - Mode: Windowed]
    Test 1 of 1
    Estimated Trial Run Count:    3                     
    Estimated Time To Completion: 4 Minutes [13:32 CDT] 
        Started Run 1 @ 13:29:10
        Started Run 2 @ 13:30:20
        Started Run 3 @ 13:31:29

    Test: Furmark - Resolution: 1600 x 900 - Mode: Windowed:

    Average: 3693 Points

The Intel GPU was tested next:

Code:

GpuTest 0.7.0:
    pts/gputest-1.3.2 [Test: Furmark - Resolution: 1600 x 900 - Mode: Windowed]
    Test 1 of 1
    Estimated Trial Run Count:    3                     
    Estimated Time To Completion: 4 Minutes [13:00 CDT] 
        Started Run 1 @ 12:57:37
        Started Run 2 @ 12:58:46
        Started Run 3 @ 12:59:55

    Test: Furmark - Resolution: 1600 x 900 - Mode: Windowed:

    Average: 700 Points

To test the Nvidia GPU using PRIME Render Offload, don’t open Phoronix Test Suite. Open the Terminal and use this command:

Code:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia phoronix-test-suite run pts/gputest

Code:

GpuTest 0.7.0:
    pts/gputest-1.3.2 [Test: Furmark - Resolution: 1600 x 900 - Mode: Windowed]
    Test 1 of 1
    Estimated Trial Run Count:    3                     
    Estimated Time To Completion: 4 Minutes [13:19 CDT] 
        Started Run 1 @ 13:16:22
        Started Run 2 @ 13:17:31
        Started Run 3 @ 13:18:40
        Started Run 4 @ 13:19:50 *

    Test: Furmark - Resolution: 1600 x 900 - Mode: Windowed:

    Average: 3548 Points
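
If the terminal output of each run is saved to a text file, the averages can be pulled out for comparison instead of being copied by hand. A minimal sketch with awk, assuming the "Average: N Points" line format shown above; pts_average is a name made up for this example and the file name is hypothetical:

```shell
# Print the number from the first "Average: N Points" line on stdin.
pts_average() {
    awk '$1 == "Average:" { print $2; exit }'
}

# Example with a hypothetical saved log (commented out):
# pts_average < furmark-offload.txt
```

For the three runs above, 3548 points with PRIME Render Offload is about 96% of the 3693 points with nvidia-prime, and roughly five times the 700 points of the Intel GPU.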

Final conclusions: The outdated glmark2 is not a reliable benchmark tool for testing newer, more powerful graphics hardware that supports newer versions of OpenGL.

The performance results using the Nvidia GPU with PRIME Render Offload were on par with using nvidia-prime.

