2023-07-14 newest contents, 2023-07-14 last update, 2023-07-14 first day, Robert Jasiek

AI Computer

Introduction

An AI computer with an Nvidia graphics card needs hardware and software. The software includes an Nvidia driver for the graphics card, tools, Nvidia libraries, the go AI engine KataGo and graphical user interface (GUI) software. KataGo comes in the three major versions OpenCL, CUDA and TensorRT, needs a trained model net and requires tuning. In every GUI software, we set a command to call KataGo. The Nvidia libraries are CUDA, CuDNN and TensorRT. Good software installation and configuration are difficult but mandatory for achieving fast speeds.

Tuning enables some 2x speed and is more important than buying an RTX 4090 instead of an RTX 4070. We might achieve 3x speed by using KataGo TensorRT and Nvidia TensorRT libraries instead of KataGo OpenCL and the basic OpenCL library. We might achieve 5x speed by using a good combination of Nvidia library file versions. This is almost as important as 6x speed of the fastest graphics card RTX 4090 of current generation versus the slowest graphics card RTX 3050 of the previous generation. Altogether, the best versus the worst installation and tuning of KataGo and Nvidia libraries can result in more than 14x speed.

If this manual enables your good software installation within a few days, this would be much faster than my almost two months without it. While we have already paid for the great Nvidia libraries via our expensive GPU, we can express our gratitude for the free and marvellous AlphaGo research papers, KataGo, its model net and GUI software by contributing to KataGo training, participating in programming or writing manuals.

Hardware

An AI computer should have a dedicated graphics card (dGPU) to execute go AI software (such as KataGo). The major options are notebooks (or other mobile devices) and desktops. Minor options include mini PCs, consoles, eGPUs, online services or AI servers.

Notebooks

Notebooks have the advantage of mobility. Consider this checklist for disadvantages, of which every notebook has some:

Desktops

Desktops have the advantages of choice, price, later low cost upgrades, speed, possible features and possible low noise of 37dB under GPU load. They have some of these disadvantages:
Here are some recommendations and considerations:

Mini PCs

Mini PCs with Laptop GPUs can have the advantages of price and low noise but can have these disadvantages:

Which GPU?

This section was written in summer 2023. For a desktop, 4080 is prohibitively expensive relative to its speed. The same has to be said about 4060TI and 4060, although they are low end to mid tier. If you consider the previous generation, I suggest buying a notebook or mini PC instead because you will get similar speed in a much smaller form factor. This leaves 4070, 4070TI and 4090. The choice between 4070 and 4070TI is a matter of taste - how much expense and wattage do you prefer? 4090 is slightly more efficient as to price per speed or watt but is a totally different beast: more than twice as expensive as 4070 and more than twice the speed at more than twice the wattage and power expense. You buy a monster and need to tame and feed it. 4090 is good for bragging rights while 4070 is reasonable (but already an upper mid tier expense). If you want a 4090, my question is: why not go straight to 4 * 4090 or 8 * 4090 water cooled (not to mention Hopper)...?

For a notebook, 4050, 4060 and 4070 are similar to each other and similar to 3070TI. If you choose a limited expense and do not care for the most silent operation, I'd recommend choosing a 3070TI notebook on sale; circa €1300 incl. VAT is sometimes possible. If, however, you want speed, choose 4080 or 4090 (Laptop variants of the GPUs) and, if necessary, wait for price drops. 4080 Laptop @ 175W is about as fast as 3080 10GB Desktop and 4090 Laptop @ 175W is about as fast as 4070TI Desktop. You might prefer lower wattage or manually set power targets (Afterburner!) though. A notebook GPU at 80W is rather silent and at 110W (sometimes 130W depending on the model, always including dynamic boost) can still have reasonable noise. Recall that a 75% - 85% power target (or a fan mode choosing such) only loses 5 - 20% speed with RTX 4000. A 4090 at a much lower power target would still be faster than a 4080 at the same wattage but it is an expensive road to silence, unless you are lucky and find your dream notebook at a sale.

If a total expense of €700 is too much for your taste or you are 5 kyu or weaker, forget everything I have said and choose whatever hardware you can get hold of. Even CPU computing, iPads with M-chip or Macbook Air M1 are options then. However, do not just buy anything because it is cheap - while M1 has less than 1/30 the speed of 4070 Desktop, older hardware might have 1/100th or less and have trouble even reading ladders.

If you already have a desktop and consider upgrading, my advice is: don't, but wait. Oh well, unless maybe you have something more than 6 years old. €650 will do wonders while instead €350 first of all feeds Nvidia. Do not consider AMD or Intel; saving a few bucks is not worth it when Nvidia libraries can multiply speed.

Comparing a Particular Notebook

Let us study Lenovo Legion Slim 7i 16 (Gen 8), 13900H, 4070 Laptop, tested by Notebookcheck.

The benchmarks allow an approximate comparison of 4070 Laptop, 4070 Desktop, 4080 Laptop, 3080 Ti Laptop and other GPUs. I expect KataGo TensorRT to produce larger relative differences. 4070 Desktop and 4080 Laptop appear to be similar, as well as 4070 Laptop and 3080 Ti Laptop. However, one must note that the results of especially 4070 Laptop in this particular 19.9mm thin notebook with insufficient cooling rely on loud and hot operation at 54.4dB, even with only 116W GPU power. One tier lower than desktop means loud notebook operation. In some other, thicker notebooks with much better cooling, speeds might be one or two GPU tiers lower with -20% or -30% at roughly 39 ~ 43dB noise.
Mode                                      Fire Strike GPU
Performance + GPU OC on + Overdrive on    30130
Performance + GPU OC on + Overdrive off   30153
Performance + GPU OC off + Overdrive off  28629 (-5%)
Balanced                                  27708 (-8%)

2560x1440 Time Spy Graphics (Performance + GPU OC on + Overdrive on)
+67%  21279  Lenovo Legion Pro 7 4090 Laptop, 13900HX   
+54%  19565  Alienware x16 R1 4080 Laptop, 13900HK   
+40%  17847  Zotac Gaming GeForce RTX 4070 AMP Airo, 13900K   
__0%  12734  Lenovo Legion Slim 7i 16 Gen 8 4070 Laptop, 13900H   
-11%  11390  Average 4070 Laptop   
-12%  11227  Asus ROG Zephyrus Duo 16 GX650RX 3080 Ti Laptop, 6900HX   
-15%  10855  Lenovo Legion Pro 5 16IRX8 4060 Laptop, 13700HX   
-22%   9980  HP Omen 16-b1090ng 3070 Ti Laptop, 12700H   
-30%   8855  Dell G16 7620 3060 Laptop, 12700H   
-33%   8526  Lenovo Legion 7 15IMH05-81YT001VGE 2080 Super Max-Q, 10980HK   
-44%   7093  Lenovo Legion C7 15IMH05 82EH0030GE 2070 Max-Q, 10875H   
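
The percentages in the Time Spy table follow from the scores relative to the studied notebook. As a minimal sketch (the function name and the selection of rows are illustrative choices, the scores are copied from the table above), the arithmetic can be reproduced like this:

```python
def relative_percent(score, reference):
    """Return the rounded percentage difference of score versus reference."""
    return round((score / reference - 1) * 100)


# Time Spy Graphics scores copied from the table above.
REFERENCE = 12734  # Lenovo Legion Slim 7i 16 Gen 8, 4070 Laptop
SCORES = {
    "Lenovo Legion Pro 7 4090 Laptop": 21279,
    "Zotac Gaming GeForce RTX 4070 AMP Airo": 17847,
    "HP Omen 16-b1090ng 3070 Ti Laptop": 9980,
}

for name, score in SCORES.items():
    # Prints one line per GPU in the style of the table.
    print(f"{relative_percent(score, REFERENCE):+d}%  {score}  {name}")
```

For the Blender tables, the same arithmetic applies to seconds, so a positive percentage means slower there.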

Blender v3.3 Classroom OPTIX/RTX (Performance + GPU OC on + Overdrive on, smaller is better)
+63%  31s  Dell G16 7620 3060 Laptop, 12700H   
+17%  22+s Average 4070 Laptop
+11%  21s  Lenovo Legion Pro 5 16IRX8 4060 Laptop, 13700HX   
__0%  19s  Lenovo Legion Slim 7i 16 Gen 8 4070 Laptop, 13900H   
-21%  15s  Alienware x16 R1 4080 Laptop, 13900HK   
-26%  14s  Zotac Gaming GeForce RTX 4070 AMP Airo 4070 Desktop, 13900K   
-37%  12s  Lenovo Legion Pro 7 4090 Laptop, 13900HX   

Blender v3.3 Classroom CUDA (Performance + GPU OC on + Overdrive on, smaller is better)
+63%  52s  Dell G16 7620 3060 Laptop, 12700H   
+25%  40s  Lenovo Legion Pro 5 16IRX8 4060 Laptop, 13700HX   
+14%  36+s Average 4070 Laptop   
__0%  32s  Lenovo Legion Slim 7i 16 Gen 8 4070 Laptop, 13900H   
-28%  23s  Zotac Gaming GeForce RTX 4070 AMP Airo Desktop, 13900K   
-34%  21s  Alienware x16 R1 4080 Laptop, 13900HK   
-44%  18s  Lenovo Legion Pro 7 4090 Laptop, 13900HX

Misc Values
19.9mm = chassis thickness
54.4dB = noise 3D game / Witcher 3 Ultra / Load Max (Performance + GPU OC on + Overdrive on)
48.9dB = noise 3D game (Balanced)
98C = GPU Memory Junction Temperature max
71C = GPU Witcher 3 Ultra Performance + GPU OC on + Overdrive on
64C = GPU Witcher 3 Ultra Balanced average
116W = GPU Power max

Introduction to Software

We need drivers, tools, libraries, go AI engine and graphical user interface (GUI) software.

In particular, we first install the driver of the dGPU. For an Nvidia end-consumer graphics card, there are Nvidia's Gaming driver, which is updated more frequently and provides a few additional features for 3D games, or alternatively Nvidia's Studio driver, which emphasises long-term stability and provides a few additional features for work software. I can confirm that Nvidia's Studio driver works for Nvidia libraries and KataGo. Nvidia's Gaming driver is mostly meant for 3D games whereas KataGo is machine learning software for a board game. I do not know, and nobody has told me yet, whether the Gaming driver works for Nvidia libraries and KataGo and, if so, whether it is faster or slower than the Studio driver used together with both. That my RTX 4070 with Nvidia's Studio driver is some 4x faster than an RTX 2080TI suggests that Nvidia's Studio driver is a reasonable choice. Whichever kind of driver we prefer, it must fit the operating system, such as Windows 11 64-bit, and usually should be the newest stable version.

Under Windows, I recommend the following tools. Instead of wasting time on various crapware, consider CPU-Z for stress testing the CPU, Furmark for stress testing the GPU, KataGo on the dGPU with long time settings and AI player for almost stress testing the GPU, HWiNFO64 to monitor loads, temperatures and fan speeds, the mainboard UEFI to set fan speeds, Windows | memory diagnostics to test RAM, OCCT to test the VRAM and Afterburner for tuning the dGPU.

An engine is a go AI (artificial intelligence) software, such as KataGo, and generates moves. A GUI, such as Lizzie or KaTrain, displays the go board. The GUI calls the engine so that both run simultaneously and interact with each other. We interact with the GUI. Use KataGo Eigen only if you have a CPU but no suitable GPU. If you just want to get some KataGo running on a GPU, start with its OpenCL version and the main GUIs by installing Baduk AI Megapack. KataGo CUDA or especially TensorRT should be faster but their installation is advanced.

Every graphics card supports OpenCL as an application interface between software and graphics card. AMD graphics cards only support OpenCL. Nvidia graphics cards support OpenCL and have CUDA cores, tensor cores and RT (raytracing) cores. KataGo supports OpenCL, CUDA cores and tensor cores. KataGo supports OpenCL easily. It needs little more than its OpenCL.dll library file. It can also use tensor cores, but we need not tell it to do so. If KataGo is to use CUDA cores, it needs both Nvidia's CUDA libraries and Nvidia's CuDNN libraries, whose installation we discuss later. If KataGo is to make best use of tensor cores, it needs Nvidia's TensorRT libraries, whose installation we discuss later. Hence, there can be different kinds of cores and different kinds of libraries, which enable software to use some of the cores at all. Some libraries enable more efficient use of particular cores.

We have our new computer and want to quickly test whether its dedicated graphics card allows us to run a GUI and KataGo. We might not want to start by testing all - OpenCL, CUDA and tensor cores, CUDA, CuDNN and TensorRT libraries - at once. A convenient start is the Baduk Megapack, which comes as an installer, currently of version 4.18.0 (on 2023-06-12) for Windows 11 64-bit, and installs a couple of GUIs and instances of KataGo.

For now and for the sake of simplicity, we keep the recommended installation directory C:\baduk and are logged in as a Windows administrator user. Some go software programmers are at home in the Linux world and do not respect the Windows security convention of installing applications to the write-protected C:\Program Files or C:\Program Files (x86) directories and using them as a Windows standard user. Later, such go programs want to write files in their installation directories. We postpone related management of Windows security but might disable internet connections when logged in as a Windows administrator user.

The installation process of Baduk Megapack comes with a surprise: a command line window logs various things and interacts with us so that it can initially tune KataGo and adjust it at least roughly to our dedicated graphics card. For now, most of these questions can be answered somehow. If we can leave some parameter empty at its default, we just do so. However, there is one absolutely essential question. One or a few graphics card devices are listed and each has a number 0, 1, etc. We write only the stated Device number of our dedicated graphics card. For example, the text Found OpenCL Device 1 = RTX 4070 indicates that we must write 1 . (If you have several dedicated graphics cards, list them. I would, however, not include the integrated graphics card so that your CPU remains cooler and the software has fewer reasons to exhibit any bugs. Overclockers may have a different opinion.) After answering the query, we are patient and watch the initial tuning progress.

After installation of Baduk Megapack, we can try KaTrain. The Hamburger menu (click on three horizontal bars) gives access to General & Engine Settings. Click Download KataGo version and select the OpenCL instance of KataGo, whose path is C:\baduk\lizzie\katago.exe . Click Download Models and choose one of the *.gz or *.bin.gz files. Afterwards, there should be entries in the three rows Path to KataGo executable, Path to KataGo config file and Path to KataGo model file. Override Engine Command remains empty for now. Adjust the Maximum time for analysis. (A small value lets us see an operating AI quickly while a large value lets us check GPU usage and running processes in suitable tools.) Click on Update Settings and possibly wait for a few minutes. Close this tab by ESC. If KaTrain freezes at this moment, kill its process and restart KaTrain. Then we may find that General & Engine Settings have the right entries. In the Player Setup, set, for example, Black Human and White AI; press ESC. Click on the board and the engine should reply with its moves. The Windows task manager (CTRL + ALT + DEL) notices some temperature increase of a dedicated graphics card or its load for Furmark but has trouble noticing the GPU load of advanced software. We can use a tool, such as HWiNFO64, to monitor GPU load. We know that the go engine works and uses our dedicated graphics card if HWiNFO64 indicates its 50% ~ 100%, typically 94% ~ 96% load while pondering.
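
The GPU load can also be read on the command line. The following is a minimal sketch, assuming that the tool nvidia-smi, which is normally installed together with the Nvidia driver, can be called from the command line. The function names are illustrative choices and the threshold of 50% is taken from the load range stated above.

```python
import subprocess


def parse_gpu_loads(smi_output):
    """Parse one utilisation percentage per line, one line per GPU."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def engine_seems_active(loads, threshold=50):
    """True if at least one GPU reports a load at or above the threshold."""
    return any(load >= threshold for load in loads)


def query_gpu_loads():
    """Ask nvidia-smi for the current GPU utilisation (assumes it is on the Path)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_loads(result.stdout)


# Usage while the engine is pondering:
#     print(engine_seems_active(query_gpu_loads()))
```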

Playing against KataGo may result in the following early experience. An amateur high dan may observe that quite a few of the moves he plays belong to each of these two types: a) KataGo's best move; b) a move hardly considered by KataGo at all. One must not enter narrow tracks just because KataGo may have its firm preferences. Playing against AI is less suitable for learning the endgame. KataGo with more playouts may more often agree with human play, especially if moves have limited creativity. In positions with many reasonable candidates, KataGo might not seriously consider all of them. Not all of KataGo's moves are good or even the best.

KataGo Installation

Introduction

For a proper installation of KataGo, download its Windows versions for OpenCL and, if needed for an Nvidia graphics card, CUDA and TensorRT. Typically, the download files come as compressed ZIP archives. Here, we face a minor difficulty: do we need the files with or without bs29? bs29 is for board sizes up to 29x29. As starters, we avoid such extras and choose the files without bs29. The download files have names like these:
katago-v1.13.0-opencl-windows-x64.zip
katago-v1.13.0-cuda11.2-windows-x64.zip
katago-v1.13.1-trt8.5-cuda11.2-windows-x64.zip
These are meaningful file names but we must be able to decipher them. 1.13.0 or 1.13.1 is KataGo's version number. x64 denotes 64-bit Windows. opencl is the KataGo version for OpenCL. cuda11.2 is the KataGo version for the CUDA library in its version 11.2 and a fitting CuDNN library, which has its own version numbers. trt8.5-cuda11.2 is the KataGo version for the TensorRT library in its version 8.5, which relies on installed CUDA and CuDNN libraries fitting CUDA 11.2. Although a KataGo download file contains some DLLs, it does not contain the CUDA, CuDNN and TensorRT library files, which we must seek separately from other sources.
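
As an illustration of this deciphering, here is a minimal sketch in Python. The pattern is only derived from the three example file names above and is an assumption about the naming scheme; the function name is an illustrative choice. It does not cover other variants, such as bs29 files.

```python
import re

# Pattern derived from the three example file names only (assumption).
PATTERN = re.compile(
    r"^katago-v(?P<katago>\d+\.\d+\.\d+)-"
    r"(?:trt(?P<tensorrt>[\d.]+)-)?"
    r"(?:opencl|cuda(?P<cuda>[\d.]+))"
    r"-windows-x64\.zip$"
)


def describe_download(file_name):
    """Split a KataGo download file name into its version parts."""
    match = PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    if match["tensorrt"]:
        backend = "TensorRT"
    elif match["cuda"]:
        backend = "CUDA"
    else:
        backend = "OpenCL"
    return {
        "katago": match["katago"],
        "backend": backend,
        "cuda": match["cuda"],          # None for OpenCL
        "tensorrt": match["tensorrt"],  # None unless TensorRT
    }


for name in (
    "katago-v1.13.0-opencl-windows-x64.zip",
    "katago-v1.13.0-cuda11.2-windows-x64.zip",
    "katago-v1.13.1-trt8.5-cuda11.2-windows-x64.zip",
):
    print(describe_download(name))
```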

It can sometimes happen that a download page of KataGo does not contain all three versions. In this case, we must visit several subpages of KataGo's webpage to get them all.

Furthermore, on KataGo's webpage, we find and download a model file, which is a pretrained neural net. Usually, newer model files are better than older model files. However, there is the additional aspect that models come in different block sizes. In the early days, larger block sizes indicated stronger models. Currently, this is not the case but the block size 18 is the strongest for typical usage. We recognise it by b18 early in its name, such as kata1-b18c384nbt-s6386600960-d3368371862.bin.gz . Model files are compressed as *.bin.gz or *.gz. For our use, we do not decompress them - instead, we simply use them. The tail of a long file name might be just random digits. However, for our convenience, we may rename the file to, say, b18.bin.gz .
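
To recognise the block size of a downloaded model file before renaming it, a small sketch such as the following can help. It assumes that the name contains b followed by the block size and c followed by a second number, which presumably is the number of channels; this reading and the function name are assumptions made for illustration.

```python
import re


def model_size(file_name):
    """Return (blocks, channels) read from a model file name, or None."""
    match = re.search(r"-b(\d+)c(\d+)", file_name)
    if match is None:
        return None  # for example after renaming the file to b18.bin.gz
    return int(match.group(1)), int(match.group(2))


print(model_size("kata1-b18c384nbt-s6386600960-d3368371862.bin.gz"))
```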

Install the contents of each ZIP file to its own directory. That is, use, for example, the Windows Explorer to unpack a particular ZIP and then copy the contained files and any folders to its installation folder. For example, create the directories
C:\katago_OpenCL
C:\katago_CUDA
C:\katago_TensorRT
and install the appropriate files to their directory. Furthermore, copy the model file b18.bin.gz to each of the three directories. This wastes disk space but later eases calling the model file. Alternatively, we can store model files in their separate directory and write its different path when calling one of them.
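
The creation of the three directories and the copying of the model file can also be scripted. This is a minimal sketch; the function name and the download location in the usage comment are hypothetical, and unpacking the ZIP archives is still done separately.

```python
import shutil
from pathlib import Path

BACKENDS = ("OpenCL", "CUDA", "TensorRT")


def prepare_directories(root, model_file):
    """Create katago_<backend> directories under root and copy the model into each."""
    model_file = Path(model_file)
    created = []
    for backend in BACKENDS:
        target = Path(root) / f"katago_{backend}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(model_file, target / model_file.name)
        created.append(target)
    return created


# Usage on Windows (the download location is a hypothetical example):
#     prepare_directories("C:\\", r"C:\Users\me\Downloads\b18.bin.gz")
```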

Before we can use any of these three versions of KataGo, we need three to five further preparation steps:

1) For KataGo TensorRT, get another software containing libraries.

2) For KataGo CUDA or for KataGo TensorRT, copy the missing library files.

3) Initial benchmark of KataGo.

4) Initial tuning of KataGo.

5) In a GUI, set the command line for calling KataGo.

The following describes this procedure for each version of KataGo.

KataGo OpenCL

The installation directory, say C:\katago_OpenCL, already contains a copy of the needed OpenCL.dll library file. Therefore, we continue with step 3 of the procedure.

Open the Windows command line, that is C:\Windows\System32\cmd.exe . Go to the right directory using the command
cd \katago_OpenCL
There, execute the following command (or adjust the file name if you have chosen a different one):
katago.exe benchmark -model b18.bin.gz
katago.exe calls the program katago.exe in the current directory. The parameter benchmark does not carry a minus sign because it is a subcommand, which selects what KataGo does. The parameter -model carries a minus sign because it is an option, which is followed by its value: our model file. Therefore, KataGo can benchmark for the model file that we will use later when we use KataGo. Execution of the command takes a while. Eventually, KataGo creates the subdirectories \gtp_logs and \KataGoData and the file \KataGoData\opencltuning\tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt or a file with a similar name.

As step 4 of the procedure, we are still in the same directory and execute the following command:
katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg
The parameter genconfig calls the tuning function for our model file b18.bin.gz and will eventually create the configuration file gtp_custom.cfg in the same directory. First, the tuning interacts with us, as we already know. When asked, we must specify the right device. Now, however, for another question, we must also choose a useful number of visits. On a modern graphics card, this might be:
10000
When the tuning starts, we notice whether it proceeds smoothly or is way too slow. If necessary, we can interrupt execution by CTRL + C and execute the command afresh with a much smaller number of visits. Otherwise, we are patient and let the tuning do its job. It writes the appropriate values in the created configuration file. We close the command line window.

As step 5 of the procedure, we start Lizzie or KaTrain to set the command line for calling KataGo. If we use Lizzie, we go to Settings | Engine | Engine 0, delete the earlier command line and enter this command line:
C:\katago_OpenCL\katago.exe gtp -model C:\katago_OpenCL\b18.bin.gz -config C:\katago_OpenCL\gtp_custom.cfg
For more variation on Lizzie's syntax of the command line, see here. Lizzie is one of the GUIs communicating with KataGo in the gtp mode. Therefore the command has the gtp parameter and uses the gtp_custom.cfg configuration file. Optionally, alter Max Game Thinking Time. Click OK. Choose Game | NewGame(N) and so on. You should be able to play against the AI. If necessary, close and restart Lizzie.

If we use KaTrain, in General & Engine Settings, we set this Override command line:
C:\katago_OpenCL\katago.exe analysis -model C:\katago_OpenCL\b18.bin.gz -config C:\katago_OpenCL\analysis_config.cfg
For more variation on KaTrain's syntax of the command line, see here. KaTrain is one of the GUIs communicating to KataGo in the analysis mode. Therefore the command has the analysis parameter and uses the analysis_config.cfg configuration file. Click Update Settings and ESC. Unless closing / killing the process and restarting KaTrain is necessary, you should be able to start a new game and play against the AI.
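
Since the command lines for Lizzie and KaTrain only differ by the mode and the configuration file, they can be generated instead of typed. This is a minimal sketch with an illustrative function name; it assumes that the model and configuration files lie in the installation directory and that the paths contain no spaces, which would require quotation marks.

```python
from pathlib import PureWindowsPath


def engine_command(install_dir, mode, config_name, model_name="b18.bin.gz"):
    """Build the command line by which a GUI calls KataGo."""
    base = PureWindowsPath(install_dir)
    return (
        f"{base / 'katago.exe'} {mode} "
        f"-model {base / model_name} "
        f"-config {base / config_name}"
    )


# Lizzie uses the gtp mode, KaTrain uses the analysis mode.
print(engine_command(r"C:\katago_OpenCL", "gtp", "gtp_custom.cfg"))
print(engine_command(r"C:\katago_OpenCL", "analysis", "analysis_config.cfg"))
```

The same function serves the CUDA and TensorRT directories by passing their installation directory.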

KataGo CUDA

Our installation directory is C:\katago_CUDA . We start at step 2 of the procedure. We copy from C:\baduk\lizzie to C:\katago_CUDA the following five files, of which KataGo CUDA uses all:
cublas64_11.dll
cublasLt64_11.dll
cudnn_cnn_infer64_8.dll
cudnn_ops_infer64_8.dll
cudnn64_8.dll
In the benchmark step 3 of the procedure, we open the Windows command line, change directory to C:\katago_CUDA and execute this command:
katago.exe benchmark -model b18.bin.gz
KataGo creates the subdirectory \gtp_logs .

As step 4 of the procedure, we are still in the same directory and execute the following command, during whose dialog we write the right CUDA Device number and afterwards set the number of visits to, for example, 10000:
katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg
KataGo creates the files C:\katago_CUDA\gtp_custom.cfg and, for example, C:\katago_CUDA\gtp_logs\20230609-072808-6A18D901.log .

In step 5 of the procedure, we tell Lizzie the Engine command line
C:\katago_CUDA\katago.exe gtp -model C:\katago_CUDA\b18.bin.gz -config C:\katago_CUDA\gtp_custom.cfg
or tell KaTrain the Override command line
C:\katago_CUDA\katago.exe analysis -model C:\katago_CUDA\b18.bin.gz -config C:\katago_CUDA\analysis_config.cfg

KataGo TensorRT

Our installation directory is C:\katago_TensorRT . It also needs files that are not readily available yet. We begin with step 1 of the procedure and download LizzieYZY as a separate software. The downloaded file is a ZIP archive, which we unpack in the Windows Explorer.

As step 2 of the procedure, the unpacked archive contains the subfolder \katago_tensorRT , from which we copy the following files to C:\katago_TensorRT :
cublas64_11.dll         
cublasLt64_11.dll      
cudart64_110.dll
cudnn_cnn_infer64_8.dll
cudnn_ops_infer64_8.dll      
cudnn64_8.dll         
msvcr110.dll
nvinfer.dll         
nvinfer_builder_resource.dll
nvrtc64_112_0.dll
nvrtc-builtins64_114.dll
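
Before benchmarking, we can verify that every listed file has arrived in the installation directory. This is a minimal sketch; the function name is an illustrative choice and the list is copied from the file names above.

```python
from pathlib import Path

# File names copied from the list above.
TENSORRT_FILES = [
    "cublas64_11.dll", "cublasLt64_11.dll", "cudart64_110.dll",
    "cudnn_cnn_infer64_8.dll", "cudnn_ops_infer64_8.dll", "cudnn64_8.dll",
    "msvcr110.dll", "nvinfer.dll", "nvinfer_builder_resource.dll",
    "nvrtc64_112_0.dll", "nvrtc-builtins64_114.dll",
]


def missing_files(directory, required):
    """Return the required file names that are absent from directory."""
    # Windows file names are case-insensitive, so compare in lower case.
    present = {p.name.lower() for p in Path(directory).iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]


# Usage on Windows:
#     print(missing_files(r"C:\katago_TensorRT", TENSORRT_FILES))
```

An empty list means that all files are present. The same check works for the five files of KataGo CUDA with a shorter list.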

As step 3 of the procedure, we open the Windows command line, change directory to C:\katago_TensorRT and execute this command:
katago.exe benchmark -model b18.bin.gz

As step 4 of the procedure, we are still in the same directory and execute the following command, during whose dialog we write the right GPU Device number and afterwards set the number of visits to, for example, 10000:
katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

In step 5 of the procedure, we tell Lizzie the Engine command line
C:\katago_TensorRT\katago.exe gtp -model C:\katago_TensorRT\b18.bin.gz -config C:\katago_TensorRT\gtp_custom.cfg

or tell KaTrain the Override command line
C:\katago_TensorRT\katago.exe analysis -model C:\katago_TensorRT\b18.bin.gz -config C:\katago_TensorRT\analysis_config.cfg

On the first start, KataGo TensorRT needs two minutes or more in KaTrain or 30 seconds or more in Lizzie. At later starts, the delay is up to ~23 seconds on my computer. On recent computers, the delays may be worth it because usually KataGo TensorRT is the fastest version of KataGo during go move generation by far.

Nvidia Libraries

Preface

So far, we have created some duplicate files. Some of them are huge so much disk space is wasted. Furthermore, at least on my computer, KataGo CUDA has been slow so far and one of the possible reasons is a too old library file. Instead of manually copying individual library files, the usual but even more complicated way seeks them from Nvidia's webpage, where first one must register. We need local executables for Windows 11 or at least Windows 10 of the right versions. If you see GA and EA variants of a version, GA (general availability) is the regular release while EA (early access) is a preview. For a version, Nvidia often offers several subversions. It is possible that Nvidia's installers also mess with drivers or install developer software, which we players do not need. We might find installed libraries and copy them or refer to them by a Windows PATH environment variable. If we look at the individual library files above, we notice some numbers in the file names, which might denote version numbers.

We are not done yet. Further tuning of each version and additional care for the analysis variant are needed. We can run the genconfig tuning several times with different numbers of visits, such as 5000, 10000, 20000, 30000, save the created config files under different file names, and compare or modify the values in these config files. We might also let KataGo analyse board positions and compare numbers of visits to judge different config parameters.
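
Comparing the saved config files by eye is tedious. The following minimal sketch assumes that the config files consist of lines of the form key = value with # starting a comment. The function names and the values in the example are hypothetical.

```python
def parse_cfg(text):
    """Read key = value lines; text after # is treated as a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def diff_cfg(first, second):
    """Return {key: (value in first, value in second)} for differing keys."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k))
            for k in keys if first.get(k) != second.get(k)}


# Hypothetical contents of two config files saved after two tuning runs.
run_5000 = parse_cfg("numSearchThreads = 12\nnnMaxBatchSize = 32\n")
run_30000 = parse_cfg("numSearchThreads = 16  # tuned\nnnMaxBatchSize = 32\n")
print(diff_cfg(run_5000, run_30000))
```

For real files, the text can be read with Path("gtp_custom.cfg").read_text() from pathlib.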

Download

Download cuda_11.6.2_511.65_windows.exe (CUDA 11.6.2).

Download cudnn-windows-x86_64-8.9.1.23_cuda11-archive.zip (CuDNN 8.9.1 for CUDA 11).

Download TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip (TensorRT-8.5.2.2 for CUDA 11). As an alternative for the latter, download 2023-06-15-windows64+katago.zip (LizzieYZY_2_5_3).

If necessary, locate links to archived download files.

These Nvidia download file versions work for KataGo CUDA 1_13_0 and KataGo TensorRT 1_13_1 on my computer. Another user has reported that TensorRT 8.5.3.1 works for him. The KataGo download file names give hints on Nvidia download file versions but, currently on 2023-06-15, the only safe advice is to use files for the main version CUDA 11 (not 12) for Windows 10 or 11 (if 11 is not offered, choose Windows 10 files) as local EXE. For some downloads, you may need to register at Nvidia's webpage, answer a query (Why is every end user an organisation?!) and receive confirmation emails.
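
The advice to stay with the main version CUDA 11 can be checked against the download file names. This is a minimal sketch; the pattern is only derived from the three file names in this section and the function names are illustrative choices.

```python
import re

# File names copied from the download list above.
DOWNLOADS = [
    "cuda_11.6.2_511.65_windows.exe",
    "cudnn-windows-x86_64-8.9.1.23_cuda11-archive.zip",
    "TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip",
]


def cuda_major_version(file_name):
    """Return the main CUDA version named in a download file name, or None."""
    match = re.search(r"cuda[-_]?(\d+)", file_name, re.IGNORECASE)
    return int(match.group(1)) if match else None


def all_for_cuda_11(file_names):
    """True if every file name refers to the main version CUDA 11."""
    return all(cuda_major_version(name) == 11 for name in file_names)


print(all_for_cuda_11(DOWNLOADS))
```

A matching main version is necessary but, as the next section explains, not sufficient for a working installation.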

Fate

Installed download files might, or might not, work. This depends on hardware, the Windows and programs installation, the Nvidia graphics card driver version, the Nvidia CUDA library download file version, the Nvidia CuDNN library download file version, the Nvidia TensorRT library download file version, the KataGo CUDA download file version and the KataGo TensorRT download file version. Trial and error may be needed. If an installation of downloads fails, uninstall and try a different installation. The concept of libraries is modularity but, in practice, it is limited. Downloading files with close release dates has a greater chance of success. Choose a CuDNN version for a CUDA version. Choose a TensorRT version for a CUDA version and, at least in theory, for a CuDNN version. In particular, finding a working TensorRT version can be difficult. You might start with the newest subversion and, if necessary, try subsequent subversions one after another. If this fails, also try some sub-subversions. Nvidia provides version compatibility information but it is flawed. Keep your motivation because TensorRT can be significantly faster than OpenCL or CUDA!

Even if you establish some working installation, it can still be very wrong by resulting in slow speed (down to 1/6 of what it should be) of KataGo CUDA or KataGo TensorRT. Without reference to earlier speeds, you might not know whether it is slow or fast. However, CUDA libraries might (but need not) be faster than OpenCL, and TensorRT libraries should be the fastest. If the relative order is obviously wrong or some benchmark or genconfig runs last forever, you know that some KataGo library version must run too slowly. Most likely, it is not KataGo's or your graphics card's fault but is the fault of an improper combination of Nvidia download files. In that case, trial and error continue. I have experienced it all. Installation is already very difficult but this trial and error process can make it even much more difficult. At least, now you know what to look for if you follow this manual and things go wrong nevertheless.
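
The check of the relative order can be written down as follows. This is a minimal sketch with an illustrative function name; the visits per second in the example are hypothetical numbers, which you would replace by the values of your own benchmark runs.

```python
def check_speed_order(visits_per_second):
    """Return warnings if TensorRT is not faster than the other versions."""
    warnings = []
    tensorrt = visits_per_second.get("TensorRT")
    if tensorrt is None:
        return warnings
    for backend in ("OpenCL", "CUDA"):
        other = visits_per_second.get(backend)
        if other is not None and tensorrt <= other:
            warnings.append(
                f"TensorRT ({tensorrt}) is not faster than {backend} ({other}): "
                "suspect an improper combination of Nvidia download files"
            )
    return warnings


# Hypothetical benchmark results in visits per second.
print(check_speed_order({"OpenCL": 1500, "CUDA": 1200, "TensorRT": 900}))
```

CUDA being slower than OpenCL is not reported because, as said above, it need not be faster.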

Preparation and General

Do not have a) any other versions of Nvidia CUDA, CuDNN or TensorRT installed or b) any such additional files copied to C:\katago_CUDA or C:\katago_TensorRT.

Create: C:\Program Files\CUDA

Install CUDA and CuDNN before TensorRT. We also put all CuDNN and TensorRT binaries there so that we only need to reference one path in the Windows system's Path environment variables. Alternative, more complicated methods are possible.

CUDA and CuDNN Installation

CUDA Installer

Start cuda_11.6.2_511.65_windows.exe as administrator.

Confirm a temporary file path.

Choose Custom installation.

Deselect, if not needed or already installed: Driver components | Nvidia Display Driver, Other components | Nvidia PhysX.

Only select CUDA | Runtime | Libraries <all>.

For at least two graphics cards or additionally desired software, selecting more can be necessary. Then, choosing other installation paths and paths in environment variables might also be necessary below.

Replace the default installation path for CUDA Development, such as C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6 , by the more convenient path: C:\Program Files\CUDA

Paths

In Windows start menu, search environment variables (German: Umgebungsvariablen), go there and verify that the installer has created these Windows system-wide (not: for the current user) environment variables (or with a different version number):
CUDA_PATH           C:\Program Files\CUDA
CUDA_PATH_V11_6     C:\Program Files\CUDA
Path                C:\Program Files\CUDA\bin
If one of the first two is missing, use New to add the missing item, if necessary, fitting your CUDA version.

If you cannot see the third in Path, double-click on the Path row and check if it is missing.

Path contains other paths, such as %SystemRoot%\system32 . Do not accidentally delete prior entries.

If the third item is missing in Path, use New to add it.

Click OK thrice.

Restart Windows.
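
After the restart, the entry in Path can be verified without opening the dialog again. This is a minimal sketch with an illustrative function name; it compares the entries regardless of upper and lower case and of a trailing backslash.

```python
import ntpath


def path_contains(path_value, wanted):
    """True if the semicolon-separated Path value contains the wanted directory."""
    wanted_norm = ntpath.normcase(ntpath.normpath(wanted))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted_norm:
            return True
    return False


# Usage on Windows, in a command line window opened after the restart:
#     import os
#     print(path_contains(os.environ.get("Path", ""), r"C:\Program Files\CUDA\bin"))
```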

CuDNN Installation

In a temporary directory, extract: cudnn-windows-x86_64-8.9.1.23_cuda11-archive.zip

Move all from \bin to C:\Program Files\CUDA\bin

Move all from \include to C:\Program Files\CUDA\include

Move all from the differing source directory \lib to C:\Program Files\CUDA\lib\x64

Exceptionally Needed Installation

Nvidia's installer may have found some file, such as zlibwapi.dll, already installed on your computer and therefore not install it in the Path-referenced directory C:\Program Files\CUDA\bin. Locate and copy the file, for example, as follows:

Copy "C:\Program Files (x86)\ASUS\ArmouryDevice\dll\ArmourySocketServer\zlibwapi.dll" to C:\katago_CUDA
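
To find out whether such a file is reachable at all, one can search the directories listed in Path. The sketch below is an illustration only; find_on_path is a hypothetical helper, and a file outside Path (like the Asus copy above) still has to be located by other means, such as Explorer search.

```python
import os
from pathlib import Path

def find_on_path(file_name, path_value):
    """Return every existing copy of file_name in the directories of path_value."""
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip()
        if not entry:
            continue
        # Entries may contain variables such as %SystemRoot% on Windows.
        candidate = Path(os.path.expandvars(entry)) / file_name
        if candidate.is_file():
            hits.append(candidate)
    return hits

if __name__ == "__main__":
    for hit in find_on_path("zlibwapi.dll", os.environ.get("PATH", "")):
        print(hit)
```

An empty result suggests that the exceptional copy step is needed.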

KataGo CUDA

Now use a GUI with KataGo CUDA.

TensorRT Installation

<Complete TensorRT Installation Variant>

In a temporary directory, extract: TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip

Move all from \bin to C:\Program Files\CUDA\bin

Move all from \include to C:\Program Files\CUDA\include

Move all DLL files from the differing source directory \lib to C:\Program Files\CUDA\bin

Move all LIB files from the differing source directory \lib to C:\Program Files\CUDA\lib\x64

Move the other directories to C:\Program Files\CUDA
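
The only non-obvious step of the complete variant is that \lib is split by file type. As a sketch of that rule (the function name target_for_lib_file is hypothetical, and the CUDA path is the one chosen earlier):

```python
from pathlib import PureWindowsPath

def target_for_lib_file(file_name, cuda_root=r"C:\Program Files\CUDA"):
    """Return the target path for a file from TensorRT's lib directory, or None for other types."""
    root = PureWindowsPath(cuda_root)
    suffix = PureWindowsPath(file_name).suffix.lower()
    if suffix == ".dll":
        return root / "bin" / file_name          # DLL files go to bin
    if suffix == ".lib":
        return root / "lib" / "x64" / file_name  # LIB files go to lib\x64
    return None

if __name__ == "__main__":
    for name in ["nvinfer.dll", "nvinfer.lib", "nvinfer_builder_resource.dll"]:
        print(name, "->", target_for_lib_file(name))
```

The function only computes paths and moves nothing, so it can be tried on any system.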

<Short TensorRT Installation Variant>

In a temporary directory, extract: TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip

Copy \lib\nvinfer.dll and \lib\nvinfer_builder_resource.dll to C:\Program Files\CUDA\bin

<Alternative TensorRT Installation Variant>

In a temporary directory, extract 2023-06-15-windows64+katago.zip (LizzieYZY_2_5_3). In a directory for TensorRT, locate these exactly same two files:

Copy nvinfer.dll and nvinfer_builder_resource.dll to C:\Program Files\CUDA\bin

KataGo TensorRT

Now use a GUI with KataGo TensorRT.

Used Files

These are the typical Nvidia library files in C:\Program Files\CUDA\bin

Nvidia CUDA Files

The total size is 1.68 GB.
cublas64_11.dll   
cublasLt64_11.dll   
cudart32_110.dll   
cudart64_110.dll   
cufft64_10.dll   
cufftw64_10.dll   
curand64_10.dll   
cusolver64_11.dll   
cusolverMg64_11.dll   
cusparse64_11.dll   
nppc64_11.dll   
nppial64_11.dll   
nppicc64_11.dll   
nppidei64_11.dll   
nppif64_11.dll   
nppig64_11.dll   
nppim64_11.dll   
nppist64_11.dll   
nppisu64_11.dll   
nppitc64_11.dll   
npps64_11.dll   
nvblas64_11.dll   
nvjpeg64_11.dll   
nvrtc-builtins64_116.dll   
nvrtc64_112_0.dll

Nvidia CuDNN Files

The total size is 1.08 GB.
cudnn64_8.dll   
cudnn_adv_infer64_8.dll   
cudnn_adv_train64_8.dll   
cudnn_cnn_infer64_8.dll   
cudnn_cnn_train64_8.dll   
cudnn_ops_infer64_8.dll   
cudnn_ops_train64_8.dll   

Nvidia TensorRT Files

The total size is 0.85 GB.
nvinfer.dll   
nvinfer_builder_resource.dll   
nvinfer_plugin.dll
nvonnxparser.dll   
nvparsers.dll   
trtexec.exe

KataGo OpenCL

I have used Process Monitor to watch files used by KataGo. Note that multiple GPUs, server use etc. may require more files.

KataGo OpenCL uses the libraries in its own directory or, if OpenCL.dll is missing there, the copy of that file in the Windows system directory.

KataGo CUDA

Typically and besides system files, KataGo CUDA uses these files (or similarly named model, CFG, LOG files):
C:\katago_CUDA\b18.bin.gz
C:\katago_CUDA\gtp_custom_Nvidia_11.6.2_50000.cfg
C:\katago_CUDA\libcrypto-1_1-x64.dll
C:\katago_CUDA\libssl-1_1-x64.dll
C:\katago_CUDA\libz.dll
C:\katago_CUDA\libzip.dll
C:\katago_CUDA\msvcp140.dll
C:\katago_CUDA\vcruntime140.dll
C:\katago_CUDA\zlibwapi.dll
C:\katago_CUDA\gtp_logs\20230617-171446-07EFA493.log
C:\Program Files\CUDA\bin\cublas64_11.dll
C:\Program Files\CUDA\bin\cublasLt64_11.dll
C:\Program Files\CUDA\bin\cudnn_cnn_infer64_8.dll
C:\Program Files\CUDA\bin\cudnn_ops_infer64_8.dll
C:\Program Files\CUDA\bin\cudnn64_8.dll

KataGo TensorRT

Typically, KataGo TensorRT uses:
C:\katago_TensorRT\b18.bin.gz
C:\katago_TensorRT\gtp_custom.cfg
C:\katago_TensorRT\KataGoData\trtcache\trt-8502_gpu-e00748cc_tune-e98f11832326_exact19x19_batch32_fp16
C:\katago_TensorRT\KataGoData\trtcache\trt-8502_gpu-e00748cc_tune-e98f11832326_exact19x19_batch96_fp16
C:\katago_TensorRT\libcrypto-1_1-x64.dll
C:\katago_TensorRT\libssl-1_1-x64.dll
C:\katago_TensorRT\libz.dll
C:\katago_TensorRT\libzip.dll
C:\katago_TensorRT\msvcp140.dll
C:\katago_TensorRT\vcruntime140.dll
C:\katago_TensorRT\gtp_logs\20230617-120124-AF10C399.log
C:\Program Files\CUDA\bin\cublas64_11.dll
C:\Program Files\CUDA\bin\cublasLt64_11.dll
C:\Program Files\CUDA\bin\cudnn_ops_infer64_8.dll
C:\Program Files\CUDA\bin\cudnn64_8.dll
C:\Program Files\CUDA\bin\nvinfer.dll
C:\Program Files\CUDA\bin\nvinfer_builder_resource.dll
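
Such a list lends itself to a quick check of whether an installation is complete. A minimal sketch, assuming the six Nvidia files of the TensorRT list above; missing_files is a hypothetical helper and your file versions may be named differently.

```python
from pathlib import Path

# Nvidia files that KataGo TensorRT used in the CUDA bin directory (from the list above).
TENSORRT_BIN_FILES = [
    "cublas64_11.dll", "cublasLt64_11.dll", "cudnn_ops_infer64_8.dll",
    "cudnn64_8.dll", "nvinfer.dll", "nvinfer_builder_resource.dll",
]

def missing_files(directory, required):
    """Return the names from required that do not exist as files in directory."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]

if __name__ == "__main__":
    print(missing_files(r"C:\Program Files\CUDA\bin", TENSORRT_BIN_FILES))
```

An empty list means that all files were found.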

Tuning

Introduction

A 'playout' is an emulated game sequence. 'Visits' is the number of playouts of the current turn plus the number of still applicable playouts of previous turns. Speed is measured as visits per second. Presumably, 'threads' are simultaneously explored variations.

Every version of KataGo needs tuning for a given model net, a typical thinking time in seconds per move and, less relevantly, a cache size in GB. By approximately maximising visits/s, we determine the optimal number of threads for that KataGo version. Below, I describe major tuning. Fine tuning might approximate the optimum more closely.

I use RTX 4070 (Asus TUF 12G, Quiet mode, 200W TDP, 100% power target) + Ryzen 7700 (8C, 16T) + 64 GB DDR5-RAM JEDEC, 18-Block-Model = kata1-b18c384nbt-s6386600960-d3368371862 and genconfig (unless benchmark). The recommended values are shown but ++ denotes when they are not the highest. - is default GB and time.

KataGo 1_13_0 OpenCL

visits  threads  visits/s       GB    s    remarks

   800       20  1184.02+        -    -    benchmark

 10000       24  1672.61++       -    -    benchmark

 10000       40  1793.91+        -    -    \System32\OpenCL.dll

 10000       48  1874.83        30    -

 10000       40  1808.78+        -    -

 50000       24  1964.08         -    -

100000       40  2203.24+ ###    -    -

KataGo 1_13_0 CUDA

CUDA + CuDNN of Megapack
visits  threads  visits/s       GB    s    remarks

   800       40   450.17+        -    -

   800       48   450.66+        -    -    benchmark

  2000       32   497.30+        -    -

 10000       40   683.34+        -    -

 10000       48   743.29+        -    -

 10000       48   752.09+        -    -    benchmark

CUDA_11_6_2 + CuDNN_8_9_1_23
visits  threads  visits/s       GB    s    remarks

 10000       80  3184.53         -    -    benchmark

 10000       80  3334.43         -    -

 50000       80  3832.44         -    -

100000       64  3983.82 ###     -    -

KataGo 1_13_1 TensorRT

CUDA + CuDNN of Megapack + TensorRT_8_5_2_2
visits  threads  visits/s       GB    s    remarks

   800       40  2879.17+        -    -

 10000       80  5161.87+        -    -

 10000       64  4662.43         -    -

 10000       40  4322.18++      64    1

 20000       64  4905.87+        -    -

 30000       64  5119.17+        -    -

CUDA_11_6_2 + CuDNN_8_9_1_23 + TensorRT_8_5_2_2
visits  threads  visits/s       GB    s    remarks

 10000       40  4473.86+        -    -

 10000       80  4603.14         -    -

 30000       64  5077.34+        -    -

 40000       80  5431.83         -    -

 50000       64  5299.24         -    -

 60000       80  5496.85         -    -

 80000       64  5823.38         -    -

100000       96  6321.13         -    -

120000       96  6443.15         -    -

140000       80  6494.54 ###     -    -

160000       64  6244.78         -    -
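
Picking the best row of such a table is a plain maximum search over visits/s. The sketch below uses the eleven measurements of the last table (CUDA_11_6_2 + CuDNN_8_9_1_23 + TensorRT_8_5_2_2); the function name best_measurement is my own.

```python
# (visits, threads, visits/s) copied from the table above.
MEASUREMENTS = [
    (10000, 40, 4473.86), (10000, 80, 4603.14), (30000, 64, 5077.34),
    (40000, 80, 5431.83), (50000, 64, 5299.24), (60000, 80, 5496.85),
    (80000, 64, 5823.38), (100000, 96, 6321.13), (120000, 96, 6443.15),
    (140000, 80, 6494.54), (160000, 64, 6244.78),
]

def best_measurement(measurements):
    """Return the (visits, threads, visits/s) row with the highest visits/s."""
    return max(measurements, key=lambda row: row[2])

if __name__ == "__main__":
    visits, threads, speed = best_measurement(MEASUREMENTS)
    print(visits, threads, speed)  # 140000 80 6494.54
```

The speed drops again at 160000 visits, which suggests that the order of magnitude of the maximum has been located.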

Rather Optimised Speeds for Visits as Only Changed Parameter

visits/s     KataGo

2203.24+     OpenCL

3983.82      CUDA

6494.54      TensorRT

Factors Comparing Different Speeds

14.43     Worst default installation versus best KataGo and Nvidia library installation with rather optimised visits

5.82     RTX 4090 versus RTX 3050 (2560x1440 Time Spy Graphics)

5.30     Different combinations of Nvidia file versions (worst case for rather optimised visits)

2.95     TensorRT : OpenCL   (rather optimised visits)

2.17     Default visits versus rather optimised visits as only changed parameter (worst case)

2.07     RTX 4090 versus RTX 4070 (2560x1440 Time Spy Graphics)

1.81     CUDA : OpenCL       (rather optimised visits)

1.63     TensorRT : CUDA     (rather optimised visits)
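
Several of these factors follow directly from the measured speeds. As a check, this sketch recomputes them; I presume that the worst default value is the 450.17 visits/s of KataGo CUDA with the Megapack libraries, because it reproduces the factor 14.43.

```python
# Rather optimised speeds in visits/s from the tables above.
SPEEDS = {"OpenCL": 2203.24, "CUDA": 3983.82, "TensorRT": 6494.54}
# Presumed worst default: KataGo CUDA with CUDA + CuDNN of Megapack, 800 visits.
WORST_DEFAULT = 450.17

def factor(faster, slower):
    """Speed factor rounded to two decimals."""
    return round(faster / slower, 2)

if __name__ == "__main__":
    print("TensorRT : OpenCL", factor(SPEEDS["TensorRT"], SPEEDS["OpenCL"]))  # 2.95
    print("CUDA : OpenCL    ", factor(SPEEDS["CUDA"], SPEEDS["OpenCL"]))      # 1.81
    print("TensorRT : CUDA  ", factor(SPEEDS["TensorRT"], SPEEDS["CUDA"]))    # 1.63
    print("best : worst     ", factor(SPEEDS["TensorRT"], WORST_DEFAULT))     # 14.43
```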

Tuning Revisited

Let me also describe major tuning in words. Choose one of the strongest model nets and a KataGo version, then tune for this combination. Each variant of KataGo (OpenCL, CUDA or TensorRT) needs its own tuning, or simply go for TensorRT as the fastest variant on modern graphics cards (except for GUI launches).

For KataGo benchmark, use the -v parameter to specify visits, such as -v 10000 for that many visits. For each execution of KataGo genconfig, also specify the visits when asked. Increase in large steps to locate the order of magnitude where visits/s (visits per second) are maximal. Save or write down the recommended number of threads, or eventually use the most appropriate CFG file for the currently tuned KataGo variant. (And fine tune it with other parameters.)
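
The benchmark runs for increasing visits can be prepared as a list of command lines. This is a sketch only: the visit steps are arbitrary examples, benchmark_commands is a hypothetical helper, and I assume that benchmark accepts -model and -config in the same way as the gtp command lines in this manual.

```python
def benchmark_commands(katago, model, config, visit_steps):
    """Return one 'katago benchmark' argument list per number of visits."""
    return [
        [katago, "benchmark", "-model", model, "-config", config, "-v", str(visits)]
        for visits in visit_steps
    ]

if __name__ == "__main__":
    commands = benchmark_commands(
        r"C:\katago_TensorRT\katago.exe",
        r"C:\katago_TensorRT\b18.bin.gz",
        r"C:\katago_TensorRT\gtp_custom.cfg",
        [10000, 50000, 100000],  # large steps to find the order of magnitude
    )
    for command in commands:
        print(" ".join(command))
```

Each printed line can be pasted into a command prompt. Running the argument lists with subprocess.run would also work but takes several minutes per step.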

Do not listen to naysayers denying the value of deviating from defaults, tuning, installing TensorRT and finding good Nvidia library versions! Good installation combined with good tuning can result in a speed improvement up to about 2.5 times as large as the speed difference between RTX 3050 and 4090 (factor 14.43 versus 5.82). The latter difference is comparable to replacing bad files distributed in a GUI installer by good files selected well from Nvidia's webpage. TensorRT might be thrice as fast as OpenCL. Just tuning the number of threads parameter amounts to a speed factor similar to replacing an RTX 4070 by a 4090.

In conclusion, tuning is much more relevant than replacing a comparatively slow graphics card by the fastest one! Spend a couple of days but do it!

GUI softwares using an 'analysis' command line might deserve their own tuning.

Comparing Speeds of Different Hardwares

Speed

Ignoring old hardware that is too slow, I have dug in the archives and found some speeds in visits/s or playouts/s for comparison with mine (all rounded):
Speed   Hardware

6500    RTX 4070 TensorRT

4000    RTX 4070 CUDA

3000    2 * RTX 2080TI [1]

2200    RTX 4070 OpenCL

0580    5700XT [2]

0300    iPad_Pro/M1 [3]

0200    iPhone 13 pro [4]

0170    iPad/A12X [5]

I do not know yet where RTX 1000, RTX 3000, other RTX 4000, RTX Laptop cards and Macs fit. Please tell us your measured speeds!

[1] goame CUDA b40 2*2080TI 64GB 100000 visits 1s, 40 threads (recommended) = 2832.08 visits/s, 80 threads = 3019.80 visits/s

[2] dojo_b b40 5700XT 12GB, 16 threads = 583.43 visits/s

[3] Limeztone: For an arbitrary mid game position (b40s985 net):
iPad_Pro/A12X: 14.37 playouts/s
iPad_Pro/M1: 297.63 playouts/s

[4] wineandgolover b40 iPhone 13 pro, nearly 200 visits/s

[5] y_ich iPad/A12X, 170 playouts/s; the AI needs at least a few hundred playouts to read simple ladders

Efficiency

According to HWiNFO64, my RTX 4070 at 100% power target consumes between 150W and 210W when running KataGo. It is typically close to 200W, but some operations and KataGo CUDA are a bit more modest. Whether the GPU load is 90% or 96% has a great impact on whether consumption is closer to 150W or 200W. I guess that a power target of some 70% set via Afterburner would result in a consistent use of around 150W. This may be more important on notebooks. Let me assume 200W as representative on my desktop and use TensorRT. Add 65W for the APU (even with an idle iGPU it consumes much, like 35W CPU and 30W iGPU; desktop Ryzens are not efficient but only roughly keep their TDP). I ignore the small consumption of other mainboard components. Then we have roughly these efficiencies as visits per second per watt:
visits/s/W   Hardware

24.5         200W-RTX 4070 + 65W-APU

05.0         40W-iPad M1
Hence, M1 consumes comparatively little even under full load but an RTX 4000 desktop with a moderate APU is roughly 5 times as efficient while consuming 6.6 times as much power. I think RTX 4000 Laptop GPUs are, and especially can be set to be, even more power efficient. Modern dGPU chips are both power-hungry and, at that level, efficient. Of course, mobile devices have their good uses, too. It is just that one should not expect speed wonders from small form factors with necessarily limited TDPs.
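
The desktop figures can be retraced with a few lines. This sketch only uses the rounded numbers stated above (6500 visits/s, 200W GPU, 65W APU, 40W for the iPad); the function name is my own.

```python
def visits_per_second_per_watt(visits_per_second, *component_watts):
    """Efficiency as speed divided by the summed power of all components."""
    return visits_per_second / sum(component_watts)

if __name__ == "__main__":
    # RTX 4070 with TensorRT plus APU: 6500 / (200 + 65)
    print(round(visits_per_second_per_watt(6500, 200, 65), 1))  # 24.5
    # Power ratio of the desktop to the 40W iPad: 265 / 40
    print(round((200 + 65) / 40, 1))  # 6.6
```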

Windows Standard User

Introduction

As mentioned earlier, I started using the programs as Windows administrator. Now that I know how to let them run, usage moves to a Windows standard user. This introduces a few extra hurdles but it is fairly easy to overcome them. I describe things presuming the earlier installation of Baduk Megapack. For individual installation of the GUIs, such as LizzieYZY, things should be similar.

KaTrain

KaTrain also wants write access as the Windows standard user to these folders:
C:\Users\<user_name>\.katrain
C:\Users\<user_name>\.kivy
C:\baduk
By default, such access rights are granted. Therefore, KaTrain can just be used.

Lizzie

Lizzie is a bit trickier. Megapack created the desktop icon for the administrator and set the contents of some files accordingly, including update options. Therefore, we must create a new desktop icon for the Windows standard user. There is no easy update option for this user; I do not need one but your usage might differ.

In Explorer, go to the directory C:\baduk\lizzie, right-click on lizzie.ico and create the desktop icon. Right-click on this desktop icon. In Target write:
C:\baduk\LizzieYZY\jre\java11\bin\javaw.exe -jar C:\baduk\lizzie\lizzie.jar
(This is similar to setting up a desktop icon for earlier CGoban versions, which came as a jar file and also used the Java Runtime Environment.)

Now, you can use Lizzie as expected.

Security

The GUIs or KataGo want write access to each of the following folders (unless you always use different paths for logs and configuration files):
C:\baduk
C:\katago_CUDA
C:\katago_OpenCL
C:\katago_TensorRT
C:\LizzieYZY
For these folders, you might deny access to a separate Windows standard user that you use for online access, if you have one. Furthermore, you might monitor these folders for software execution. Safety is established if online users have no write access to these go folders while the other users have the right to execute software in them.

Even without these additional steps, it is good practice to perform everyday usage (such as using go programs) as a Windows standard user to restrict the scope of any harm by attacks on the computer. Needless to say, detailed information on a possible Windows security concept is on my webpage.

GUI Softwares

This chapter describes how to install other GUI softwares and run KataGo with them. For Lizzie and KaTrain, see further above. Note that some GUI softwares call KataGo by a 'gtp' (Go Text Protocol) command line but others call it by an 'analysis' command line. Some softwares also enable our contribution to KataGo training.
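
The two modes differ in what a GUI sends to KataGo: in gtp mode, plain text commands line by line; in analysis mode, one JSON query per line. The following sketch only builds such input without starting KataGo. The field names follow KataGo's analysis engine documentation as far as I recall it, and the concrete values (rules, komi, visits) are arbitrary examples.

```python
import json

def gtp_lines(moves):
    """gtp mode: one plain-text command per line, for example 'play B Q16'."""
    lines = ["boardsize 19", "clear_board"]
    lines += ["play {} {}".format(colour, vertex) for colour, vertex in moves]
    return lines

def analysis_query(query_id, moves, max_visits=100):
    """analysis mode: one JSON object per line describing the whole position."""
    return json.dumps({
        "id": query_id,
        "moves": [[colour, vertex] for colour, vertex in moves],
        "rules": "tromp-taylor",
        "komi": 7.5,
        "boardXSize": 19,
        "boardYSize": 19,
        "analyzeTurns": [len(moves)],  # analyse the position after the last move
        "maxVisits": max_visits,
    })

if __name__ == "__main__":
    moves = [("B", "Q16"), ("W", "D4")]
    print("\n".join(gtp_lines(moves)))
    print(analysis_query("example", moves))
```

This also hints at why the two modes use different CFG files and may deserve separate tuning.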

CGoban

If you want to install CGoban in C:\Program Files or a standard user's user directory, start the command line with administrative rights so that the MSI installer inherits them. Alternatively, hold down the Shift key while right-clicking on the MSI installer, then click 'Show more options' and 'Run as different user'.

GoWrite

This describes GoWrite 2_3_2_4. GoWrite has its own idea of the contents of a CFG file so create and edit analysis_config_gowrite.cfg as follows:
Options | General Settings | Engine
x Use katago
Set the following sample paths:
Katago path = C:\katago\katago.exe
Analysis configuration = C:\katago\analysis_config_gowrite.cfg
Network = C:\katago\b18.bin.gz
Click Test. No errors should occur; instead, Testing engine... appears, possibly followed by a live log.
Click Stop
Click OK

Load a game.
Click the Ai button to let KataGo analyse all positions of the game.
Study the positions or edit the game while KataGo may do more analysis.

I think that plain playing against KataGo is impossible but you might choose one of its best moves when editing a position.

Larger numAnalysisThreads values are possible. After changing the value, Options | General Settings | Engine | Test is necessary again. I have not tested yet whether Analysis configuration can contain more parameters.

LizGoban

I have tried LizGoban with KataGo Eigen briefly: CPU load was 43~44%, CPU fans ran at >1300RPM and system fans at >1500RPM, so subjectively the CPU fans in particular were too loud. Luckily, we can use KataGo on the GPU. For this purpose, one should modify config.json. LizGoban in Baduk Megapack does not enable a modified config file. Therefore, from the Assets section of LizGoban's release page on GitHub, download LizGoban-<version>_win_<date>.zip and extract it to C:\Program Files (x86)\LizGoban . The program starts as a 32-bit process but launches 64-bit child processes executed in C:\Users\<user_name>\AppData\Local\Temp . It works for a Windows standard user and has also worked if installed to C:\Program Files\LizGoban . Use or suitably modify the following file between the BEGIN and END lines and save it as C:\Program Files (x86)\LizGoban\config.json

************************ BEGIN config.json ************************
{
    "max_cached_engines": 3,
    "face_image_rule": [
        [-0.8, "goisi_k4.png", "goisi_s4.png"],
        [-0.4, "goisi_k8.png", "goisi_s8.png"],
        [0.00, "goisi_k7.png", "goisi_s7.png"],
        [0.30, "goisi_k11.png", "goisi_s11.png"],
        [0.90, "goisi_k10.png", "goisi_s10.png"],
        [1.00, "goisi_k16.png", "goisi_s16.png"]
    ],
    "face_image_diff_rule": [
        [-1.0, "goisi_k15.png", "goisi_s15.png"],
        [-0.5, "goisi_k9.png", "goisi_s9.png"],
        [0.50, null, null],
        [1.00, "goisi_k5.png", "goisi_s5.png"],
        [2.00, "goisi_k14.png", "goisi_s14.png"]
    ],
    "preset": [
        {
            "label": "Katago_TensorRT",
            "accelerator": "F1",
            "engine": ["C:/katago_TensorRT/katago",
                       "gtp",
                       "-override-config", "analysisPVLen=50, defaultBoardSize=19",
                       "-model", "C:/katago_TensorRT/b18.bin.gz",
                       "-config", "C:/katago_TensorRT/gtp_custom.cfg"]
        },
        {
            "label": "Katago_CUDA",
            "accelerator": "F2",
            "engine": ["C:/katago_CUDA/katago",
                       "gtp",
                       "-override-config", "analysisPVLen=50, defaultBoardSize=19",
                       "-model", "C:/katago_CUDA/b18.bin.gz",
                       "-config", "C:/katago_CUDA/gtp_custom.cfg"]
        },
        {
            "label": "Katago_OpenCL",
            "accelerator": "F3",
            "engine": ["C:/katago_OpenCL/katago",
                       "gtp",
                       "-override-config", "analysisPVLen=50, defaultBoardSize=19",
                       "-model", "C:/katago_OpenCL/b18.bin.gz",
                       "-config", "C:/katago_OpenCL/gtp_custom.cfg"]
        }
    ]
}
************************ END config.json ************************

In the file, note that the paths use slashes because I got errors with backslashes. The likely reason is that JSON requires each backslash in a string to be written as \\ , but maybe the errors were unrelated and you might try backslashes nevertheless. Create a desktop link to LizGoban<version>.exe and start LizGoban. This may create initial problems such as parsing errors. You might need to close LizGoban or, if necessary, kill its process trees, restart it and click Try Again or a similar button in a remaining small dialog window, which you might have to move out of the center of your display so that it is not hidden by the "Starting LizGoban..." message. Repeat until you succeed. If you never succeed, you might actually have made a syntax error in your edited file.
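
A syntax error in the edited file can be found before starting LizGoban by parsing it as JSON. This is a minimal sketch; check_lizgoban_config is a hypothetical helper, and the check that every preset has a label and an engine list is only my assumption derived from the sample file above, not a rule taken from LizGoban.

```python
import json

def check_lizgoban_config(text):
    """Return a list of problems found in the text of a config.json; empty means none found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as error:
        return ["syntax error in line {}, column {}: {}".format(error.lineno, error.colno, error.msg)]
    if not isinstance(config, dict):
        return ["the top level must be a JSON object"]
    problems = []
    for index, preset in enumerate(config.get("preset", [])):
        engine = preset.get("engine")
        if "label" not in preset:
            problems.append("preset {} has no label".format(index))
        if not (isinstance(engine, list) and engine and all(isinstance(part, str) for part in engine)):
            problems.append("preset {} needs a non-empty engine list of strings".format(index))
    return problems

if __name__ == "__main__":
    sample = '{"preset": [{"label": "Katago_CUDA", "engine": ["C:/katago_CUDA/katago", "gtp"]}]}'
    print(check_lizgoban_config(sample))  # []
```

To check the real file, read it with open(..., encoding="utf-8").read() and pass the text to the function. The reported line and column point at the first place where parsing failed.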

The specified accelerators F1, F2, F3 are your shortcut keys, with which you can change the KataGo engine faster than via the Preset menu. The started LizGoban loads the F1 engine.

File | Match vs. AI starts a game against KataGo. To play White, click on "start AI's turn". During the game, you can easily change the engine used.

LizGoban stores some files in C:\Users\<user_name>\AppData\Roaming\LizGoban .

LizzieYZY

This describes LizzieYZY 2_5_3.

Extract ZIP, copy to C:\LizzieYZY

Copying to C:\Program Files\LizzieYZY fails because write access is needed. Optionally, restrict user access rights of C:\LizzieYZY

Comes with 32-bit JRE version 17.

Desktop link to C:\LizzieYZY\Lizzieyzy-2.5.3-win64.exe

Initial setup: choose default or set values with error messages to 0.

Settings | Engines
Click Add
Name = OpenCL
Command = C:\katago_OpenCL\katago.exe gtp -model C:\katago_OpenCL\b18.bin.gz -config C:\katago_OpenCL\gtp_custom.cfg
Click Save
Click Add
Name = CUDA
Command = C:\katago_CUDA\katago.exe gtp -model C:\katago_CUDA\b18.bin.gz -config C:\katago_CUDA\gtp_custom.cfg
Click Save
Click Add
Name = TensorRT
Command = C:\katago_TensorRT\katago.exe gtp -model C:\katago_TensorRT\b18.bin.gz -config C:\katago_TensorRT\gtp_custom.cfg
Click Save
Click Exit

Press N
Choose Engine

Optionally, in the menu bar, click on the currently active engine to change it.

Ogatak

Ogatak is a 64-bit program with a simple, clear board GUI, an emphasis on analysis and the possibility to play against the AI. Extract the ZIP and copy to: C:\Program Files\Ogatak

A display in portrait orientation and a full-size window do not work properly.

Ogatak uses the directory C:\Users\<user_name>\AppData\Roaming\Ogatak

Ogatak can only manage one KataGo engine at a time. So set one of OpenCL, CUDA or TensorRT as follows:

KataGo OpenCL:
Setup | Locate KataGo...     C:\katago_OpenCL\katago.exe
Setup | Locate KataGo analysis config...     C:\katago_OpenCL\analysis_config.cfg
Setup | Choose network...     C:\katago_OpenCL\b18.bin.gz
This lets Ogatak call:
C:\katago_OpenCL\katago.exe analysis -config C:\katago_OpenCL\analysis_config.cfg -model C:\katago_OpenCL\b18.bin.gz -quit-without-waiting

KataGo CUDA:
Setup | Locate KataGo...     C:\katago_CUDA\katago.exe
Setup | Locate KataGo analysis config...     C:\katago_CUDA\analysis_config.cfg
Setup | Choose network...     C:\katago_CUDA\b18.bin.gz
This lets Ogatak call:
C:\katago_CUDA\katago.exe analysis -config C:\katago_CUDA\analysis_config.cfg -model C:\katago_CUDA\b18.bin.gz -quit-without-waiting

KataGo TensorRT:
Setup | Locate KataGo...     C:\katago_TensorRT\katago.exe
Setup | Locate KataGo analysis config...     C:\katago_TensorRT\analysis_config.cfg
Setup | Choose network...     C:\katago_TensorRT\b18.bin.gz
This lets Ogatak call:
C:\katago_TensorRT\katago.exe analysis -config C:\katago_TensorRT\analysis_config.cfg -model C:\katago_TensorRT\b18.bin.gz -quit-without-waiting

Press Space to start / stop analysis.

F11 for engine self-play.

To play against the engine, set Misc | Engine plays Black or Misc | Engine plays White. Stop play by Space. Stop playing mode by Misc | Halt.

To start a new game, press Ctrl+N.

Of course, you might use various analysis tools and options.

q5go

Installation of q5go: extract the ZIP archive and copy it to C:\Program Files\q5go as it is a 64-bit program.

q5go writes to C:\Users\<user_name>\AppData\Local\q5go\q5gorc

When setting up q5go for several Windows users, once it has been configured as below for one Windows user, the directory \q5go can simply be copied to the same C:\Users\<user_name>\AppData\Local subdirectory of a different <user_name>.

In the main window, select Settings | Preferences | Computer Go | New... for each KataGo version and set:

KataGo OpenCL
Name:   katago_OpenCL
Executable:   C:\katago_OpenCL\katago.exe
Arguments:   gtp -model b18.bin.gz -config gtp_custom.cfg

KataGo CUDA
Name:   katago_CUDA
Executable:   C:\katago_CUDA\katago.exe
Arguments:   gtp -model b18.bin.gz -config gtp_custom.cfg

KataGo TensorRT
Name:   katago_TensorRT
Executable:   C:\katago_TensorRT\katago.exe
Arguments:   gtp -model b18.bin.gz -config gtp_custom.cfg

Optionally activate: Use for analysis

Click OK as necessary


Analysis | Play against engine from current position...

Enter human player name and select engine, select engine colour etc., click OK.

Sabaki

Sabaki uses a path: C:\Users\<user_name>\AppData\Roaming\Sabaki

Set your options: Engines | Manage Engines... | General

Add engines: Engines | Manage Engines... | Engines

Set a logging path, such as: C:\Users\<user_name>\AppData\Roaming\Sabaki\logs

When run with the Windows administrator account used for the installation, Sabaki shows some preinstalled engines.

When run with a Windows standard user account, Sabaki initially shows an empty engines list.

Add
Name = katago_OpenCL
Path = C:\katago_OpenCL\katago.exe
Arguments = gtp -model b18.bin.gz -config gtp_custom.cfg
Initial commands =
Add
Name = katago_CUDA
Path = C:\katago_CUDA\katago.exe
Arguments = gtp -model b18.bin.gz -config gtp_custom.cfg
Initial commands =
Add
Name = katago_TensorRT
Path = C:\katago_TensorRT\katago.exe
Arguments = gtp -model b18.bin.gz -config gtp_custom.cfg
Initial commands =

Optionally set, for example, Initial commands = time_settings 0 10 1;
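
In GTP, the three arguments of time_settings are the main time in seconds, the byo-yomi time in seconds and the number of byo-yomi stones, so the example means no main time and 10 seconds per move. A small sketch for composing such a command (the function name is my own):

```python
def time_settings_command(main_time_s, byo_yomi_time_s, byo_yomi_stones):
    """Return a GTP time_settings command for non-negative integer arguments."""
    values = (main_time_s, byo_yomi_time_s, byo_yomi_stones)
    if any(not isinstance(value, int) or value < 0 for value in values):
        raise ValueError("all three arguments must be non-negative integers")
    return "time_settings {} {} {}".format(*values)

if __name__ == "__main__":
    print(time_settings_command(0, 10, 1))  # time_settings 0 10 1
```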

Prepare playing by attaching players or engines: Engines | Attach...; Enter human player name or use Down-arrow left of black player name / Down-arrow right of white player name: select engine from drop-down list; Press OK.

Play: F5 to start playing. ESC to stop playing. Players / engines can be changed during the game.

Sabaki52

Sabaki52 is for self-play of a black versus a white engine. So far, I have tested Sabaki and Sabaki52 of Baduk Megapack. Once engines are set in Sabaki, they can also be used in Sabaki52.

Engines | Show Engines Sidebar

Click on the circled arrow, select an engine for both players. Optionally, click on the circled arrow again to set the white engine. Click on the lightning symbol or press F5 to start / stop engine versus engine play. Mark an engine, right-click, Detach to remove it from current play.

If the current list contains 1 engine, it is used for both players. If the current list contains 2 engines, both are used for the two players. If the current list contains 3 engines, only the first is used.

Syntax

Through trial and error, I have partially reverse-engineered the syntax of Lizzie and KaTrain as examples of GUI softwares using the 'gtp' or 'analysis' modes, respectively.

Lizzie

Lizzie is an example GUI software using the 'gtp' mode.

General Remarks

The syntax applies to Windows and KataGo OpenCL, CUDA and TensorRT. After setting the Lizzie Engine command line, it is sometimes necessary to close and restart Lizzie. Some names of model files look like <model_name>.gz instead of <model_name>.bin.gz

Basic Syntax
<path>\<katago_file_name>.exe gtp -model <path>\<model_name>.bin.gz -config <path>\<gtp_file_name>.cfg

Example
C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

Syntax with or without Blank(s)
"<path>\<katago_file_name>.exe" gtp -model "<path>\<model_name>.bin.gz" -config "<path>\<gtp_file_name>.cfg"

Examples
"C:\katago\katago.exe" gtp -model "C:\katago\b18.bin.gz" -config "C:\katago\gtp_custom.cfg"
"C:\kata go\katago.exe" gtp -model "C:\kata go\b18.bin.gz" -config "C:\kata go\gtp_custom.cfg"
"C:\katago\kata go.exe" gtp -model "C:\katago\b 18.bin.gz" -config "C:\katago\gtp custom.cfg"
"C:\kata go\kata go.exe" gtp -model "C:\kata go\b 18.bin.gz" -config "C:\kata go\gtp custom.cfg"

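Since the quoted syntax works with or without blanks, a command line can be generated by always quoting the three paths. This is a sketch of that idea for the 'gtp' mode; gtp_command is a hypothetical helper and not part of Lizzie or KataGo.

```python
def gtp_command(katago, model, config):
    """Return a gtp command line with all three paths in double quotes."""
    quoted = ['"{}"'.format(path) for path in (katago, model, config)]
    return "{} gtp -model {} -config {}".format(*quoted)

if __name__ == "__main__":
    print(gtp_command(r"C:\kata go\katago.exe",
                      r"C:\kata go\b18.bin.gz",
                      r"C:\kata go\gtp_custom.cfg"))
```

The output equals the second example above and can be pasted into the Lizzie Engine command line.
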
KaTrain

KaTrain is an example GUI software using the 'analysis' mode set in KaTrain's General & Engine Settings Override command line.

KaTrain for Windows and KataGo OpenCL General Remarks

analysis_config.cfg must exist with this or a different file name.

Otherwise, copy <path>\analysis_example.cfg to <path>\analysis_config.cfg

Tuning will be different from tuning KataGo for a gtp GUI and gtp_custom.cfg

After writing the KaTrain Override command line, press Update Settings.
Then, in particular, one of the following can happen:
- KataGo engine is ready. Playing is possible with GPU load 96~97%.
- KaTrain freezes. Kill its process. Restart KaTrain.
- KaTrain neither freezes nor applies the settings yet. Close KaTrain. Restart KaTrain.

Some names of model files look like <model_name>.gz instead of <model_name>.bin.gz

Presumably, an optional parameter is: -override-config homeDataDir=C:\Users\<username>/.katrain

KaTrain for KataGo OpenCL, Override Command Line

Syntax
<path>\<katago_file_name>.exe analysis -model <path>\<model_name>.bin.gz -config <path>\<analysis_file_name>.cfg

Example
C:\katago\katago.exe analysis -model C:\katago\b18.bin.gz -config C:\katago\analysis_config.cfg

KaTrain for KataGo OpenCL, Override Command Line, Paths or File Names with or without Blank(s)

Syntax
"<path>\<katago_file_name>.exe" analysis -model "<path>\<model_name>.bin.gz" -config "<path>\<analysis_file_name>.cfg"

Examples
"C:\katago\katago.exe" analysis -model "C:\katago\b18.bin.gz" -config "C:\katago\analysis_config.cfg"
"C:\kata go\katago.exe" analysis -model "C:\kata go\b18.bin.gz" -config "C:\kata go\analysis_config.cfg"
"C:\katago\kata go.exe" analysis -model "C:\katago\b 18.bin.gz" -config "C:\katago\analysis config.cfg"
"C:\kata go\kata go.exe" analysis -model "C:\kata go\b 18.bin.gz" -config "C:\kata go\analysis config.cfg"

KaTrain for KataGo OpenCL, Override Command Line with -analysis-threads Parameter

In analysis_config.cfg, numAnalysisThreads must either be absent or its line must be commented out with a leading # .


Syntax
<path>\<katago_file_name>.exe analysis -model <path>\<model_name>.bin.gz -config <path>\<analysis_file_name>.cfg -analysis-threads <positive_integer>

Example
C:\katago\katago.exe analysis -model C:\katago\b18.bin.gz -config C:\katago\analysis_config.cfg -analysis-threads 2
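
The override command line with its optional thread parameter can also be assembled programmatically. A minimal sketch following the syntax above; analysis_command is a hypothetical helper, and paths with blanks would additionally need the quotes described earlier.

```python
def analysis_command(katago, model, config, analysis_threads=None):
    """Return an analysis command line; analysis_threads is optional and must be positive."""
    parts = [katago, "analysis", "-model", model, "-config", config]
    if analysis_threads is not None:
        if not isinstance(analysis_threads, int) or analysis_threads < 1:
            raise ValueError("analysis_threads must be a positive integer")
        parts += ["-analysis-threads", str(analysis_threads)]
    return " ".join(parts)

if __name__ == "__main__":
    print(analysis_command(r"C:\katago\katago.exe", r"C:\katago\b18.bin.gz",
                           r"C:\katago\analysis_config.cfg", analysis_threads=2))
```

With analysis_threads=2, the printed line is identical to the example above.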

Crapware

Introduction

Nowadays, crapware is one of the unpleasant characteristics of Windows computing.

I tried several softwares from the MSI mainboard downloads but they are all crapware, except for Afterburner. HWiNFO64 does far more than all the crapware together, whose major purpose is permanent telemetry.

My Asus graphics card has flooded the Windows services and autoruns with crapware, of which I only need one tiny bit: deactivating the lighting. I do not want to pull its cable but manage it in configuration software.

I take care of the Asus crapware as follows. After deactivating the Asus Windows services and autoruns listed below, the graphics card works well, as it should thanks to the Nvidia graphics card driver.

Different graphics cards from possibly other manufacturers might need different care so understand the following as a sample guideline.

Initial Setting

For the graphics card, deactivate lighting in Armoury Crate.

Deactivated Asus Windows Services

Stop and deactivate each of the following Windows services, then restart Windows. The default startup type is Automatic unless stated.
ArmouryCrateService "C:\Program Files\ASUS\ARMOURY CRATE Lite Service\ArmouryCrate.Service.exe"
ASUS Com Service "C:\Program Files (x86)\ASUS\AXSP\4.02.12\atkexComSvc.exe"
Asus Update-Dienst (asus) Automatic (Delayed Start) "C:\Program Files (x86)\ASUS\Update\AsusUpdate.exe" /svc
Asus Update-Dienst (asusm) Manual "C:\Program Files (x86)\ASUS\Update\AsusUpdate.exe" /medsvc
AsusCertService "C:\Program Files (x86)\ASUS\AsusCertService\AsusCertService.exe"
AsusROGLSLService Download ROGLSLoader "C:\Program Files (x86)\ASUS\AsusROGLSLService\AsusROGLSLService.exe" -runservice
GameSDK Service "C:\Program Files (x86)\ASUS\GameSDK Service\GameSDK.exe"
ROG Live Service "C:\Program Files\ASUS\ROG Live Service\ROGLiveService.exe"
ASUS AURA SYNC lighting service "C:\Program Files (x86)\LightingService\LightingService.exe"

Deactivated Asus Autoruns

ArmourySocketServer
ASUSUpdateTaskMachineCore<digits>
ASUSUpdateTaskMachineUA
P508PowerAgent_sdk
C:\Program Files (x86)\ASUS\ArmouryDevice\dll\ShareFromArmouryIII\Mouse\ROG STRIX CARRY\P508PowerAgent.exe
\ASUS\Framework Service C:\Program Files (x86)\ASUS\ArmouryDevice\asus_framework.exe
\ASUS\AcPowerNotification C:\Program Files (x86)\ASUS\ArmouryDevice\dll\AcPowerNotification\AcPowerNotification.exe
LightingService ASUS AURA SYNC lighting service C:\Program Files (x86)\LightingService\LightingService.exe

Deactivate the Light Again

Occasionally (once every few weeks), the graphics card light might reappear. To deactivate it again, do the following. For both of these Asus Windows services, -> Automatic -> Start -> Restart -> Stop -> Deactivate -> Restart.
ASUS AURA SYNC lighting service
ArmouryCrateService

In autoruns deactivate:
\ASUS\Framework Service