FAQ === Performance ----------- - The training does not converge, the loss goes up or stays up instead of going down Please make sure to wait for a couple of epochs (~10). It is normal that strange things happen in the beginning. If that does not help, please reduce the learning rate by setting ``param.HyperParameter.opt_param.lr``. You may try it in steps of decreasing the learning rate by a factor of 0.5. - What is the deal with ``Mirror2D``? Please refer to our extensive note on this in the Fitting notebook which you get by following the instructions in the `Tutorial `__. Errors and Software Issues -------------------------- - I get errors when going through the example notebooks. This might be due to a mismatch between your locally installed DECODE version and the version of the jupyter notebooks. The default version of this documentation points to the latest DECODE release with the respective latest example notebooks. We advise to update DECODE. See the update section below the installation instructions. You may need to redownload the most recent notebooks as well. - I get ``CUDA out of memory`` errors This might happen if your GPU is 1. doing multiple things, i.e., used not only for computation but also for the display 2. old or has to little memory If you have multiple GPU devices you may set: ``device='cuda:1'`` (where ``1`` corresponds to the respective index of the device, starting with 0). If you don't have multiple devices, you should try to reduce the batch size: ``param.HyperParameter.batch_size``. - I get other CUDA errors, e.g., ``CUDA error: unspecified launch failure`` Please check whether you have a somewhat up to date CUDA driver. It's always a good idea to update it. Moreover, you can try to check which cudatoolkit version was installed by checking in the Terminal / Anaconda prompt ``conda list``. You can also try to pin the cudatoolkit version to another one by setting `cudatoolkit=10.1` instead of plain `cudatoolkit` without version in the installation of the decode environment. - I get errors like ``No CUDA capable device found`` or CUDA driver issues. This could mean that you really don't have a CUDA capable device (e.g. only an AMD GPU), or that there are driver issues. Please check the following .. code-block:: python import spline import torch print(torch.cuda.is_available()) print(spline.cuda_compiled) print(spline.cuda_is_available()) All above should return ``True``. When the first one returns ``False`` it is likely that you experience a CUDA driver issue. - Training breaks due to ``multiprocessing`` or ``broken pipe`` error. This can happen particularly often for Windows and there is no 'one answer'. You might want to decrease the number of CPU workers or disable multiprocessing at all. For this you would start the training with a changed number of workers by adding ``-w [number of workers]`` at the end of the python command. Specify ``-w 0`` for disabling multiprocessing if even 2 lead to an error. Alternatively change the ``.yaml`` file here: ``param -> Hardware -> num_worker_train``. Note that this can slow down training. You can also try changing the multiprocessing strategy, which you can do in the .yaml file: ``param -> Hardware -> torch_multiprocessing_sharing_strategy``. The sharing strategies depend on your system. Please have a look at `Pytorch Multiprocessing `__.