
PyTorch Note

Coding Guides

systematic tutorial

The tutorial describes an end-to-end workflow of transfer learning {Link}, experiment tracking, and finally model deployment.

It is a great example to follow, with detailed source code and a notebook for each chapter.

However, to boost productivity, consider newer libraries such as pytorch-lightning or ignite (high-level training loops, like Keras for TensorFlow), together with wandb and tensorboard (training monitoring and hyperparameter tuning).

Related web page:

  1. {PyTorch-lightning v.s. Ignite}
    Start with PyTorch Lightning, which has a gentler learning curve and better support for distributed training.
  2. {PyTorchLightning official transfer learning doc}
    It also has other cutting-edge examples, such as meta-learning and DQN for RL.

plot model architecture with tensorboard

Install tensorboard from bash, log the model graph from Python, then go back to bash and run tensorboard, making sure the port it wants is available.
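A minimal sketch of the whole flow (the ResNet-18 model, the runs/arch_demo log directory, and port 6006 below are placeholder choices, not from the original note):

    # in bash first:  pip install tensorboard
    import torch
    import torchvision
    from torch.utils.tensorboard import SummaryWriter

    model = torchvision.models.resnet18()          # any nn.Module works here
    dummy_input = torch.randn(1, 3, 224, 224)      # one example batch for tracing the graph

    writer = SummaryWriter(log_dir="runs/arch_demo")
    writer.add_graph(model, dummy_input)           # records the model architecture
    writer.close()

    # back in bash:  tensorboard --logdir=runs --port=6006
    # then open http://localhost:6006 and check the Graphs tab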

Otherwise, just try Google Colab's tutorial for loading tensorboard:
{Colab_Tensorboard}
{PyTorch_Tensorboard}

An imitation of Keras:
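The original snippet is missing here; one guess (an assumption, not necessarily what the note used) is the third-party torchsummary package, which imitates Keras' model.summary():

    # Assumption: "imitation of Keras" = a Keras-style summary table; torchsummary is one such package
    import torchvision
    from torchsummary import summary

    model = torchvision.models.resnet18()
    summary(model, input_size=(3, 224, 224), device="cpu")  # per-layer output shapes and parameter counts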

check cuda availability

Output: a boolean, True when a usable GPU and CUDA build are present.

When allocating the CUDA device, we can use torch.device, as in the sketch below:
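A minimal sketch of both steps (layer sizes are arbitrary placeholders):

    import torch

    print(torch.cuda.is_available())    # True if a usable GPU and CUDA build are present
    print(torch.cuda.device_count())    # how many GPUs are visible

    # fall back to CPU when no GPU is available
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(10, 2).to(device)   # move the parameters onto the chosen device
    x = torch.randn(4, 10, device=device)       # allocate the input on the same device
    y = model(x)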

super()

usually seen in the __init__ function;
it will call the parent class's constructor (e.g. nn.Module.__init__) so the module machinery is set up before layers are registered.
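A minimal sketch of the usual pattern:

    import torch.nn as nn

    class MyNet(nn.Module):
        def __init__(self):
            # super().__init__() runs nn.Module.__init__ first, so that parameter and
            # buffer registration is set up before any layers are assigned
            super().__init__()
            self.fc = nn.Linear(10, 2)

        def forward(self, x):
            return self.fc(x)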

torch.autograd

The original code uses torch.Autograd, but newer versions use torch.autograd.
This Zhihu post shows some common applications of Variable (merged into Tensor in current PyTorch):
@ requires_grad & volatile: exclude tensors from gradient tracking to save computational resources (volatile is deprecated in favor of torch.no_grad())
@ register_hook: attach an additional function to the Variable's gradient
@ profiler: analyze the autograd graph and the resource usage of Variables
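A minimal sketch of requires_grad and register_hook on a plain tensor:

    import torch

    # requires_grad: only flagged tensors get gradients; everything else is
    # excluded from the graph, which saves memory and computation
    x = torch.randn(3, requires_grad=True)
    w = torch.randn(3)                    # no gradient tracked for w

    y = (w * x).sum()

    # register_hook: attach an extra function that fires on x's gradient
    x.register_hook(lambda grad: print("grad of x:", grad))

    y.backward()
    print(x.grad)    # same values the hook printed
    print(w.grad)    # None, since requires_grad is False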

release storage

CSDN: https://blog.csdn.net/jiangjiang_jian/article/details/79140742
@ garbage collector: gc
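A minimal sketch of manually releasing storage (the tensor size is a placeholder; the empty_cache call only matters on GPU):

    import gc
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    big = torch.randn(4096, 4096, device=device)

    del big                        # drop the Python reference
    gc.collect()                   # let the garbage collector reclaim the object
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # hand cached GPU blocks back to the driver
        print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())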

apex

apex.amp {Zhihu introduction}

amp stands for Automatic Mixed Precision.
Mixed-precision training: "use FP16 in memory for storage and multiplication to speed up computation, and use FP32 for accumulation to avoid rounding error."

opt_level = 'O1' automatically chooses the storage precision per op according to a black/white list.
Dynamic Loss Scaling dynamically scales the loss up to fit the narrower value range of FP16.

{Nvidia official documentation}

amp.scale_loss

It covers the automatic conversion of FP32 weights down to FP16 for computation and the scaling of FP16 gradients back up for the FP32 update.
Explanation from 【Official Doc of Apex】

It is usually used with a context manager:
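A minimal sketch of the usual apex.amp pattern (requires a GPU and the apex package installed; the model, optimizer, and data here are placeholders):

    import torch
    from apex import amp

    model = torch.nn.Linear(10, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # O1: cast ops per the black/white list; loss scaling is dynamic by default
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    x = torch.randn(8, 10).cuda()
    target = torch.randn(8, 2).cuda()

    loss = torch.nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    with amp.scale_loss(loss, optimizer) as scaled_loss:   # the context manager
        scaled_loss.backward()                             # backward on the scaled loss
    optimizer.step()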

graphs explanations

[Figures from Baidu illustrating mixed-precision training: master weights and loss scaling]

mseed

A waveform format derived from SEED (Standard for the Exchange of Earthquake Data); mseed is short for miniSEED.
It can be easily handled with MATLAB.
【Read miniSeed Files-CNblogs】

obspy

An open-source Python package for earthquake research.
【CSDN】
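A minimal sketch of reading waveforms with obspy (read() with no argument loads obspy's bundled example data; pass a path such as "your_file.mseed" for real data):

    from obspy import read

    st = read()              # obspy's bundled example Stream; use read("your_file.mseed") for real data
    print(st)                # one summary line per Trace
    tr = st[0]
    print(tr.stats)          # network, station, sampling_rate, starttime, ...
    print(tr.data[:10])      # the samples as a numpy array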

HDF5

Hierarchical Data Format version 5.
It is built for fast data operations with Python/numpy.
An HDF5 file mainly contains groups and datasets,
which behave like file folders and arrays, respectively.
It can also carry metadata (e.g., attributes).
【Official Doc】
There is also a book from O'Reilly.
{Quick Start-CSDN}
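A minimal sketch with the h5py package (the file, group, and dataset names are placeholders):

    import h5py
    import numpy as np

    with h5py.File("demo.h5", "w") as f:
        grp = f.create_group("earthquakes")                            # like a folder
        dset = grp.create_dataset("X", data=np.random.randn(100, 3))   # like an array
        dset.attrs["units"] = "m/s"                                    # metadata on the dataset
        f.attrs["created_by"] = "pytorch-note"                         # metadata on the file

    with h5py.File("demo.h5", "r") as f:
        print(list(f.keys()))                     # ['earthquakes']
        print(f["earthquakes/X"].shape)           # (100, 3)
        print(f["earthquakes/X"].attrs["units"])  # m/s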

When you load data with f['X'], you only get a lazy dataset handle, not an in-memory copy. You can modify the data in place through it, but you can't read from it once the file is closed.
If you only want an in-memory copy of the dataset, index the handle with [...] (or [:]).

You will get the error "Not a dataset (not a dataset)" if you read through the lazy handle after the file is closed,

because the file has been closed when the with block exits, so the handle has expired, too.
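A minimal sketch of the gotcha, reusing the demo.h5 file from the sketch above:

    import h5py

    with h5py.File("demo.h5", "r") as f:
        lazy = f["earthquakes/X"]       # a dataset handle, not an in-memory copy
        copy = f["earthquakes/X"][...]  # [...] (or [:]) reads everything into a numpy array

    print(copy.mean())                  # fine: the numpy copy outlives the file
    try:
        print(lazy[0])
    except Exception as err:            # h5py reports something like "Not a dataset"
        print("expired handle:", err)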

argparse

A Chinese introduction on 【Zhihu】.
Here is part of the example:
@ ap.ArgumentParser(): the parser object (ap being the argparse alias)
@ parser.add_argument()
@@ '--%var_name%': the flag name
@@ required: bool, whether this argument is compulsory
@@ default: the initial value used when the flag is omitted
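A minimal sketch of that pattern (the argument names are placeholders):

    import argparse

    parser = argparse.ArgumentParser(description="training options")   # the parser object
    parser.add_argument("--epochs", type=int, default=10,
                        help="default: the value used when the flag is omitted")
    parser.add_argument("--dataset", required=True,
                        help="required: the script exits with an error if this flag is missing")
    args = parser.parse_args()

    print(args.epochs, args.dataset)
    # run e.g.:  python train.py --dataset demo --epochs 20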

Colab

Definitely worth trying.
【20 productive Tricks for Colab】
Useful commands are:
@!bash
@tensorboard
@interactive tables for dataframes

But note that you will frequently need to re-mount the Google Drive disk.
【tips for long-term training with Colab】

Function Understanding

BCEWithLogitsLoss

BCE is binary cross entropy; the "WithLogits" variant just adds a sigmoid on the input x_n inside the loss, for numerical stability in training.

{This TowardsDataScience post} explains the math behind BCELoss.
It notes that the KL divergence is the difference between the cross entropy and the entropy, namely:

$$ D_{KL}(q\|p)=H_{p}(q)-H(q)=\sum_{c=1}^{C}q(y_{c})\cdot[\log(q(y_c))-\log(p(y_c))] $$

where H_{p}(q) is the cross entropy, H(q) is the entropy, p(x) is the predicted (hypothesized) probability distribution, and q(x) is the target probability distribution.
In a binary scenario, the cross entropy turns into the formula below.

$$ BCELoss = -\frac{1}{n}\sum_{n}\left(y_n \cdot \ln{x_n} + (1-y_n) \cdot \ln{(1-x_n)}\right) $$

To make x_n vary between 0 and 1, matching the targets y_n=\lbrace0,1\rbrace, we add a sigmoid on the prediction x_n and get

$$
BCEWithLogitsLoss = -\frac{1}{n}\sum_{n}\left(y_n \cdot \ln{[\sigma(x_n)]} + (1-y_n) \cdot \ln{[1-\sigma(x_n)]}\right)
$$

where \sigma(x) = \frac{1}{1+\exp(-x)} is the sigmoid function, also used in the logistic regression model (probably the reason the inputs are called logits here).
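A quick numerical check that BCEWithLogitsLoss is the same as sigmoid followed by BCELoss:

    import torch
    import torch.nn as nn

    logits = torch.randn(8)                        # raw scores x_n
    targets = torch.randint(0, 2, (8,)).float()    # y_n in {0, 1}

    with_logits = nn.BCEWithLogitsLoss()(logits, targets)
    manual = nn.BCELoss()(torch.sigmoid(logits), targets)

    print(with_logits.item(), manual.item())       # equal up to floating-point precision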

CrossEntropyLoss & LogSoftmax

The default inputs of CEL (CrossEntropyLoss) are the raw, unnormalized scores (logits) for every class and the index of the target class; each sample contributes one vector with an individual score per class. See {Official Doc}. Here we only discuss the default setting.

The equation for the default Cross Entropy Loss (with class-index targets) is

$$ loss(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_{j}\exp(x[j])}\right) = -x[class] + \log\left(\sum_{j}\exp(x[j])\right) $$

Notice the log-softmax is built in, so the inputs should be raw scores rather than probabilities.
The function returns the mean CEL over each mini-batch by default.

One thing to mention: the official doc says "CrossEntropyLoss is equivalent to the combination of LogSoftmax and NLLLoss", where NLL means negative log likelihood.

According to StackOverflow, using LogSoftmax as the final layer with NLLLoss as the loss function is equivalent to using a plain Linear final layer with CrossEntropyLoss as the loss function.

You can verify it with the following code:
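A minimal sketch of the check (shapes and class count are arbitrary):

    import torch
    import torch.nn as nn

    scores = torch.randn(4, 5)              # raw scores for 4 samples, 5 classes
    targets = torch.tensor([0, 3, 1, 4])    # target class indices

    ce = nn.CrossEntropyLoss()(scores, targets)
    nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(scores), targets)

    print(ce.item(), nll.item())            # identical up to floating-point precision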

Softmax, sigmoid; CrossEntropy, BCELoss

I have been confused about softmax, sigmoid, CE, and BCE usage for a long time.
The take-home summary is:

  • softmax with CrossEntropyLoss works better for multi-class classification;
  • sigmoid with BCELoss works better for the binary case.

Refer to {Link}:

The CrossEntropyLoss function in fact has the softmax built in:

In program form it can be expressed as -x[class] + \log\sum_{j}\exp(x[j]), i.e. the second form of the equation shown above.

What's more, the PyTorch official doc says CrossEntropyLoss averages the loss by default (reduction='mean'), so if you want the overall loss for the entire training set it is better to use reduction='sum' or accumulate loss.item() times the batch size.
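A minimal sketch of the two reductions (mean times the batch size equals the sum):

    import torch
    import torch.nn as nn

    scores = torch.randn(4, 5)
    targets = torch.tensor([0, 3, 1, 4])

    mean_loss = nn.CrossEntropyLoss()(scores, targets)                  # default: reduction='mean'
    sum_loss = nn.CrossEntropyLoss(reduction="sum")(scores, targets)    # total over the mini-batch

    print(mean_loss.item() * 4, sum_loss.item())   # equal: mean * batch_size == sum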

Build large model with ModuleList, ModuleDict, Sequential

Ref: {Toward_datasci}

Takeaway: Sequential builds the forward pass in for you, running its modules sequentially. ModuleList holds modules like a list and is commonly used in UNet-style architectures. ModuleDict holds modules like a dict, commonly used to hold alternative modules for baseline comparison. A sketch of all three is below.
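A minimal sketch of the three containers (layer sizes and names are placeholders):

    import torch
    import torch.nn as nn

    # Sequential: the forward pass is built in, modules run in order
    seq = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

    class SkipNet(nn.Module):
        """ModuleList: holds layers like a list; you write the forward yourself (UNet-style)."""
        def __init__(self):
            super().__init__()
            self.blocks = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

        def forward(self, x):
            for block in self.blocks:
                x = x + block(x)          # custom wiring, e.g. residual/skip connections
            return x

    class Backbone(nn.Module):
        """ModuleDict: holds alternative modules by name, handy for baseline comparison."""
        def __init__(self, choice="relu"):
            super().__init__()
            self.acts = nn.ModuleDict({"relu": nn.ReLU(), "tanh": nn.Tanh()})
            self.choice = choice
            self.fc = nn.Linear(10, 2)

        def forward(self, x):
            return self.fc(self.acts[self.choice](x))

    x = torch.randn(4, 10)
    print(seq(x).shape, SkipNet()(x).shape, Backbone("tanh")(x).shape)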

debugging multi-threading forks

When you see multi-threading bugs or fork-related warning reports, try setting the num_workers argument of the DataLoader to 0 to disable multi-process data feeding.
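A minimal sketch (the dataset here is a placeholder):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
    loader = DataLoader(dataset, batch_size=16, num_workers=0)   # 0 = load data in the main process

    for xb, yb in loader:
        pass   # if the warnings disappear, the problem lies in the worker processes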
