Kaggle Projects References
Standing on the shoulders of giants.
Laboratory Earthquake Time Prediction
LANL Earthquake Prediction | Kaggle
Predict time-to-failure from acoustic signals recorded in double-direct-shear lab-scale earthquake experiments. Could be similar enough to Bing’s requirement.
One thing I notice is that they prefer to extract statistical features from the time series, such as skewness and kurtosis, for model training instead of using raw data and deep learning. See {A notebook for data exploration} and {Second place}, which includes the contribution rank of their extracted features.
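As a rough illustration of this kind of feature extraction, here is a minimal pure-Python sketch that computes mean, standard deviation, skewness and excess kurtosis over sliding windows. The function names, the window size and the toy signal are my own assumptions, not taken from any of the notebooks; the moments are plain population moments without bias correction.

```python
import math

def summary_features(window):
    """Return mean, std, skewness and excess kurtosis of one window."""
    n = len(window)
    mean = sum(window) / n
    # Population central moments (divide by n, no bias correction).
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    std = math.sqrt(m2)
    # Guard against constant windows, where the ratios are undefined.
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return {"mean": mean, "std": std, "skew": skew, "kurtosis": kurt}

def rolling_features(signal, window_size, step):
    """Slide a window over the signal and compute features per window."""
    return [
        summary_features(signal[start:start + window_size])
        for start in range(0, len(signal) - window_size + 1, step)
    ]

# Toy signal (made up): the second window contains one large spike.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 5.0, 0.0]
features = rolling_features(signal, window_size=5, step=5)
```

Each dictionary in `features` would become one row of the training table for a tree model or a small NN.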
{First place notebook} uses:
- an LSTM+Conv+Linear NN with extracted rolling stats features.
- Joint loss with MAE on `time to failure`, `time since failure`, and BCE on `failure time < 0.5s or not`.
- Optimizer: `Nadam` with `lr=0.008`.
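To make the joint loss concrete for myself, here is a minimal sketch of how the three terms could be combined. The equal weights, the function names and the argument layout are assumptions of mine; the notebook may weight or structure the terms differently.

```python
import math

def mae(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(prob, label):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(label)

def joint_loss(ttf_pred, tsf_pred, near_prob, ttf_true, tsf_true,
               threshold=0.5, weights=(1.0, 1.0, 1.0)):
    """MAE on time to failure + MAE on time since failure + BCE on
    'failure in less than `threshold` seconds'.
    The weights are hypothetical (equal by default)."""
    near_label = [1.0 if t < threshold else 0.0 for t in ttf_true]
    w_ttf, w_tsf, w_bce = weights
    return (w_ttf * mae(ttf_pred, ttf_true)
            + w_tsf * mae(tsf_pred, tsf_true)
            + w_bce * bce(near_prob, near_label))

# Toy batch of two samples (made-up numbers).
ttf_true, tsf_true = [0.2, 3.0], [5.0, 1.0]
perfect = joint_loss([0.2, 3.0], [5.0, 1.0], [1.0, 0.0], ttf_true, tsf_true)
worse = joint_loss([1.0, 2.0], [4.0, 2.0], [0.4, 0.6], ttf_true, tsf_true)
```

The auxiliary targets presumably act as a regularizer: the network has to agree with itself on three related views of the same failure cycle.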
They also parallelize the parameter tuning. I must learn how to parallelize a brute-force search for optimal parameters!
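A minimal sketch of a parallel brute-force (grid) search, using only the standard library. `validation_error` is a hypothetical stand-in for "train a model and return its validation MAE", and its minimum is placed arbitrarily. I use threads here so the sketch runs anywhere; for CPU-bound training one would likely swap in `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def validation_error(params):
    """Hypothetical scoring function: a toy bowl with an arbitrary minimum."""
    lr, hidden = params
    return (lr - 0.008) ** 2 * 1e4 + abs(hidden - 64) / 64

def grid_search(grid, score_fn, max_workers=4):
    """Evaluate every combination in `grid` in parallel, return the best."""
    candidates = list(product(*grid.values()))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as `candidates`.
        scores = list(pool.map(score_fn, candidates))
    best_score, best_params = min(zip(scores, candidates))
    return dict(zip(grid.keys(), best_params)), best_score

grid = {"lr": [0.001, 0.004, 0.008, 0.016], "hidden": [32, 64, 128]}
best_params, best_score = grid_search(grid, validation_error)
```

The grid grows multiplicatively with each added parameter, so this only stays practical for a handful of parameters with a few values each.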
Optimizers
{This post} summarizes the main optimizer types as ['Adadelta', 'Adagrad', 'Adam', 'RMSprop', 'SGD'].
The features of each are:
- AdaDelta
calculates the learning rate using a cumulative second-order moment.
{StackOverflow} posits the update rule
$$x_{t+1}=x_{t}+\Delta x_{t}$$
$$\Delta x_{t}=-\frac{\text{RMS}\left[\Delta x\right]_{t-1}}{\text{RMS}\left[g\right]_{t}}g_{t}$$
where $$\text{RMS}\left[ \Delta x \right]_{t-1} = \sqrt{ E\left[ \Delta x^2\right]_{t-1}+\varepsilon}$$
- RMSprop
Uses the square root of a running mean of squared gradients to scale the learning rate.
- Adam
$$
\begin{aligned}
g_t&=\frac{\partial L}{\partial w_t};\quad w_{t+1}=w_t-\widehat{m_t}\left(\frac{\alpha}{\sqrt{\widehat{v_t}}+\varepsilon}\right)\\
\widehat{m_t}&=\frac{m_t}{1-\beta_1^t};\quad \widehat{v_t}=\frac{v_t}{1-\beta_2^t}\\
m_t&=\beta_1m_{t-1}+\left(1-\beta_1\right)g_t;\quad v_t=\beta_2v_{t-1}+\left(1-\beta_2\right){g_t}^2
\end{aligned}
$$
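Setting \(\beta_1=0\), Adam -> RMSProp (with bias correction).
Setting \(\beta_2=0\) gives \(v_t=g_t^2\), so the momentum \(\widehat{m_t}\) is divided by \(|g_t|+\varepsilon\); this is a sign-like momentum update rather than plain SGD+Momentum, which needs a constant denominator.
\(\beta_1=0\) and \(\beta_2\to 1\) with an annealed step size \(\alpha_t=\alpha/\sqrt{t}\), Adam -> AdaGrad.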
The `amsgrad` param allows Adam to use the running maximum \(v_t^{\max} = \max(v_{t-1}^{\max}, v_t)\) in place of \(v_t\), which resembles `AdaMax`.
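To check my reading of the Adam equations, here is a minimal scalar sketch. The function name and the toy objective are my own. The `amsgrad` branch keeps a running maximum of the raw second moment and then applies the bias correction, which is only one of several variants found in libraries, so treat it as an assumption rather than the reference behavior.

```python
import math

def adam_minimize(grad_fn, w0, steps, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, amsgrad=False):
    """Run `steps` Adam updates on a scalar parameter and return it."""
    w, m, v, v_max = w0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # first moment m_t
        v = beta2 * v + (1 - beta2) * g * g    # second moment v_t
        m_hat = m / (1 - beta1 ** t)           # bias-corrected m_t
        if amsgrad:
            # Assumed variant: running max of v_t, then bias correction.
            v_max = max(v_max, v)
            v_hat = v_max / (1 - beta2 ** t)
        else:
            v_hat = v / (1 - beta2 ** t)       # bias-corrected v_t
        w -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w

def grad(w):
    """Gradient of the toy objective L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w_adam = adam_minimize(grad, 0.0, steps=5000, alpha=0.01)
w_ams = adam_minimize(grad, 0.0, steps=5000, alpha=0.01, amsgrad=True)
```

Calling it with `beta1=0.0` drops the momentum term and leaves an RMSProp-style update with bias correction.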
I am still confused about the difference between AdaDelta and RMSprop; this blog may clarify it: {Link}
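Writing the two update rules side by side helps me see the difference. This is a minimal scalar sketch based on the AdaDelta equations above and the usual textbook form of RMSprop; the state dictionaries, default hyperparameters and function names are my own assumptions. RMSprop divides a fixed learning rate by the RMS of past gradients, while AdaDelta replaces that fixed learning rate with the RMS of its own past updates.

```python
import math

def rmsprop_step(w, g, state, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update: fixed lr scaled by the RMS of past gradients."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state["Eg2"]) + eps)

def adadelta_step(w, g, state, rho=0.95, eps=1e-6):
    """One AdaDelta update: no lr, the numerator is RMS[dx] from step t-1."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * g * g
    rms_dx = math.sqrt(state["Edx2"] + eps)   # uses E[dx^2] at t-1
    rms_g = math.sqrt(state["Eg2"] + eps)     # uses E[g^2] at t
    dx = -(rms_dx / rms_g) * g
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * dx * dx
    return w + dx

# Toy run on L(w) = w^2, whose gradient is 2w.
w_rms, w_ada = 1.0, 1.0
s_rms = {"Eg2": 0.0}
s_ada = {"Eg2": 0.0, "Edx2": 0.0}
for _ in range(200):
    w_rms = rmsprop_step(w_rms, 2 * w_rms, s_rms)
    w_ada = adadelta_step(w_ada, 2 * w_ada, s_ada)
```

The only structural difference is the extra `Edx2` accumulator: remove it and put a constant in the numerator, and `adadelta_step` becomes `rmsprop_step`, apart from where `eps` is added.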
Similarity Metrics
Start from {this twdDtSc post}
- Lp Norm:
Besides the popular L2 (Euclidean distance) and L1 (Manhattan distance, the basis of MAE), the Chebyshev distance \(L_{\infty}\) takes the maximum distance along all candidate dimensions. Because only the largest per-dimension difference counts, it is more sensitive to a single outlying dimension than L1 or L2.
- Pearson & Spearman correlation distance:
$$
\text{corr_dist} = 1- \frac{cov(A, B)}{\sqrt{var(A) \cdot var(B)}}
$$
Spearman correlation applies the same formula to ranked variables, which makes it a non-parametric measure. For details see {biological explanation on type of variables}. Ideologically, ranked variables underestimate late
- Standardized Euclidean distance
This is commonly used in preprocessing and standard normalization, with the equation
$$
d_{P, Q} = \sqrt{\sum_{i=1}^n {\left(\frac{p_i - q_i}{\sigma_i}\right)^2}}
$$
- Chi-square distance
In face recognition, it is used for histogram matching:
$$
d(P, Q) = \sum{ \frac{(P_i - Q_i)^2}{P_i + Q_i} }
$$
which differs from the test statistic \(\chi^2 = \sum{\frac{(O_i - E_i)^2}{E_i}}\).
- Etc. Other similarity metrics include Hamming distance and cosine similarity; please refer to the {original post}.
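The metrics above can be sketched in a few lines of pure Python. The function names are mine, and the rank helper does not average ties, so the Spearman version is only a simplified illustration for data without repeated values.

```python
import math

def minkowski(p, q, order):
    """Lp distance: order=1 is Manhattan (L1), order=2 is Euclidean (L2)."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)

def chebyshev(p, q):
    """L-infinity distance: the largest per-dimension difference."""
    return max(abs(a - b) for a, b in zip(p, q))

def correlation_distance(a, b):
    """1 - Pearson correlation, as in the corr_dist formula."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    return 1.0 - cov / math.sqrt(var_a * var_b)

def ranks(values):
    """Ranks 1..n; ties are not averaged in this simplified sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result

def spearman_distance(a, b):
    """Correlation distance computed on ranks instead of raw values."""
    return correlation_distance(ranks(a), ranks(b))

def standardized_euclidean(p, q, sigma):
    """Euclidean distance after dividing each dimension by its sigma."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, sigma)))

def chi_square_distance(p, q):
    """Histogram chi-square distance; bins where both are empty are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)
```

For example, `spearman_distance([1, 2, 3, 4], [1, 4, 9, 16])` is zero up to rounding because the relation is monotonic, while `correlation_distance` on the same pair is slightly above zero because the relation is not linear.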
Top 10 time series comp
List from {Medium}. It may not fit my purpose, since most of them are customer-behavior oriented.
- The Rossmann Sales Forecasting competition: https://www.kaggle.com/c/rossmann-store-sales
- The M5 Forecasting competition: https://www.kaggle.com/c/m5-forecasting-accuracy
The top entry uses a gradient-boosted tree algorithm, LightGBM: winner {github repo}, {lightgbm_doc}
- The Global Energy Forecasting Competition 2014: https://www.kaggle.com/c/global-energy-forecasting-competition-2014-load-forecasting
- The Santa Time Series Forecasting competition: https://www.kaggle.com/c/santa-time-series
- The Mercari Price Suggestion competition: https://www.kaggle.com/c/mercari-price-suggestion-challenge
- The Corporación Favorita Grocery Sales Forecasting competition: https://www.kaggle.com/c/favorita-grocery-sales-forecasting
- The GE Flight Quest II — Turbulence Prediction competition: https://www.kaggle.com/c/turbulence-forecasting-challenge-ii
- The Bike Sharing Demand competition: https://www.kaggle.com/c/bike-sharing-demand
- The Web Traffic Time Series Forecasting competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting
- The Energy Forecasting competition: https://www.kaggle.com/c/ashrae-energy-prediction