Label-smoothed aggregation cross-entropy (ACE) loss for sequence-to-sequence tasks.
Label smoothing improves generalization in these tasks and helps lower the expected calibration error (ECE).
For more information, please refer to [When Does Label Smoothing Help?](https://arxiv.org/abs/1906.02629) and [Aggregation Cross-Entropy for Sequence Recognition](https://arxiv.org/abs/1904.08364).
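
The idea can be illustrated with a minimal sketch in plain Python. It is not this project's implementation: the function name `label_smoothed_ace_loss`, the choice of index 0 as the blank class, and the smoothing scheme (mixing the normalized target counts with a uniform distribution weighted by `alpha`) are all assumptions made for illustration. The sketch sums the per-step class probabilities over time, normalizes them, and takes the cross-entropy against the smoothed, normalized class counts.

```python
import math


def label_smoothed_ace_loss(probs, counts, alpha=0.1, eps=1e-10):
    """Hypothetical label-smoothed ACE loss for a single sequence.

    probs:  T x C nested list of per-step class probabilities (rows sum to 1).
    counts: length-C list with the number of occurrences of each class in the
            target. Index 0 is assumed to be the blank class; its count is
            overwritten with the number of steps not used by other classes.
    alpha:  smoothing weight (assumption: uniform smoothing over C classes).
    """
    T = len(probs)
    C = len(probs[0])

    counts = list(counts)
    counts[0] = T - sum(counts[1:])  # blank absorbs the remaining time steps

    # Aggregate predictions over time and normalize to a distribution.
    y_bar = [sum(step[k] for step in probs) / T for k in range(C)]

    # Normalize the counts, then mix with a uniform distribution.
    n_bar = [(1.0 - alpha) * (n / T) + alpha / C for n in counts]

    # Cross-entropy between the smoothed counts and aggregated predictions.
    return -sum(n * math.log(y + eps) for n, y in zip(n_bar, y_bar))


# Usage: 4 time steps, 2 classes (blank + one symbol), symbol occurs twice.
uniform = [[0.5, 0.5] for _ in range(4)]
loss = label_smoothed_ace_loss(uniform, [0, 2], alpha=0.1)
```

With `alpha=0` the sketch reduces to the unsmoothed ACE cross-entropy term. A larger `alpha` pulls the target distribution toward uniform, which penalizes over-confident aggregated predictions.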