This folder contains implementations of:
- Trimmed action recognition using an RNN
- Temporal action segmentation using a seq2seq model
Dataset:
- Task 1 & Task 2: 4151 trimmed videos (each 5-20 seconds long at 24 fps, frame size 240x320)
- Task 3: 29 full-length videos (with size 240x320)
- 11 action labels
For details, refer to the slides (PPT) provided by the TA.
Task 1 & Task 2
- Videos to images: preprocessing.ipynb
- Images to features: train/CNN_features.ipynb and train/RNN_feature_extractor.ipynb
Task 3
- Images to features: train/RNN_feature_extractor.ipynb
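The "videos to images" step can be pictured as picking a subset of frame indices from each clip before feature extraction. The sketch below is only an illustration: the function name `sample_frame_indices` and the sampling interval `every_n=12` are assumptions, not values taken from preprocessing.ipynb.

```python
def sample_frame_indices(duration_sec, fps=24, every_n=12):
    """Return the indices of frames kept when taking one frame every `every_n` frames.

    `every_n` is a hypothetical sampling interval chosen for illustration.
    """
    total_frames = int(round(duration_sec * fps))
    return list(range(0, total_frames, every_n))


# The trimmed clips last 5-20 seconds at 24 fps, i.e. 120-480 frames each.
short_clip = sample_frame_indices(5)
long_clip = sample_frame_indices(20)
print(len(short_clip), len(long_clip))  # 10 40
```

With these assumed settings each clip yields between 10 and 40 images, which keeps the per-video sequence short enough to feed to an RNN.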
Refer to the train folder.
# Extract CNN-based features and predict from the average-pooled features
bash hw5_p1.sh [directory of trimmed validation videos folder] [path of ground-truth csv file] [directory of output labels folder]
# Extract CNN-based features and predict with the RNN
bash hw5_p2.sh [directory of trimmed validation videos folder] [path of ground-truth csv file] [directory of output labels folder]
# Temporal action segmentation on full-length videos
bash hw5_p3.sh [directory of full-length validation videos folder] [directory of output labels folder]
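To make the "average-pooled features" idea concrete, the following minimal sketch averages per-frame feature vectors into a single video-level vector. It uses plain Python lists instead of the tensors the scripts presumably work with, and the function name `average_pool` is hypothetical.

```python
def average_pool(frame_features):
    """Average equal-length per-frame feature vectors into one video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame feature vector")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame feature vectors must have the same length")
    n = len(frame_features)
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


# Three frames with 2-dimensional features (toy values, not real CNN outputs).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
video_feature = average_pool(frames)
print(video_feature)  # [3.0, 4.0]
```

Average pooling discards frame order, which is one plausible reason an RNN over the same frame features can do better on the trimmed videos.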
Python3
pytorch==0.4
torchvision==0.2.1
skimage
matplotlib
skvideo
ffmpeg
- Performance in accuracy
| | CNN-based features | RNN-based features | Temporal action prediction |
|---|---|---|---|
| Validation Acc. | 0.475 | 0.510 | 0.5779 |
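The accuracy figures are the fraction of predictions that match the ground truth. A minimal sketch of that computation is shown here, assuming labels are stored as integer indices; the function name `accuracy` and the toy labels are illustrative and not taken from the evaluation scripts.

```python
def accuracy(predicted, ground_truth):
    """Return the fraction of positions where the predicted label equals the ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# Toy example: 3 of 4 labels match.
print(accuracy([3, 5, 0, 4], [3, 5, 1, 4]))  # 0.75
```

For the trimmed videos each position would be one video; for temporal action segmentation each position would be one frame of a full-length video.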
- Visualization of CNN-based video features (left) and RNN-based video features (right).
- Visualization of temporal action segmentation (OP06-R05-Cheeseburger).
- Color index and its corresponding action label:
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
Action | Other | Inspect/Read | Open | Take | Cut | Put | Close | Move Around | Divide/Pull | Pour | Transfer |
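The index-to-label table above can be used to turn a per-frame prediction sequence into readable segments, which is roughly what the segmentation color bar displays. This is a sketch under assumptions: the names `ACTION_LABELS` and `to_segments` are hypothetical, and the input sequence is made up.

```python
from itertools import groupby

# Index-to-label mapping copied from the table above.
ACTION_LABELS = {
    0: "Other", 1: "Inspect/Read", 2: "Open", 3: "Take", 4: "Cut", 5: "Put",
    6: "Close", 7: "Move Around", 8: "Divide/Pull", 9: "Pour", 10: "Transfer",
}


def to_segments(frame_labels):
    """Collapse per-frame label indices into (start, end, name) runs; `end` is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((start, start + length, ACTION_LABELS[label]))
        start += length
    return segments


print(to_segments([0, 0, 3, 3, 3, 4, 4, 5]))
# [(0, 2, 'Other'), (2, 5, 'Take'), (5, 7, 'Cut'), (7, 8, 'Put')]
```

Each tuple corresponds to one colored block in the visualization, with frame positions given as a half-open interval.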
[1] https://zhuanlan.zhihu.com/p/34418001
[2] https://github.com/thtang/ADLxMLDS2017/tree/master/hw1