This repo holds my experiments on fine-tuning pre-trained Transformer-based architectures for poetry generation. All of the experiments use Arabic poetry.
I follow a simple approach to poetry generation: conditioned on a verse, the model generates the next verse. The generated verse is then fed back to the model as input, and so on. More sophisticated approaches would take an extended left context into account, but I leave those for later.
Timestep | Model input | Model output |
---|---|---|
1 | فيرجع الصدى | كأنه النشيج |
2 | كأنه النشيج | وهو المراد |
3 | وهو المراد | ... |
4 | ... | ... |
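
The loop in the table can be sketched in a few lines of Python. This is only an illustration: `generate_poem` and `next_verse` are hypothetical names, and the lookup table `toy_model` stands in for the trained model, which is not shown here.

```python
from typing import Callable, List


def generate_poem(first_verse: str,
                  next_verse: Callable[[str], str],
                  num_verses: int) -> List[str]:
    """Generate `num_verses` verses, feeding each output back in as the next input."""
    verses = []
    current = first_verse
    for _ in range(num_verses):
        # The model only sees the previous verse, not the whole poem so far.
        current = next_verse(current)
        verses.append(current)
    return verses


# Toy stand-in for the trained model (assumption): a lookup built from the table above.
toy_model = {
    "فيرجع الصدى": "كأنه النشيج",
    "كأنه النشيج": "وهو المراد",
}

poem = generate_poem("فيرجع الصدى", lambda verse: toy_model.get(verse, "..."), num_verses=2)
print(poem)
```

Swapping the lambda for a call into a real encoder-decoder would leave the loop itself unchanged.
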
As a starting point, I pre-trained BERT on Arabic Wikipedia. I used the source code here to train a monolingual Masked Language Model with the default training configuration.
BERT was not originally designed for language generation; its Masked Language Modelling objective makes it difficult to sample text from directly. My idea is to condition a decoder on the contextual embeddings produced by BERT and let the decoder do the generation. I used two types of decoders:
- GRU decoder: A GRU-based decoder. The hidden states of the decoder are initialized with the embeddings output by BERT.
- GRU network with attention: In addition to initializing the decoder's hidden states as before, the decoder applies Bahdanau attention over BERT's contextual embeddings.
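
To make the second decoder more concrete, here is a framework-free sketch of Bahdanau (additive) attention over a sequence of encoder states. It is not the code used in this repo: the function names, the weight matrices `w_query` and `w_key`, the vector `v`, and the toy values are all assumptions chosen for illustration. In the real model the encoder states would be BERT's contextual embeddings and the weights would be learned.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(hidden: Vector, encoder_states: List[Vector],
                       w_query: Matrix, w_key: Matrix,
                       v: Vector) -> Tuple[Vector, Vector]:
    """Return (context, weights) with score_i = v . tanh(W_q h + W_k e_i)."""
    query = matvec(w_query, hidden)
    scores = []
    for state in encoder_states:
        key = matvec(w_key, state)
        energy = [math.tanh(q + k) for q, k in zip(query, key)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(a * s[j] for a, s in zip(weights, encoder_states))
               for j in range(dim)]
    return context, weights


# Toy values (assumptions): two 2-dimensional "embeddings" and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
decoder_hidden = [1.0, 0.0]
context, weights = additive_attention(decoder_hidden, encoder_states,
                                      identity, identity, v=[1.0, 1.0])
print(weights, context)
```

At each decoding step the context vector would typically be combined with the GRU input or hidden state before predicting the next token.
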
Fine-tuning uses a Maximum Likelihood Estimation (MLE) objective that maximizes the probability of generating the next verse. The gradients are back-propagated from the decoder into BERT.
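
As a rough illustration of the objective, maximizing the likelihood of the next verse is equivalent to minimizing the negative log-likelihood of its tokens. The sketch below computes that quantity for one verse from per-step probability distributions. The name `sequence_nll`, the averaging over tokens, and the toy probabilities are assumptions; in practice a deep learning framework would compute this loss and the gradients.

```python
import math
from typing import Sequence


def sequence_nll(step_probs: Sequence[Sequence[float]],
                 target_ids: Sequence[int]) -> float:
    """Average negative log-likelihood of the target tokens.

    step_probs[t] is the decoder's distribution over the vocabulary at step t,
    target_ids[t] is the index of the correct token of the next verse at step t.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("need one distribution per target token")
    total = 0.0
    for probs, target in zip(step_probs, target_ids):
        total -= math.log(probs[target])
    return total / len(target_ids)


# Toy example (assumption): a 3-token vocabulary and a 2-token target verse.
step_probs = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]]
target_ids = [0, 1]
loss = sequence_nll(step_probs, target_ids)
print(loss)
```

The loss shrinks toward zero as the decoder puts more probability on the correct tokens.
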
TODO