Sequence to sequence deep RNN

A network based on the seq2seq tutorial by TensorFlow (https://www.tensorflow.org/tutorials/seq2seq)

Requirements

pandas
nltk (punkt tokenizer)
tensorflow >= 1.0.0

Dataset

The dataset is in "Question and Answer" format. For example, a joke looks like this:

Why do chicken coops only have two doors? Because if they had four, they would be chicken sedans!

Here the input to the encoder is the first part of the joke (the "question"), and the second part (the "punchline") is the answer the decoder learns to produce!
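To illustrate how one joke maps onto an encoder/decoder pair, here is a minimal sketch. It is not the code from this repo: the `<go>` and `<eos>` marker tokens, the `simple_tokenize` helper (a regex stand-in for the nltk punkt tokenizer) and `make_pair` are all assumptions made for the example.

```python
import re

# Hypothetical marker tokens; the real vocabulary may use different names.
GO, EOS = "<go>", "<eos>"


def simple_tokenize(text):
    """Lowercase the text and split it into words and punctuation marks.

    This is a stand-in for a real tokenizer such as nltk's punkt.
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())


def make_pair(question, answer):
    """Turn one joke into encoder input, decoder input and decoder target."""
    answer_tokens = simple_tokenize(answer)
    encoder_input = simple_tokenize(question)
    # The decoder is fed the answer shifted right by one position,
    # and is trained to predict the answer followed by an end marker.
    decoder_input = [GO] + answer_tokens
    decoder_target = answer_tokens + [EOS]
    return encoder_input, decoder_input, decoder_target


enc, dec_in, dec_target = make_pair(
    "Why do chicken coops only have two doors?",
    "Because if they had four, they would be chicken sedans!",
)
print(enc)
print(dec_in)
print(dec_target)
```

The decoder input and target are the same token sequence offset by one step, which is the usual way to train a decoder with teacher forcing.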

Will push dataset once it is correctly cleaned! :)

The raw data can be found over at https://www.kaggle.com. Two datasets were used: one CSV formatted like "ID, Question, Answer", and another with just "ID, Joke". The preprocessing step normalizes the second data file so that both have the same format.

https://www.kaggle.com/jiriroz/qa-jokes (called jokes.csv in the code)
https://www.kaggle.com/abhinavmoudgil95/short-jokes (called shortjokes.csv in the code)

The raw data is put into a raw_data folder.
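The normalization of the "ID, Joke" file could look roughly like the sketch below. The splitting rule (cut at the first question mark, drop jokes with no question or no punchline) is an assumption for illustration, as are the column names `ID`, `Joke`, `Question`, `Answer` and the helper names `split_joke` and `normalize_rows`; the preprocessing code in this repo may work differently.

```python
import csv
import io


def split_joke(joke):
    """Split a joke at its first '?' into (question, answer).

    Returns None when the joke has no question mark or no text on
    either side of it (assumed rule, not taken from the repo).
    """
    head, sep, tail = joke.partition("?")
    question, answer = head.strip(), tail.strip()
    if not sep or not question or not answer:
        return None
    return question + "?", answer


def normalize_rows(rows):
    """Convert rows with ID/Joke keys into rows with ID/Question/Answer keys."""
    for row in rows:
        pair = split_joke(row["Joke"])
        if pair is not None:
            yield {"ID": row["ID"], "Question": pair[0], "Answer": pair[1]}


# A tiny in-memory stand-in for shortjokes.csv.
sample = io.StringIO(
    "ID,Joke\n"
    '1,"Why do chicken coops only have two doors? '
    'Because if they had four, they would be chicken sedans!"\n'
    "2,I told my computer a joke.\n"
)

normalized = list(normalize_rows(csv.DictReader(sample)))
for row in normalized:
    print(row)
```

Jokes that do not fit the question/punchline shape are skipped here rather than guessed at, so the resulting file is smaller than the raw one.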

Bazel Tensorflow Serving

This repo also includes code for building the model together with TensorFlow Serving using Bazel.

It also includes a simple servings-client that can be used to call the server running the trained model to make yourself laugh.

Docker

I included a Dockerfile to easily run the pre-processing part!

build: docker build -t preprocess .

run: docker run -v ~/jokes:/usr/app/src preprocess

Using seq2seq lib:

install:

git clone https://github.com/google/seq2seq.git
cd seq2seq
pip3 install -e .

Test: python3 -m unittest seq2seq.test.pipeline_test

This will fail with ImportError: cannot import name 'bernoulli'. To fix:
