
American Sign Language Real-time Recognition

This project demonstrates the feasibility of translating American Sign Language in real time.

The results of the study are showcased in the real-time demo application. Below are some GIFs extracted from the webcam video stream.


A client-server web app has also been implemented. The following GIF shows how it works.

Getting Started

Follow the instructions below to get a clean installation.

Dataset

Download the WLASL dataset.

git clone https://github.com/dxli94/WLASL

Prerequisites

Create and activate a new virtual environment in the project folder.

~/project_folder$ virtualenv .env
~/project_folder$ source .env/bin/activate

Installation

  1. Clone the repo.
    (.env) git clone https://github.com/simonefinelli/ASL-Recognition-backup
  2. Install requirements.
    (.env) python -m pip install -r requirements.txt
  3. Split the WLASL dataset into the right format using the script in 'tools/dataset splitting/' (a sketch of this step follows the list).
    (.env) python k_gloss_splitting.py ./WLASL_full/ 2000
  4. Copy the pre-processed dataset into the 'data' folder.
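
For reference, here is a minimal sketch of what a top-K gloss split could look like. It assumes the WLASL index file WLASL_v0.3.json (a list of gloss entries, each carrying train/val/test instances) and a videos/ folder next to it; the actual k_gloss_splitting.py may work differently.

```python
# top_k_split_sketch.py - hypothetical illustration, not the repo's k_gloss_splitting.py
import json
import shutil
import sys
from pathlib import Path

def split_top_k(wlasl_root: str, k: int, out_dir: str = "data") -> None:
    """Copy the videos of the first K glosses into train/val/test folders."""
    root = Path(wlasl_root)
    index = json.loads((root / "WLASL_v0.3.json").read_text())

    for entry in index[:k]:                      # gloss entries are ordered in the index
        gloss = entry["gloss"]
        for inst in entry["instances"]:
            src = root / "videos" / f"{inst['video_id']}.mp4"
            if not src.exists():                 # some WLASL download links are dead
                continue
            dst = Path(out_dir) / inst["split"] / gloss
            dst.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst / src.name)

if __name__ == "__main__":
    split_top_k(sys.argv[1], int(sys.argv[2]))   # e.g. ./WLASL_full/ 2000
```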

Usage

Now let's see how to use the neural network, the demo and the web app.

Neural Net

  1. To start the training, run (see the sketch after this list):
    (.env) python train_model.py
  2. After training, to evaluate the best model on the test set, run:
    (.env) python evaluate_model.py
  3. Now, we can use the model in the demo or in the web app.
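
To give a rough idea of what a training script like train_model.py involves, here is a minimal TensorFlow/Keras sketch. The layer stack is a placeholder, not the actual architecture from models.py, and the 12x224x224x3 clip shape is an assumption based on the frame generator's 12-frame minimum.

```python
# train_sketch.py - hypothetical outline of a training run, not the repo's train_model.py
import tensorflow as tf

NUM_CLASSES = 100                  # e.g. WLASL100; adjust per sub-dataset
FRAMES, H, W, C = 12, 224, 224, 3  # assumed clip shape

def build_model() -> tf.keras.Model:
    """Tiny Conv3D classifier over fixed-length frame clips (illustrative only)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(FRAMES, H, W, C)),
        tf.keras.layers.Conv3D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling3D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# train_ds / val_ds would come from the custom frame generator over data/train and data/val:
# model.fit(train_ds, validation_data=val_ds, epochs=30,
#           callbacks=[tf.keras.callbacks.ModelCheckpoint("best_model.h5",
#                                                         save_best_only=True)])
```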

Tips

  • The WLASL dataset can be divided into 4 sub-datasets: WLASL100, WLASL300, WLASL1000 and WLASL2000. You can find the models used for each sub-dataset in the models.py file.
  • The custom frame generator used in the model needs at least 12 frames to work. However, videos 59958, 18223, 15144, 02914 and 55325, in the WLASL1000 and WLASL2000 datasets, are shorter. To solve this problem, use the video_extender.py script (one possible approach is sketched after this list).
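
One plausible way a script like video_extender.py could pad those short clips, assuming OpenCV (the real script may use a different strategy, such as interpolating frames):

```python
# extend_sketch.py - hypothetical padding logic; the repo's video_extender.py may differ
import cv2

MIN_FRAMES = 12  # minimum required by the custom frame generator

def extend_video(src_path: str, dst_path: str) -> None:
    """Duplicate the last frame until the clip reaches MIN_FRAMES frames."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()

    while frames and len(frames) < MIN_FRAMES:
        frames.append(frames[-1].copy())         # pad by repeating the last frame

    h, w = frames[0].shape[:2]
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for f in frames:
        out.write(f)
    out.release()

# extend_video("data/train/book/59958.mp4", "data/train/book/59958_ext.mp4")  # example
```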

Real-time demo

  1. To start the demo, run (the capture loop is sketched below):
    (.env) python demo.py
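
Under the hood, a demo like this presumably buffers webcam frames into a sliding window and classifies each full window. Below is a minimal sketch of such a loop; the model path, the 224x224 input size, the preprocessing, and the class ordering are all assumptions (the gloss list itself is the WLASL20custom vocabulary mentioned in the Tip below).

```python
# demo_sketch.py - minimal webcam loop, illustrative only (not the repo's demo.py)
from collections import deque
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("best_model.h5")     # path is an assumption
labels = ["book", "chair", "clothes", "computer", "drink", "drum", "family",
          "football", "go", "hat", "hello", "kiss", "like", "play", "school",
          "street", "table", "university", "violin", "wall"]  # assumed class order
window = deque(maxlen=12)                                # generator needs >= 12 frames

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    window.append((cv2.resize(frame, (224, 224)) / 255.0).astype("float32"))
    if len(window) == window.maxlen:
        probs = model.predict(np.expand_dims(np.stack(window), 0), verbose=0)[0]
        cv2.putText(frame, labels[int(np.argmax(probs))], (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("ASL demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):                # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```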

Web app

  1. To start the web app, run (a minimal server skeleton is sketched after this list):
    (.env) python serve.py
  2. Go to the following URL: http://127.0.0.1:5000/
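
Since 127.0.0.1:5000 is Flask's default address, serve.py is presumably a small Flask app along these lines (the route, template, and endpoint names here are hypothetical):

```python
# serve_sketch.py - minimal Flask skeleton, illustrative only (not the repo's serve.py)
from flask import Flask, render_template, request, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")    # template name is an assumption

@app.route("/predict", methods=["POST"])    # endpoint name is hypothetical
def predict():
    frames = request.get_json()             # the client would send captured frames
    # ... run the trained model on the frames and return the predicted gloss
    return jsonify({"gloss": "hello"})      # placeholder response

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)    # matches the URL in step 2
```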

Tip

The model used in the demo and the web app was obtained by training the neural net on a custom dataset, called WLASL20custom. This dataset consists of only 20 words: book, chair, clothes, computer, drink, drum, family, football, go, hat, hello, kiss, like, play, school, street, table, university, violin and wall.

Results

I achieved the following accuracy with the proposed models:

  1. WLASL20custom: 63% accuracy.
  2. WLASL100: 34% accuracy.
  3. WLASL300: 28% accuracy.
  4. WLASL1000: 19% accuracy.
  5. WLASL2000: 10% accuracy.

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements
