
wxscribe

Transcribe any audio file and convert it to a Microsoft Word document. A simple wrapper around ffmpeg, whisperx (whisper + pyannote) and pandoc, initially intended for interview transcription. Assembled by @4yn.

Quickstart Colab notebook (make sure to pick the GPU runtime and to shut down the runtime once done)
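
Under the hood the pipeline is: ffmpeg converts the input audio to mp3, whisperx transcribes it and labels speakers via pyannote, and pandoc renders the speaker-tagged text as a .docx. The sketch below illustrates that flow using the whisperx Python API together with the ffmpeg-python and pandoc packages; it is a minimal illustration under those assumptions, and the function name and output formatting are illustrative rather than the script's actual internals.

import ffmpeg
import pandoc
import whisperx

def transcribe_to_docx(audio_path, hf_token, device="cuda"):
    stem = audio_path.rsplit(".", 1)[0]

    # 1. Convert the input to mp3 with ffmpeg
    mp3_path = stem + ".mp3"
    ffmpeg.input(audio_path).output(mp3_path).run(overwrite_output=True)

    # 2. Transcribe with whisper, align words, then diarize speakers with pyannote
    model = whisperx.load_model("large-v2", device)
    audio = whisperx.load_audio(mp3_path)
    result = model.transcribe(audio, batch_size=16)
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    result = whisperx.align(result["segments"], align_model, metadata, audio, device)
    diarizer = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
    result = whisperx.assign_word_speakers(diarizer(audio), result)

    # 3. Render "SPEAKER: text" paragraphs and write the .docx with pandoc
    text = "\n\n".join(
        f"**{seg.get('speaker', 'UNKNOWN')}**: {seg['text'].strip()}"
        for seg in result["segments"]
    )
    pandoc.write(pandoc.read(text), file=stem + ".docx")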

Usage

./wxscribe.py audio_file_1 audio_file_2 audio_file_3 ...

You will need to accept the pyannote model terms and conditions and generate a Hugging Face model hub token for the script to work. Provide the token using the command line option --hf-token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx or the HUGGING_FACE_HUB_TOKEN environment variable.
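
For example, either invocation works (the token value is a placeholder):

$ ./wxscribe.py my_audio.mp3 --hf-token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
$ HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ./wxscribe.py my_audio.mp3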

An audio file my_audio.mp3 will be transcribed to my_audio.docx. If my_audio.docx already exists, the audio file will be skipped; delete the old document if you want to repeat the transcription.
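
In other words (a minimal sketch of the naming and skip rule described above, not the script's actual code):

from pathlib import Path

def output_path(audio_path: str) -> Path:
    # my_audio.mp3 -> my_audio.docx
    return Path(audio_path).with_suffix(".docx")

def already_transcribed(audio_path: str) -> bool:
    # an existing .docx is treated as done; delete it to transcribe again
    return output_path(audio_path).exists()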

Dependencies

pip install git+https://github.com/m-bain/whisperx.git ffmpeg-python pandoc

Also requires pandoc 2.x and ffmpeg to be installed on the system. Alternatively, use this script on Google Colab, which already has these dependencies installed.
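
On Debian/Ubuntu, for example, the system packages are available from the standard repositories:

$ sudo apt-get install ffmpeg pandoc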

Example Usage

$ yt-dlp https://www.youtube.com/watch?v=FAPeFmWmYQo -x -o "audio.%(ext)s"

[youtube] Extracting URL: https://www.youtube.com/watch?v=FAPeFmWmYQo
[youtube] FAPeFmWmYQo: Downloading webpage
[youtube] FAPeFmWmYQo: Downloading ios player API JSON
[youtube] FAPeFmWmYQo: Downloading android player API JSON
[youtube] FAPeFmWmYQo: Downloading m3u8 information
[info] FAPeFmWmYQo: Downloading 1 format(s): 251
[download] Destination: audio.webm
[download] 100% of    8.73MiB in 00:00:00 at 11.48MiB/s
[ExtractAudio] Destination: audio.opus
Deleting original file audio.webm (pass -k to keep)

$ wxscribe.py audio.opus --hf-token hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Converting audio.opus to audio.mp3
Running recognition on audio.mp3
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.7. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../.cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0 cu102, yours is 2.0.1. Bad things might happen unless you revert torch to 1.x.
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.7. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0 cu102, yours is 2.0.1. Bad things might happen unless you revert torch to 1.x.
Saving to audio.docx

sample transcription

Options

  • --model-size / --batch-size / --language: Options for the whisper model. Adjust these to control how much GPU memory is used and which language the transcript is in; see the combined example after this list.
  • --show-timestamps: Print timestamps before each paragraph.
  • --seperate-paragraphs: Show speaker names before every paragraph, even when the same speaker continues speaking.
  • --no-speaker: Do not show speaker names.
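
For example, a combined invocation (the file name is illustrative, and "medium" / "en" assume whisper's standard model sizes and language codes):

$ ./wxscribe.py interview.mp3 --model-size medium --language en --show-timestamps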
