Output format #187

famda · 2024-05-16T10:42:29Z

Hey!
Awesome work on this!

Is it possible to transcribe/diarize and get a JSON output as a result file?
That would be a nice feature to have.

MahmoudAshraf97 · 2024-05-16T10:54:29Z

Thanks! Yes, it's possible. There's an example in one of the branches if you want to try it, but I haven't added it to the main branch because, when it comes to JSON, everyone has their own schema and a universal one won't cut it. Happy to hear your suggestions, though.

famda · 2024-05-16T11:35:33Z

I understand. I think it's just a matter of having a structured response, something that can be deserialized.
I was also testing this, which is kind of a wrapper API around Whisper.
That API lets you choose the format of the response you want to receive (text, json, ...) by passing an argument like --output_format [json, srt, text, or whatever].
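A minimal sketch of how an `--output_format` switch could dispatch to per-format writers. The flag name comes from the suggestion above; it is not an existing option of this project, and the writer functions, the segment dict shape (`speaker`, `start`, `end`, `text`), and the sample data are all hypothetical.

```python
import argparse
import json


def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_text(segments):
    # One line per segment, prefixed with the speaker label.
    return "\n".join(f"{s['speaker']}: {s['text'].strip()}" for s in segments)


def to_srt(segments):
    # Numbered cues separated by a blank line; speaker label prepended to the cue text.
    cues = []
    for i, s in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{format_srt_timestamp(s['start'])} --> {format_srt_timestamp(s['end'])}\n"
            f"{s['speaker']}: {s['text'].strip()}\n"
        )
    return "\n".join(cues)


def to_json(segments):
    return json.dumps({"segments": segments}, indent=4, ensure_ascii=False)


# Hypothetical mapping from --output_format value to writer function.
WRITERS = {"json": to_json, "srt": to_srt, "text": to_text}


def main(argv=None) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_format", choices=sorted(WRITERS), default="text")
    args = parser.parse_args(argv)
    # Hypothetical diarized segments; a real pipeline would produce these.
    segments = [
        {"speaker": "Speaker 0", "start": 0.0, "end": 1.62, "text": " Hi, my name is Test."},
    ]
    return WRITERS[args.output_format](segments)


if __name__ == "__main__":
    print(main())
```

Keeping each format behind a small function in a dict means adding another format is one new entry, and `argparse` rejects unknown values via `choices`.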

My idea was to have something like this (just a suggestion if it makes sense):

{
    "text": "Hi, my name is Test.",
    "speaker": "Speaker 0",
    "segments": [
        {
            "id": 0,
            "seek": 0,
            "start": 0.0,
            "end": 5.4,
            "text": "Hi, my name is Test.",
            "tokens": [ 
                  double array
            ],
            "temperature": 0.0,
            "avg_logprob": -0.19734466075897217,
            "compression_ratio": 1.7903780068728523,
            "no_speech_prob": 0.1006949171423912,
            "words": [
                {
                    "word": " Hi,",
                    "start": 0.0,
                    "end": 0.64,
                    "probability": 0.7109836935997009
                },
                {
                    "word": " my",
                    "start": 0.88,
                    "end": 1.08,
                    "probability": 0.9681467413902283
                },
                {
                    "word": " name",
                    "start": 1.08,
                    "end": 1.22,
                    "probability": 0.9989060163497925
                },
                {
                    "word": " is",
                    "start": 1.22,
                    "end": 1.38,
                    "probability": 0.9960727691650391
                },
                {
                    "word": " Test.",
                    "start": 1.38,
                    "end": 1.62,
                    "probability": 0.8055099844932556
                }
            ]
        }
    ],
    "language": "en"
}
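To make the proposed structure deserializable, it could be mirrored with dataclasses. This is a sketch, not the project's code: the class names are hypothetical, and `tokens` is typed as a list of integer token IDs, which is an assumption since the proposal leaves its element type as a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Word:
    word: str
    start: float
    end: float
    probability: float


@dataclass
class Segment:
    id: int
    seek: int
    start: float
    end: float
    text: str
    tokens: List[int] = field(default_factory=list)  # assumed: integer token IDs
    temperature: float = 0.0
    avg_logprob: float = 0.0
    compression_ratio: float = 0.0
    no_speech_prob: float = 0.0
    words: List[Word] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict) -> "Segment":
        # Rebuild nested Word objects; all other keys map 1:1 onto fields.
        words = [Word(**w) for w in d.get("words", [])]
        return cls(**{**d, "words": words})


@dataclass
class Transcript:
    text: str
    speaker: str
    language: str
    segments: List[Segment] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "Transcript":
        d = json.loads(raw)
        segments = [Segment.from_dict(s) for s in d.get("segments", [])]
        return cls(text=d["text"], speaker=d["speaker"],
                   language=d["language"], segments=segments)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists of dataclasses.
        return json.dumps(asdict(self), indent=4, ensure_ascii=False)
```

Dataclasses keep this dependency-free; unknown keys in the input raise a `TypeError`, which doubles as a crude schema check.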

What do you think of this?

MahmoudAshraf97 · 2024-05-16T11:57:24Z

Sounds reasonable. I'll work on it when I have the time, or maybe you could open a PR if possible 😁
