Crash due to unicode decode error while getting video title from vtt #138

Closed
d89u opened this issue Apr 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@d89u

d89u commented Apr 7, 2024

Might be a malformed character in a video title. Would it be possible to let yt-fts skip undefined characters and throw a warning instead of a crash?

Adding subtitles to database... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Traceback (most recent call last):
File "c:\users\derja\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\derja\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\derja\AppData\Local\Programs\Python\Python39\Scripts\yt-fts.exe\__main__.py", line 7, in <module>
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 1078, in main
rv = self.invoke(ctx)
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 783, in invoke
return _callback(*args, **kwargs)
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\yt_fts\yt_fts.py", line 63, in download
foo = download_channel(channel_id, channel_name, language, number_of_jobs, s)
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\yt_fts\download.py", line 264, in download_channel
vtt_to_db(channel_id, tmp_dir, s)
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\yt_fts\download.py", line 174, in vtt_to_db
vid_title = get_vid_title(os.path.join(os.path.dirname(vtt), f'{vid_id}.info.json'))
File "c:\users\derja\appdata\local\programs\python\python39\lib\site-packages\yt_fts\download.py", line 196, in get_vid_title
return json.load(f)['title']
File "c:\users\derja\appdata\local\programs\python\python39\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "c:\users\derja\appdata\local\programs\python\python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 273833: character maps to <undefined>

@d89u d89u changed the title from "Crash due to unicode decode error while getting video title" to "Crash due to unicode decode error while getting video title from vtt" Apr 7, 2024
@NotJoeMartinez
Owner

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 indicates the info.json file is being decoded as Windows-1252 (cp1252), and byte 0x81 has no mapping in that code page.
The get_vid_title() function could be wrapped in a try/except, but then the function would have no title to return when decoding fails, so the caller would still need to handle that case.

GPT and Stack Overflow both recommend passing an explicit utf-8 encoding when opening the file:

with open(info_file, 'r', encoding='utf-8', errors='ignore') as f:
    return json.load(f)['title']

Would you mind providing a channel url to reproduce this on?
The yt-fts and yt-dlp version numbers would be helpful as well.
Thanks.
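
The try/except concern above (no title to return on failure) can be handled with a fallback value plus a warning, which is also what the original report asked for. The following is only a sketch: `get_vid_title_safe` and the `fallback` parameter are hypothetical names, not part of yt-fts, and `errors='replace'` is swapped in for `errors='ignore'` so that undecodable bytes stay visible as U+FFFD instead of being dropped silently.

```python
import json
import warnings


def get_vid_title_safe(info_json_path, fallback="[unknown title]"):
    """Hypothetical variant of get_vid_title that warns instead of crashing."""
    try:
        # Explicit utf-8 avoids the platform default (cp1252 in the traceback).
        # errors='replace' turns undecodable bytes into U+FFFD.
        with open(info_json_path, "r", encoding="utf-8", errors="replace") as f:
            return json.load(f)["title"]
    except (OSError, json.JSONDecodeError, KeyError) as exc:
        # Missing file, invalid JSON, or no 'title' key: warn and keep going.
        warnings.warn(f"Could not read title from {info_json_path}: {exc}")
        return fallback
```

Whether a placeholder title is acceptable in the database is a design decision for the project; skipping the video entirely would be another option.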

@AtaGunZ

AtaGunZ commented Apr 10, 2024

I installed yt-fts via pip today and hit the same issue. I made the change you suggested:

def get_vid_title(info_json_path):
    """
    Retrieves video title from the info json file.
    """
    with open(info_json_path, 'r', encoding='utf-8', errors='ignore') as f:
        return json.load(f)['title']

and after that it worked.

@NotJoeMartinez NotJoeMartinez added the bug Something isn't working label Apr 10, 2024
NotJoeMartinez added a commit that referenced this issue Apr 10, 2024
@NotJoeMartinez
Owner

Fixed in v0.1.44. The bug was introduced in 896f8fd, which writes the info JSON to the file system and reads it back; on Windows the file is opened with a different default encoding. Specifying the utf-8 encoding in the open() call that feeds json.load should solve and prevent this. I don't have a Windows environment to test this in, so feel free to open another issue if it persists.
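
For anyone wanting to reproduce the failure mode without Windows, here is a minimal sketch. It assumes the info JSON on disk is UTF-8 and that the failing read used cp1252, as the traceback suggests. The title "Łódź" is a made-up example chosen because Ł (U+0141) encodes to the bytes C5 81 in UTF-8, and 0x81 is unmapped in cp1252.

```python
import json
import os
import tempfile

# Hypothetical title; U+0141 encodes to bytes C5 81 in UTF-8.
info = {"title": "Łódź"}

path = os.path.join(tempfile.mkdtemp(), "example.info.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(info, f, ensure_ascii=False)

# Simulate the Windows default by forcing cp1252 on read.
try:
    with open(path, "r", encoding="cp1252") as f:
        json.load(f)
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

# Reading with an explicit utf-8 encoding recovers the title intact.
with open(path, "r", encoding="utf-8") as f:
    title = json.load(f)["title"]

print(cp1252_error)
print(title)
```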
