Last Updated on May 9, 2023 by mishou
I. What will you learn?
In this post you can learn how to get English and Japanese text from a YouTube video. When closed captions are enabled, you can use the transcripts to get the English text and its Japanese translation. If closed captions are disabled, you have to convert the speech to text and translate it with Python.
II. When Closed Caption is enabled
We will create a data frame of English transcripts and their Japanese translations like this:

You can see the scripts here:
https://colab.research.google.com/drive/1WMeRnXlvi6Sv2DHWq6VLYOF_lTZvkt98?usp=sharing
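As a rough sketch of the kind of table described above: the snippet below pairs English caption segments with their Japanese translations. It assumes each segment is a dict with `text` and `start` keys and that the two lists line up one to one. Both the segment shape and the helper name `build_rows` are assumptions made for illustration; they are not taken from the linked notebook.

```python
def build_rows(english_segments, japanese_texts):
    """Pair each English caption segment with its Japanese translation."""
    if len(english_segments) != len(japanese_texts):
        raise ValueError("expected exactly one translation per segment")
    rows = []
    for segment, japanese in zip(english_segments, japanese_texts):
        rows.append({
            "start": segment["start"],
            # Captions often contain line breaks; flatten them to spaces.
            "english": segment["text"].replace("\n", " ").strip(),
            "japanese": japanese.strip(),
        })
    return rows


# Hypothetical sample data, for illustration only.
sample_segments = [
    {"text": "Hello everyone.", "start": 0.0},
    {"text": "Welcome to\nthe channel.", "start": 2.5},
]
sample_translations = ["皆さん、こんにちは。", "チャンネルへようこそ。"]
rows = build_rows(sample_segments, sample_translations)
```

If pandas is installed, `pandas.DataFrame(rows)` should turn these rows into a data frame with `start`, `english`, and `japanese` columns.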
III. When Closed Caption is disabled
If closed captions are disabled, you cannot get the text in the way described above. You have to download the video and extract the text from its audio. We will use SpeechRecognition and Google Speech-to-Text to convert the speech to text.
Synchronous recognition requests to Google Speech-to-Text are limited to audio of 1 minute or less. See Speech-to-Text basics.
So we will split the audio into chunks, iterate over all the chunks, and convert each of them into text using a function shown in the following post:
Python | Speech recognition on large audio files
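To illustrate the chunking idea, here is a minimal sketch that cuts a WAV file into fixed-length pieces with the standard `wave` module. Note that this is a simpler stand-in, not the function from the post above: it cuts at fixed intervals, so a word may be split across two chunks. The name `split_wav`, the 50-second default, and the `chunk` file prefix are all assumptions.

```python
import wave


def split_wav(path, chunk_seconds=50, prefix="chunk"):
    """Split a WAV file into pieces of at most chunk_seconds each.

    Returns the list of chunk file names in playback order.
    """
    chunk_paths = []
    with wave.open(path, "rb") as source:
        params = source.getparams()
        frames_per_chunk = source.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = source.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the audio
            chunk_path = f"{prefix}{index:03d}.wav"
            with wave.open(chunk_path, "wb") as target:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                target.setparams(params)
                target.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Keeping each chunk at 50 seconds leaves some margin under the 1-minute limit for synchronous requests.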
1. Procedure
1. Download the YouTube video as MP4 using youtube_dl
2. Convert it to MP3 using ffmpeg
3. Convert it to WAV using ffmpeg
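The three steps above could be sketched as command lines driven from Python. This assumes the `youtube-dl` and `ffmpeg` executables are on the PATH. The file stem `video`, the placeholder URL, and the choice of mono 16 kHz audio for the WAV file are assumptions for illustration, not settings taken from the notebook.

```python
import subprocess


def build_commands(url, stem="video"):
    """Return the three commands: download MP4, convert to MP3, convert to WAV."""
    mp4, mp3, wav = f"{stem}.mp4", f"{stem}.mp3", f"{stem}.wav"
    return [
        # 1. Download the video in MP4 format.
        ["youtube-dl", "-f", "mp4", "-o", mp4, url],
        # 2. Drop the video stream (-vn) and keep the audio as MP3.
        ["ffmpeg", "-y", "-i", mp4, "-vn", mp3],
        # 3. Convert to mono (-ac 1), 16 kHz (-ar 16000) WAV.
        ["ffmpeg", "-y", "-i", mp3, "-ac", "1", "-ar", "16000", wav],
    ]


def run_pipeline(url, stem="video"):
    """Run the commands in order and return the WAV file name."""
    for command in build_commands(url, stem):
        subprocess.run(command, check=True)  # stop at the first failure
    return f"{stem}.wav"
```

For example, `run_pipeline("https://www.youtube.com/watch?v=VIDEO_ID")` (with a real video ID in place of the placeholder) would leave `video.wav` in the current directory.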

2. Speech to text
Please make sure to install googletrans==3.1.0a. Converting the whole audio may take more than 10 minutes. You can see all the corrected scripts here:
https://colab.research.google.com/drive/18BzgfrxePV477jxLS5oEfBkbZR0djhMr?usp=sharing
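For readers who want the gist without opening the notebook, here is a minimal sketch of the transcribe-then-translate loop. It assumes `chunk_paths` is a list of short WAV files, that the SpeechRecognition package is installed, and that the pinned googletrans release is used (its `translate` call is synchronous in that version). The function names and the row layout are hypothetical and may differ from the notebook.

```python
def google_recognize(wav_path, language="en-US"):
    """Transcribe one short WAV file with SpeechRecognition's Google backend."""
    import speech_recognition as sr  # third-party package: SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole chunk
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # nothing intelligible in this chunk


def google_translate(text, dest="ja"):
    """Translate English text with googletrans (assumes the pinned version)."""
    from googletrans import Translator  # third-party package: googletrans

    return Translator().translate(text, src="en", dest=dest).text


def transcribe_and_translate(chunk_paths,
                             recognize=google_recognize,
                             translate=google_translate):
    """Return one row per chunk that produced text."""
    rows = []
    for path in chunk_paths:
        english = recognize(path)
        if not english:
            continue  # skip silent or unintelligible chunks
        rows.append({
            "chunk": path,
            "english": english,
            "japanese": translate(english),
        })
    return rows
```

The recognizer and translator are passed in as parameters so the loop can be tried with stand-in functions before making any network requests.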

If you open mytranscript.csv in Excel and the Japanese text is garbled, upload the file to Google Drive and open it with Google Sheets.
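Another option that may help, assuming the garbling comes from the file being saved as UTF-8 without a byte order mark (the post does not say how the file is written): save the CSV with Python's `utf-8-sig` encoding, which Excel generally recognizes. The helper name and the `english` and `japanese` column names below are hypothetical.

```python
import csv


def write_excel_friendly_csv(rows, path="mytranscript.csv"):
    """Write rows as UTF-8 with a byte order mark so Excel can detect the encoding."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=["english", "japanese"])
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Opening the file in Google Sheets, as described above, remains a reasonable fallback if Excel still shows garbled text.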
Not completed yet.