Last Updated on May 9, 2023 by mishou

I. What will you learn?

In this post, you can learn how to get English and Japanese text from a YouTube video. When closed captions are enabled, you can use the transcripts to get the English text and its Japanese translation. If closed captions are disabled, you have to convert the speech to text and translate it with Python.

II. When Closed Caption is enabled

We will create a data frame of English transcripts and their Japanese translations.

You can see the scripts here:

https://colab.research.google.com/drive/1WMeRnXlvi6Sv2DHWq6VLYOF_lTZvkt98?usp=sharing
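
To illustrate the shape of that data frame, here is a minimal sketch that pairs English transcript entries with Japanese ones by start time. It assumes each entry is a dict with `text` and `start` keys; the function name `pair_transcripts` and the sample data are hypothetical, and the notebook above may do this differently.

```python
def pair_transcripts(english, japanese):
    """Pair English and Japanese entries that share the same start time."""
    ja_by_start = {entry["start"]: entry["text"] for entry in japanese}
    rows = []
    for entry in english:
        rows.append({
            "start": entry["start"],
            "english": entry["text"],
            # Empty string when no Japanese entry has this start time.
            "japanese": ja_by_start.get(entry["start"], ""),
        })
    return rows


# Hypothetical sample data, not taken from a real video.
english = [{"text": "Hello.", "start": 0.0}, {"text": "Thank you.", "start": 2.5}]
japanese = [{"text": "こんにちは。", "start": 0.0}, {"text": "ありがとう。", "start": 2.5}]

rows = pair_transcripts(english, japanese)
# pandas.DataFrame(rows) would turn these rows into a data frame.
```

Keeping the rows as plain dicts means the pairing logic can be checked without any third-party package installed.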

III. When Closed Caption is disabled

If closed captions are disabled, you cannot get the text in the above way. You have to download the video and extract the text from its audio. We will use SpeechRecognition and Google's speech-to-text service to convert the speech to text.

Synchronous recognition requests to Google Speech-to-Text are limited to audio data of 1 minute or less in duration. See Speech-to-Text basics.

So we will split the audio data into chunks, iterate over all the chunks, and convert each of them into text using a function shown in the following post:

Python | Speech recognition on large audio files
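
The chunking idea can be sketched as follows. This is not the function from the linked post: it is a simpler fixed-length split using Python's standard `wave` module, with chunks kept under the 1 minute limit. The names `split_wav`, `transcribe_chunks`, and `recognize_with_google`, and the 55 second default, are assumptions made for illustration. The last function needs the SpeechRecognition package and network access.

```python
import wave


def split_wav(path, chunk_seconds=55, prefix="chunk"):
    """Split a WAV file into consecutive chunks of at most chunk_seconds."""
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = f"{prefix}{index}.wav"
            with wave.open(chunk_path, "wb") as dst:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


def transcribe_chunks(chunk_paths, recognize):
    """Run recognize(path) on each chunk and join the non-empty results."""
    texts = [recognize(path) for path in chunk_paths]
    return " ".join(text for text in texts if text)


def recognize_with_google(path, language="en-US"):
    """Recognize one short WAV file with SpeechRecognition."""
    import speech_recognition as sr  # third-party: SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service could not understand this chunk.
        return ""
```

A call such as `transcribe_chunks(split_wav("audio.wav"), recognize_with_google)` would then return the joined text. A fixed-length split can cut a word in half at a chunk boundary, which is one reason to prefer splitting on silence.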

1. Procedure

1. Download the YouTube video as MP4 using youtube_dl

2. Convert it to MP3 using ffmpeg

3. Convert it to WAV using ffmpeg
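
The three steps above could be run from Python roughly like this. It is only a sketch: it assumes the `youtube-dl` and `ffmpeg` command-line tools are installed and on the PATH, and the function names and the `video` file stem are hypothetical.

```python
import subprocess


def build_commands(url, stem="video"):
    """Return the three commands: download MP4, convert to MP3, convert to WAV."""
    mp4, mp3, wav = f"{stem}.mp4", f"{stem}.mp3", f"{stem}.wav"
    return [
        # -f mp4 asks for an MP4 format, -o sets the output file name.
        ["youtube-dl", "-f", "mp4", "-o", mp4, url],
        # -y overwrites an existing output file without asking.
        ["ffmpeg", "-y", "-i", mp4, mp3],
        ["ffmpeg", "-y", "-i", mp3, wav],
    ]


def run_pipeline(url, stem="video"):
    """Run each command in order and stop at the first failure."""
    for command in build_commands(url, stem):
        subprocess.run(command, check=True)
```

Passing each command as a list avoids shell quoting problems when the URL contains characters such as `&` or `?`.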

You can see the scripts here:

2. Speech to text

Please make sure to install googletrans==3.1.0a. The process may take more than 10 minutes. You can see all the corrected scripts here:

https://colab.research.google.com/drive/18BzgfrxePV477jxLS5oEfBkbZR0djhMr?usp=sharing
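
For reference, the translation step might look like the sketch below. It assumes the synchronous `Translator.translate` call of googletrans 3.1.0a; the helper names `translate_texts` and `make_googletrans_translator` are hypothetical, and the notebook above may be organized differently.

```python
def translate_texts(texts, translate):
    """Translate each text with translate(text); blank texts stay empty."""
    return [translate(text) if text.strip() else "" for text in texts]


def make_googletrans_translator(src="en", dest="ja"):
    """Build a translate(text) function backed by googletrans."""
    from googletrans import Translator  # pip install googletrans==3.1.0a

    translator = Translator()
    return lambda text: translator.translate(text, src=src, dest=dest).text
```

With googletrans installed, `translate_texts(english_texts, make_googletrans_translator())` would return the Japanese texts in the same order.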

texts retrieved from speech on Google Colaboratory

If the Japanese text is garbled when you open mytranscript.csv with Excel, please upload the file to Google Drive and open it with Google Sheets.
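
One common cause of this kind of garbling is that Excel does not recognize a CSV file as UTF-8 unless it starts with a byte order mark. If that is what is happening here (an assumption), writing the file with the `utf-8-sig` encoding may let Excel open it directly. The function name and column names below are hypothetical.

```python
import csv


def write_csv_for_excel(path, rows, fieldnames):
    """Write rows to a CSV file as UTF-8 with a byte order mark."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_csv_for_excel("mytranscript.csv", rows, ["english", "japanese"])` would write a list of dicts with those two keys.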

Not completed yet.

