Last Updated on March 27, 2023 by mishou

I. Translating texts sentence by sentence

I translated the texts in two ways: with the Google Translate API and with YouTubeTranscriptAPI. I added the latter because Japanese text translated line by line (rather than sentence by sentence) is often difficult to understand.
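As a rough sketch of the sentence-by-sentence idea (not the code in the notebook), caption lines can be joined and then split again at sentence-ending punctuation before they are translated. The function name `merge_into_sentences` and the sample data are my own assumptions. The sample imitates caption data as a list of dicts with a `text` key. Auto-generated captions often have no punctuation, in which case this approach will not find any sentence boundaries.

```python
import re


def merge_into_sentences(segments):
    """Join caption fragments, then split the text at ., ! or ?"""
    # Collapse line breaks inside each fragment and join fragments with spaces.
    text = " ".join(seg["text"].replace("\n", " ").strip() for seg in segments)
    # Split after sentence-ending punctuation that is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part for part in parts if part]


# Hypothetical sample: one dict per caption line (assumed shape).
sample = [
    {"text": "I went to the store", "start": 0.0, "duration": 2.0},
    {"text": "to buy some milk. Then I", "start": 2.0, "duration": 2.0},
    {"text": "walked home.", "start": 4.0, "duration": 1.5},
]

sentences = merge_into_sentences(sample)
print(sentences)
# ['I went to the store to buy some milk.', 'Then I walked home.']
```

Each item in `sentences` can then be passed to the translator as one unit, so the translation is not cut off in the middle of a sentence.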

You can see the code here:

https://colab.research.google.com/drive/1WMeRnXlvi6Sv2DHWq6VLYOF_lTZvkt98?usp=sharing

If the results are not shown in Google Colaboratory, restart the runtime or select “Factory reset runtime” to start again.

The CSV file will be saved in the working directory on Google Colaboratory.
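For reference, here is a minimal sketch of saving pairs of original and translated sentences to a CSV file in the working directory. The file name, the column names, and the helper `save_translations` are assumptions for illustration. The notebook may write its file differently.

```python
import csv


def save_translations(rows, path="translations.csv"):
    """Write (original, translation) pairs to a CSV file and return its path."""
    # utf-8-sig adds a BOM so that spreadsheet software detects UTF-8 Japanese text.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["original", "translation"])  # assumed column names
        writer.writerows(rows)
    return path


# Hypothetical sample rows.
rows = [
    ("Good morning.", "おはようございます。"),
    ("Thank you.", "ありがとうございます。"),
]
saved_path = save_translations(rows)
print(saved_path)
```

On Google Colaboratory the saved file appears in the Files panel on the left, from which it can be downloaded.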

Notes

Googletrans is unstable. Installing googletrans==3.1.0a0 is recommended, but even with that version you may occasionally encounter errors.
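Because errors can occur at any time, it may help to retry a failed call a few times before giving up. The following is only a sketch: `translate_with_retry` is a hypothetical helper that wraps any function taking a string and returning a string, and a stand-in function is used here instead of a real network call.

```python
import time

# In Colab, the version mentioned above can be installed with:
#   !pip install googletrans==3.1.0a0


def translate_with_retry(translate, text, retries=3, delay=1.0):
    """Call translate(text), retrying with a growing pause after each failure."""
    last_error = None
    for attempt in range(retries):
        try:
            return translate(text)
        except Exception as error:  # googletrans can fail in several ways
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay * (attempt + 1))
    raise RuntimeError(f"translation failed after {retries} attempts") from last_error


# Stand-in translator that fails twice and then succeeds (for demonstration only).
attempts = {"count": 0}


def fake_translate(text):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return text.upper()


result = translate_with_retry(fake_translate, "hello", retries=3, delay=0)
print(result)  # HELLO
```

With googletrans installed, the wrapped function could be something like `lambda t: Translator().translate(t, src="en", dest="ja").text`. Retrying will not help if the IP address has been blocked.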

The note on library usage says:

DISCLAIMER: this is an unofficial library using the web API of translate.google.com and also is not associated with Google.
The maximum character limit on a single text is 15k.
Due to limitations of the web version of google translate, this API does not guarantee that the library would work properly at all times (so please use this library if you don’t care about stability).
Important: If you want to use a stable API, I highly recommend you to use Google’s official translate API.
If you get HTTP 5xx error or errors like #6, it’s probably because Google has banned your client IP address.

https://github.com/ssut/py-googletrans
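Given the 15k character limit quoted above, a long transcript has to be sent in several pieces. Below is a minimal sketch that groups sentences into chunks that stay within a limit. `chunk_sentences` and its `max_chars` parameter are names I made up for illustration, and the default of 15000 is taken from the note above.

```python
def chunk_sentences(sentences, max_chars=15000):
    """Group sentences into space-joined chunks of at most max_chars characters."""
    chunks = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) > max_chars:
            # The sentence does not fit, so close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Small limit used here only to make the behaviour visible.
demo_chunks = chunk_sentences(["aaaa", "bbbb", "cc"], max_chars=9)
print(demo_chunks)  # ['aaaa bbbb', 'cc']
```

Splitting at sentence boundaries rather than at a fixed character count keeps each sentence whole, which matters for the quality of the translation.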

II. Translating texts word by word

You can see my sample scripts here, though they are not yet complete:

https://colab.research.google.com/drive/13Cnd0NY5V0pyvEDsczWhBbTyywk7bM5S?usp=sharing

Let’s create a word list from the transcripts. The following image shows how I have created a word list.
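One simple way to build such a list, assuming the transcripts are in English, is to count how often each word appears. This is only a sketch and may differ from the method used in my scripts. `build_word_list` is a hypothetical name and the sample lines are made up.

```python
import re
from collections import Counter


def build_word_list(lines):
    """Return (word, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for line in lines:
        # Lowercase the line and keep alphabetic words, allowing one apostrophe (don't).
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower()))
    return counts.most_common()


# Hypothetical transcript lines.
lines = ["The cat sat.", "The dog sat on the mat."]
word_list = build_word_list(lines)
print(word_list[:2])  # [('the', 3), ('sat', 2)]
```

Each word in the list can then be translated on its own, for example to prepare a vocabulary sheet.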

By mishou
