Last Updated on March 27, 2023 by mishou

I. What do I want to do?

I’m reading The Brothers Karamazov by Fyodor Dostoevsky again, and I have decided to delve into In Search of Lost Time by Proust. I may also read Ulysses by James Joyce. Those alone may suffice, but I have been searching for books to read for the rest of my life, so I am comparing the 100-book lists published on a few websites. To do this, I have scraped the book lists from each of the sites.

II. Time Magazine’s All-Time 100 Novels


I want to retrieve the authors and titles from the following page:

Time Magazine's all-time 100 novels

I typed:

Show the Python code for scraping titles and authors on a web page linked to

ChatGPT showed the following code:

import requests
from bs4 import BeautifulSoup

url = ""

response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

books = soup.find_all("tr", {"itemtype": ""})

for book in books:
    title = book.find("a", {"class": "bookTitle"}).text.strip()
    author = book.find("a", {"class": "authorName"}).text.strip()
    print("Title: ", title)
    print("Author: ", author)

I have just run the code on Google Colaboratory:

code on google colaboratory

ChatGPT also explained the code:

In this code, we first import the requests and BeautifulSoup libraries. We then specify the URL of the web page we want to scrape and use the requests.get() method to retrieve the HTML content of the page.

Next, we create a BeautifulSoup object from the HTML content using the “html.parser” parser.

We then use the soup.find_all() method to find all the tr elements whose itemtype attribute is set to the value given in the code, which correspond to the books listed on the web page.

For each book, we use the book.find() method to find the title and author elements within the book tr element. We use the text.strip() method to remove any extra whitespace from the extracted text.

Finally, we print out the title and author information for each book using the print() function.
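One caveat: book.find() returns None when an element is missing, so book.find(...).text would raise an AttributeError on any row that lacks a title or author link. Below is a minimal defensive sketch, using a small made-up HTML snippet in the same shape the code expects (the itemtype value and the HTML here are assumptions for illustration, not the real page):

```python
from bs4 import BeautifulSoup

# made-up HTML mimicking the structure the scraping code expects
html = """
<table>
  <tr itemtype="http://schema.org/Book">
    <td><a class="bookTitle">The Brothers Karamazov</a></td>
    <td><a class="authorName">Fyodor Dostoevsky</a></td>
  </tr>
  <tr itemtype="http://schema.org/Book">
    <td><a class="bookTitle">Ulysses</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
books = soup.find_all("tr", {"itemtype": "http://schema.org/Book"})

rows = []
for book in books:
    title_el = book.find("a", {"class": "bookTitle"})
    author_el = book.find("a", {"class": "authorName"})
    # skip rows where either element is missing instead of crashing on None
    if title_el and author_el:
        rows.append((title_el.text.strip(), author_el.text.strip()))

print(rows)
```

With this guard, the second (incomplete) row is skipped quietly rather than stopping the whole scrape.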

III. 100 Books to Read in a Lifetime on Amazon


I’m on Garuda Linux, and I used Brave Browser for scraping. I opened it by running the following Python code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from chromedriver_py import binary_path  # for the path variable
import pandas as pd
from bs4 import BeautifulSoup

# setting up the webdriver
service_object = Service(binary_path)
options = Options()
options.binary_location = "/usr/bin/brave"
# Selenium 4 style: pass the Service and Options objects
driver = webdriver.Chrome(service=service_object, options=options)
# access the page with the browser

Display all 100 books in Brave Browser, then run the code shown below the image:

100 book titles shown on the screen
# after showing all the 100 books,
# retrieve the page source
page_sources = driver.page_source
# parse the HTML
soup = BeautifulSoup(page_sources, 'html.parser')
# find the span elements with the class `a-truncate-full`
titles = soup.find_all('span', class_='a-truncate-full')
# clean the texts
titles_ls = [item.text.strip().replace("\n", "") for item in titles]
# count the items
print(len(titles_ls))
# show the list
print(titles_ls)

You can create a data frame of titles and authors by running the following code:

# create a data frame: titles and authors alternate in the list
title = titles_ls[0::2]
author = titles_ls[1::2]
df = pd.DataFrame({'title': title,
                   'author': author})
# export a CSV file
df.to_csv('amazon100.csv', index=False)
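The slicing trick works because the scraped list alternates title, author, title, author: the even indices ([0::2]) are the titles and the odd indices ([1::2]) are the authors. Here is a quick sketch with hypothetical data (the four strings are placeholders, not actual scraped output):

```python
import pandas as pd

# hypothetical alternating list: title, author, title, author
titles_ls = ["Ulysses", "James Joyce", "Swann's Way", "Marcel Proust"]

title = titles_ls[0::2]   # even indices -> titles
author = titles_ls[1::2]  # odd indices -> authors
df = pd.DataFrame({'title': title, 'author': author})
print(df)
```

This pairing only holds if every title on the page is immediately followed by its author, so it is worth checking that len(titles_ls) is even before splitting.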
a data frame on Jupyter Lab

To be continued.

