Last Updated on July 6, 2022 by shibatau
I. Scrape a table from a PDF file
Here is one line from my sample scripts on Google Colaboratory:
import tabula
tabula.read_pdf("/content/WEF_GGGR_2021.pdf", pages=10, stream=True, lattice=False)
- If your tables have lines separating cells, you can use the lattice option. By default, tabula-py sets guess=True. If your tables don't have separating lines, you can try the stream option.
- read_pdf() reads only page 1 by default. Pass the pages argument (for example, pages=10 or pages="all") to read other pages.
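The two notes above can be put together in a small sketch. The helper names read_pdf_options and extract_tables are hypothetical, introduced here only for illustration; they are not part of tabula-py. The sketch assumes tabula-py and Java are installed (as they are in a typical Colaboratory session after installing tabula-py) and imports tabula only when a PDF is actually read.

```python
def read_pdf_options(has_cell_borders, pages=1):
    """Build keyword arguments for tabula.read_pdf (hypothetical helper).

    has_cell_borders: True if the table has lines separating its cells.
    pages: a page number, a list of page numbers, or "all".
    """
    return {
        "pages": pages,
        # Ruled tables: let tabula use the cell borders.
        "lattice": has_cell_borders,
        # Unruled tables: let tabula use the whitespace between columns.
        "stream": not has_cell_borders,
    }


def extract_tables(pdf_path, has_cell_borders=False, pages=1):
    """Read tables from a PDF; returns a list of pandas DataFrames."""
    import tabula  # lazy import: needs tabula-py and a Java runtime

    return tabula.read_pdf(pdf_path, **read_pdf_options(has_cell_borders, pages))


# Same options as the call shown earlier: page 10, no separating lines.
print(read_pdf_options(has_cell_borders=False, pages=10))
```

With the PDF uploaded to Colaboratory, calling extract_tables("/content/WEF_GGGR_2021.pdf", has_cell_borders=False, pages=10) should be equivalent to the read_pdf call shown earlier.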
You can learn more here:
You can use Google Colaboratory to run the scripts. Please download the PDF file linked in II. Data and upload it to Google Colaboratory. You can see the scripts here: