Automatically Determine The Number Of Columns In An HTML Table With Python's Beautiful Soup

This is what I use to find the number of columns in an HTML table.

Details

I'm working on an ASCII art tool. It includes a full set of Unicode characters to choose from. I pulled the characters from the W3C site. There are 28 pages with tables that sort everything from the top down, then column by column. Something like this:

code full

They're set up that way based on their Unicode ID numbers. That makes sense on the W3C pages, but I want them sorted continuously from left to right.

code full
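
Here's a minimal sketch of that re-ordering in plain Python. It is not the pandas conversion described in this post, just an illustration of the idea. The function name `column_major_to_row_major` and the sample grid are made up for the example, and it assumes every row has the same number of cells.

```python
def column_major_to_row_major(rows):
    """Re-flow a grid that was filled top-down, column by column,
    so the same values run left to right, row by row.

    Assumes a rectangular grid (every row has the same length).
    """
    if not rows:
        return []
    n_rows = len(rows)
    n_cols = len(rows[0])
    # Read down each column in turn to recover the original sequence.
    sequence = [rows[r][c] for c in range(n_cols) for r in range(n_rows)]
    # Chop the sequence back into rows of the same width.
    return [sequence[i:i + n_cols] for i in range(0, len(sequence), n_cols)]


# Hypothetical sample: values run down the first column, then the second.
grid = [
    [1, 4, 7],
    [2, 5, 8],
    [3, 6, 9],
]
reordered = column_major_to_row_major(grid)
for row in reordered:
    print(row)
```

For a square grid this is the same as a transpose. For a non-square grid it is not, which is why the sketch flattens the values first and then re-chunks them.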

I'm parsing the source HTML with Beautiful Soup, then doing the formatting conversion in pandas. I want to know the number of columns in the tables so I can set up the pandas data frame explicitly. So, I set up a code snippet to figure that out. It loops through every row of the table and counts the number of <th> (header) and <td> (data) cells on each row, then runs them through max functions to come up with the longest row.
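
A minimal sketch of that counting approach follows. It is one possible reading of the description, not the original snippet: it adds the header and data cells on each row together and takes the maximum across rows. The function name `count_columns` and the sample HTML are invented for the example, and `colspan` attributes are ignored.

```python
from bs4 import BeautifulSoup


def count_columns(table):
    """Return the number of cells in the longest row of a parsed <table>.

    Assumption: <th> and <td> cells on the same row are counted together,
    and colspan is ignored.
    """
    cells_per_row = [
        len(row.find_all(["th", "td"]))
        for row in table.find_all("tr")
    ]
    # default=0 covers a table with no rows at all.
    return max(cells_per_row, default=0)


# Hypothetical sample table with rows of different lengths.
html = """
<table>
  <tr><th>Code</th><th>Glyph</th><th>Name</th></tr>
  <tr><td>65</td><td>A</td></tr>
  <tr><th>66</th><td>B</td><td>LATIN CAPITAL LETTER B</td><td>extra</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
column_count = count_columns(soup.find("table"))
print(column_count)
```

The sketch uses the built-in `html.parser` backend so it runs without any extra parser installed. Tables nested inside cells would have their rows counted too, so pages with nested tables would need more care.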

- This wasn't really necessary in this case. All the tables had 18 columns, but I wanted code that confirmed that.

- I think you can just throw stuff at pandas and have it figure this out, but I'm not familiar enough with it yet to know for sure, especially because the tables had inconsistent numbers of header rows and columns.

- There's some more automation that could be done to import data into pandas. For example, removing any table row that consists of only table headers.
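
The last idea, dropping rows that contain only header cells, could look something like the sketch below. It is an assumption about how that clean-up might work, not code from this project. The function name `drop_header_only_rows` and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def drop_header_only_rows(table):
    """Remove every <tr> that has <th> cells but no <td> cells.

    Modifies the parsed table in place and returns the number of rows removed.
    """
    removed = 0
    # find_all returns a list, so removing rows while looping is safe.
    for row in table.find_all("tr"):
        has_header = row.find("th") is not None
        has_data = row.find("td") is not None
        if has_header and not has_data:
            row.decompose()
            removed += 1
    return removed


# Hypothetical sample: a header row repeated partway through the table.
html = """
<table>
  <tr><th>Code</th><th>Glyph</th></tr>
  <tr><td>65</td><td>A</td></tr>
  <tr><th>Code</th><th>Glyph</th></tr>
  <tr><td>66</td><td>B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")
removed_count = drop_header_only_rows(table)

# Collect the remaining cell text, row by row.
rows = [
    [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    for row in table.find_all("tr")
]
print(removed_count, rows)
```

The resulting list of rows could then be handed to a pandas data frame with explicit column names. Rows that mix a header cell with data cells are kept on purpose, since they still carry data.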