June 2015 Meeting

The Good, The Bad, and The Mechanized


Jun 10 2015




Until we can have JARVIS, we have web pages. Sometimes we want to get information out of them, and their owners don’t want us to. NOT NICE! We need something smarter than ‘curl’ but easier than raw Selenium. ‘Smart’ is no problem, but how about ‘smart AND EASY’? Enter ‘Mechanize’. With just a few lines of code, you can log into secure pages, step through controls and form fields, enter data, and submit for the win! The result: short programs that pull info from pages you enjoy as a human but that don’t cooperate with curl.
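
To give a feel for the "few lines of code" claim, here is a rough sketch of a Mechanize login-and-fetch, not the code from the talk. The function name, the URLs you pass in, and the form field names `username` and `password` are all made up for illustration; a real site will use its own field names, and the login form may not be the first form on the page.

```python
def fetch_after_login(login_url, username, password, target_url):
    """Log in through an HTML form, then return the body of a protected page.

    Hypothetical helper: the field names and the "first form on the page"
    choice below are assumptions, not facts about any particular site.
    """
    # Third-party package (pip install mechanize). Imported here so this
    # module still loads on machines where mechanize is not installed.
    import mechanize

    br = mechanize.Browser()
    br.open(login_url)

    # Assumption: the login form is the first form on the page.
    br.select_form(nr=0)

    # Assumption: the form controls are named "username" and "password".
    br["username"] = username
    br["password"] = password
    br.submit()

    # The browser object keeps the session cookies, so a page that needs
    # the login can now be opened like any other.
    response = br.open(target_url)
    return response.read()
```

A call would look something like `fetch_after_login("https://example.com/login", "me", "secret", "https://example.com/report")`, with every argument swapped for the real site's values.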

BONUS Follow-Up: After a helpful person said “BeautifulSoup is neat, but, wow, I could have done that with lxml…”, I decided to up the game. Let’s take a couple of minutes to revisit BeautifulSoup, this time parsing pages where the </a> and </title> tags, and several others, are missing!
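
As a small taste of that, the sketch below feeds BeautifulSoup a made-up snippet with no closing </a>, </li>, </body>, or </html> tags and still pulls out the links. The HTML and the paths in it are invented for the example, and it uses Python's built-in "html.parser" backend; other parsers may rebuild the tree differently, and a missing </title> in particular is handled differently from parser to parser, so it is left out here.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Invented sample page: none of the <a> or <li> tags are closed,
# and </body> and </html> are missing entirely.
BROKEN_HTML = """
<html><body>
<ul>
<li><a href="/talks/mechanize">Mechanize
<li><a href="/talks/soup">BeautifulSoup
</ul>
"""

# "html.parser" ships with Python, so no extra parser library is needed.
soup = BeautifulSoup(BROKEN_HTML, "html.parser")

# Collect the href of every <a> tag found, in document order.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(hrefs)
```

Both links are found even though neither anchor was ever closed. How the tags end up nested inside each other depends on the parser, so the example only relies on the `href` attributes.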

