June 2015 Meeting

The Good, The Bad, and The Mechanized

Wednesday

Jun 10 2015

11:30am

@Prototek

About

Until we can have JARVIS, we have web pages. Sometimes we want to get information out of them, and their owners don’t want us to. NOT NICE! We need something smarter than ‘curl’, but easier than raw Selenium. ‘Smart’ is no problem, but how about ‘smart, AND EASY’? Enter ‘Mechanize’. With just a few lines of code, you can log into secure pages, step through controls and form fields, enter data, and submit for the win! Write short programs that pull info from pages that work fine for you as a human but don’t cooperate with curl.

BONUS Follow-Up: After a helpful person said “BeautifulSoup is neat, but, wow, I could have done that with lxml…”, I decided to up the game. Let’s revisit BeautifulSoup for a couple of minutes and use it to parse pages with all the </a> and </title> tags, and several others, missing!
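A minimal sketch of the kind of recovery the follow-up is about, assuming the `bs4` package is installed and using Python’s built-in "html.parser" backend (the talk’s own examples may use a different parser). The page below is made up for illustration, and `link_targets` is a hypothetical helper: the page is missing its </a> and </li> tags as well as </body> and </html>, yet `find_all` still returns every link in document order. It keeps </title> in place, because how an unclosed <title> gets recovered depends on which parser backend BeautifulSoup is using.

```python
from bs4 import BeautifulSoup

# Hypothetical "broken" page: no </a>, no </li>, no </body> or </html>.
BROKEN_PAGE = """
<html><head><title>Forecast links</title></head>
<body>
<ul>
<li><a href="/today">Today
<li><a href="/tonight">Tonight
<li><a href="/weekend">Weekend
</ul>
"""


def link_targets(markup):
    """Return every href in document order, even when tags were never closed."""
    # "html.parser" ships with Python, so no extra parser install is needed.
    soup = BeautifulSoup(markup, "html.parser")
    # href=True skips any <a> tag that has no href attribute at all.
    return [a["href"] for a in soup.find_all("a", href=True)]


print(link_targets(BROKEN_PAGE))  # ['/today', '/tonight', '/weekend']
```

Asking only for the href values keeps the result stable even though the parser may nest the unclosed <a> and <li> tags inside one another.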

We’ll be glad to see you there, snacks provided by Techlahoma!


May 2015 Meeting

Is It BeautifulSoup Yet?

Wednesday

May 13 2015

11:30am

@Prototek

About

Need to get some info out of a crummy web site? Parse some wonky XML? Any -ML, whether it’s well-formed or not?

Let me tell you about “BeautifulSoup”, the 4x4 of markup processing libraries. It’s not only capable, but also easy.

Pull a chunk out of a big web page with a dozen lines of Python? How about messed-up HTML with missing closing tags or tags closed out of order? No problem. What about a goofy markup dialect (DWML) that makes SAXParser crash? It’s no problem.
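As a rough sketch of the DWML case, assuming `bs4` is installed: the fragment below is a simplified, hypothetical snippet in the style of NOAA’s Digital Weather Markup Language, not a real forecast, and `temperatures` is an illustrative helper name. It uses the built-in "html.parser" backend, which lowercases tag names; that is harmless for this all-lowercase fragment, but for strict XML, BeautifulSoup’s "xml" feature (which requires lxml) is the usual choice.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical DWML-style fragment; real DWML documents are much larger.
DWML_FRAGMENT = """
<dwml version="1.0">
  <data>
    <parameters applicable-location="point1">
      <temperature type="maximum" units="Fahrenheit">
        <name>Daily Maximum Temperature</name>
        <value>91</value>
        <value>88</value>
      </temperature>
      <temperature type="minimum" units="Fahrenheit">
        <name>Daily Minimum Temperature</name>
        <value>70</value>
        <value>68</value>
      </temperature>
    </parameters>
  </data>
</dwml>
"""


def temperatures(markup, kind):
    """Return the integer <value>s inside the <temperature type=kind> block."""
    soup = BeautifulSoup(markup, "html.parser")
    # Pull out just the chunk we care about, matched by its type attribute.
    block = soup.find("temperature", attrs={"type": kind})
    if block is None:
        return []
    return [int(v.get_text(strip=True)) for v in block.find_all("value")]


print(temperatures(DWML_FRAGMENT, "maximum"))  # [91, 88]
```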

BeautifulSoup is sturdy, easy (enough), and fast, and it’s painless to include in your program. Come see some simple IPython Notebook examples that you can check out from GitHub and run, and hey, thanks to Techlahoma, have some tasty pizza to keep you busy all the while.