Easiest way to tackle this web crawling task?

Question:

I have been assigned to create a web crawler to automate some reporting tasks I do. The crawler would have to log in with my credentials, search for specific things in different fields (some of which depend on the current date), download CSVs containing the data if any is available, parse the CSVs to get a quick row count, and then create and send an email with the CSVs attached.

I know C++ and Python very well and am in the process of learning C, but I was told that Ruby or Ruby on Rails would be a great way to do this. Is Ruby on Rails solely for creating web apps? If so, does my task fit the description of a web app, or can I just make a standalone program that runs and does it all?

I would like to know which language would be the easiest to code this in, meaning one with easy-to-use modules and good library support for these tasks. What would I need to take into account before undertaking this task? I have until the end of December to finish it, and I only work here for around 12 hours per week (I’m a student, and this is for my internship). Is this feasible?

Thanks.

Asked By: Michael M

||

Answers:

You already know Python, so go with that. CSV parsing and sending mail are fairly trivial tasks, and I assume you can figure those out with Google.
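To give an idea of how little code the CSV and mail parts need, here is a minimal sketch using only the Python standard library. The function names, the subject line, and the addresses are made up for illustration, and it assumes each CSV has a single header row.

```python
import csv
import io
from email.message import EmailMessage


def count_data_rows(csv_text):
    """Return the number of non-empty rows, not counting the header row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return max(len(rows) - 1, 0)


def build_report_email(sender, recipient, attachments):
    """Build a message with CSV attachments.

    `attachments` maps a filename to the CSV text (hypothetical layout).
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Automated report"  # placeholder subject
    summary = [
        f"{name}: {count_data_rows(text)} rows"
        for name, text in attachments.items()
    ]
    msg.set_content("\n".join(summary))
    for name, text in attachments.items():
        msg.add_attachment(
            text.encode("utf-8"),
            maintype="text",
            subtype="csv",
            filename=name,
        )
    return msg
```

Sending the result would then be a matter of something like `smtplib.SMTP(host).send_message(msg)`, where the host and any authentication depend on your mail server.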

As for web crawling, use Mechanize.
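A rough sketch of what the login and search steps might look like with the Python `mechanize` package. The URLs, the form positions, the field names (`username`, `password`, `report_date`), and the date format are all assumptions; you would need to inspect the real pages to find the actual ones.

```python
from datetime import date


def search_date_string(today=None):
    """Format a date for the search field; MM/DD/YYYY is an assumed format."""
    today = today or date.today()
    return today.strftime("%m/%d/%Y")


def download_report(login_url, search_url, username, password):
    """Log in, submit a date-based search, and return the response body."""
    import mechanize  # third-party package: pip install mechanize

    br = mechanize.Browser()

    # Log in. Assumes the login form is the first form on the page.
    br.open(login_url)
    br.select_form(nr=0)
    br["username"] = username  # hypothetical field name
    br["password"] = password  # hypothetical field name
    br.submit()

    # Run the search. Assumes the search form is the first form on its page.
    br.open(search_url)
    br.select_form(nr=0)
    br["report_date"] = search_date_string()  # hypothetical field name
    response = br.submit()
    return response.read()
```

The browser object keeps the session cookies between requests, which is why the search after the login is made with the same `br` instance.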

Answered By: kqnr

You can accomplish this task with any of the languages you listed. If you want learning Ruby to be part of your internship experience, then this might be a great project for learning it. Python would work great too (you could probably use Mechanize). I should disclose that I’m a Python developer and I love it. I think it’s a great language with great support and tools, and I’m sure the Ruby guys feel the same about their language. Again, it comes down to what you want to accomplish during your internship and what experience you want to take away from it. Best of luck.

Answered By: David S

Adding to the Mechanize suggestion:

If your page has a JavaScript component that Mechanize can’t handle, Selenium drives an actual web browser. If you’re hellbent on using Ruby, you can also use Watir, but Selenium has both Ruby and Python bindings.

Answered By: kreativitea