Scraping the web for information can be, you guessed it, as difficult as finding a needle in a haystack.
So, with that in mind, a group of hackers and journalists has created a program with a similar-sounding name to solve the problem.
Haystax is an open source tool that works as a bookmarklet: users drag it to the top of their browsers and then launch it with a click.
What is Haystax?
We want Haystax to eventually be flexible enough to tackle a bunch of different databases and work in a variety of use cases. It’s not very flexible right now, since we’re still building it out.
We developed this at Newshack day in San Francisco last weekend.
Basically, we just got together for a weekend, pitched the idea and built it out in the span of about 30 hours or so.
It got a pretty big response from the people who were there. And, yeah, we were able to create a working prototype in the span of a weekend.
Who was involved?
We had a team of about eight people: a bunch of different journalists and one developer, mostly from the San Francisco area.
How does it work?
On any browser you just turn it on. They built this tool for exploring the structure of web pages, and when our developer was thinking through this problem, he thought it might be a really cool way to tackle it.
So, with x-ray goggles you click on this thing and you can hover over elements of a webpage. It shows you: this is a table, this is a div. It helps people learn how websites are coded.
Our developer’s thinking was that, with a couple of extra lines of code, it would allow a user to define what was there and then set it up to scrape based on the structure of the pages.
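To make the idea of scraping based on page structure concrete, here is a minimal sketch in Python. It is not Haystax's code, which runs in the browser as a bookmarklet. The names TableScraper, scrape_table and SAMPLE_PAGE are hypothetical, and the sketch assumes the user has already pointed at a table and wants its rows back as data.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the cell text of every <tr> row found in the page."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table rows in `html` as lists of cell strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page, standing in for whatever the user clicked on.
SAMPLE_PAGE = """
<div><table>
<tr><th>Name</th><th>City</th></tr>
<tr><td>Ada</td><td>London</td></tr>
<tr><td>Grace</td><td>New York</td></tr>
</table></div>
"""

rows = scrape_table(SAMPLE_PAGE)
for row in rows:
    print(row)
```

The scraper only knows about the shape of the markup (rows and cells), not about what the data means, which is the sense in which the scrape is driven by the structure of the page.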
That’s a really good question.
We put this thing together in about 30 hours and it has been about a week, and we haven’t done any significant work on it.
I would like to say it’s up and ready now, and people can use it. They can look at the source code on GitHub.
But it doesn’t seem to work on lots of different types of things, so it really isn’t ready for prime time.
If we can get a lot of buy-in from the open source community, however, and from other folks who want to use a tool like this, then I think it’s a simple enough problem that we could have a real tool in a few weeks, or a few months.
It really depends on how quickly we can mobilize people and get them interested in this sort of problem.