Comparable Sales — and Widgets, APIs and Screen Scraping
Recently someone contacted me about pulling comparable sales information from Zillow, Trulia, Eppraisal, Yahoo and Cyberhomes. He also wanted to sort the details before storing them in Excel for further analysis. If it worked well, he wanted to package it and sell it to the realtor community.
I told him that storing comparable sales data (or any other data, for that matter) is against these sites' terms and conditions. Most of these providers (except Zillow and Eppraisal) do not offer APIs for comparable sales. Regardless, he found someone who would screen scrape these sites and produce an Excel spreadsheet for a cheap price.
Screen scraping and stealing information from web sites has serious consequences. It can create legal problems once the site owners trace the activity to his IP address, and his site could also be blacklisted. Even worse, the scraper will stop working as soon as a site makes even minor changes to its HTML, which is not uncommon in a world full of screen scrapers.
Screen scraping is simply parsing web pages with a programming language (such as PHP, ColdFusion, Java, ASP, Perl or Python), looking for specific patterns in the HTML and extracting certain key details. It requires only basic programming skills, and most of these languages make it easy with powerful parsing capabilities. It amounts to piracy, can have legal ramifications and is best avoided. The temptation is high, given how many freelancers on the internet offer cheap screen-scraping solutions.
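To see why scrapers are so brittle, here is a toy Python sketch that runs against two hard-coded HTML snippets (hypothetical markup, not taken from any of the sites named above). It does the kind of pattern matching described here and shows how renaming a single CSS class makes it silently return nothing.

```python
import re

# Hypothetical markup for one sale on a listing page; real sites differ.
PAGE_V1 = '<div class="sale"><span class="price">$310,000</span></div>'
# The same page after a minor redesign: only the class name changed.
PAGE_V2 = '<div class="sale"><span class="sold-price">$310,000</span></div>'

# The scraper hard-codes the exact HTML pattern it expects to find.
PRICE_PATTERN = re.compile(r'<span class="price">\$([\d,]+)</span>')


def scrape_price(html):
    """Return the sale price as an int, or None if the pattern is not found."""
    match = PRICE_PATTERN.search(html)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


print(scrape_price(PAGE_V1))  # 310000
print(scrape_price(PAGE_V2))  # None: a one-word markup change breaks it
```

Nothing in the second page tells the scraper that the data is still there under a new name, which is why these jobs need constant, unpaid-for maintenance.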
This leads to the basic question: how can you access comparable sales information to attract traffic to your site?
The answer depends on your needs and capabilities. If you don't want to get your hands dirty with programming and/or you have a low budget, your best bet is to use the readily available widgets that most of these sites provide. You will need only basic HTML skills to place a widget on your web site without distorting the layout. There may also be plug-ins available (like the WordPress Local Market Explorer) if you want to add these to your blog.
Advanced users with programming skills can try the APIs (application programming interfaces) offered by these providers, or hire programmers to do this for them. These APIs are mostly web services based on REST (as opposed to SOAP). Amazon pioneered this area, and most major players later embraced it. Very good frameworks and libraries are available for using these APIs. This gives you maximum flexibility, and you can combine the data with other APIs like Google Maps, Facebook, Twitter, Walk Score and Yelp (to name a few) to create very interesting results, known as mashups. A word of caution: make sure you read and follow each provider's terms and conditions when doing this.
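As a rough illustration of the REST-plus-mashup idea, here is a minimal Python sketch using only the standard library. The endpoint `https://api.example.com/v1/comps`, its `address`/`count`/`key` parameters and the JSON shape are all hypothetical placeholders, not any real provider's API; each provider documents its own. The "second source" of walk scores is likewise just a hard-coded dictionary standing in for another API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and parameter names; every real provider defines its own.
BASE_URL = "https://api.example.com/v1/comps"


def build_comps_url(address, api_key, count=5):
    """Build a REST-style GET URL asking for comparable sales near one address."""
    query = urlencode({"address": address, "count": count, "key": api_key})
    return f"{BASE_URL}?{query}"


def parse_comps(body):
    """Turn an (assumed) JSON response body into (address, price) tuples."""
    data = json.loads(body)
    return [(c["address"], c["price"]) for c in data["comps"]]


def fetch_comps(address, api_key):
    """Call the hypothetical endpoint over HTTP and parse the reply."""
    with urlopen(build_comps_url(address, api_key)) as resp:
        return parse_comps(resp.read().decode("utf-8"))


def mashup(comps, walk_scores):
    """Join each comp with a score from a second source, matched by address."""
    return [
        {"address": addr, "price": price, "walk_score": walk_scores.get(addr)}
        for addr, price in comps
    ]


# Offline usage with a canned response instead of a live call.
sample = ('{"comps": [{"address": "12 Oak St", "price": 305000},'
          ' {"address": "9 Elm St", "price": 298500}]}')
comps = parse_comps(sample)
rows = mashup(comps, {"12 Oak St": 72})
print(build_comps_url("12 Oak St", "KEY", 3))
print(rows)
```

Keeping URL building, parsing and merging in separate functions means that when one provider changes, only its small piece of code has to be touched.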
In many cases API offerings are very limited, and you may end up using the widgets instead. You also have to watch for API changes, which can break your code and require fixes to keep it running; Google, Amazon, Twitter and Facebook have all done this frequently. When you hire for this kind of job, I'd recommend making sure you have an ongoing relationship with the programmer. Don't go by cost alone, since many of them may not be around when your code needs fixing.
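One common way to soften the blow of API changes (a general defensive-coding habit, not something any of these providers prescribes) is to check the response version and accept known field names explicitly, failing loudly with a clear message otherwise. The sketch below assumes a hypothetical response with a `version` field and a price that was renamed from `price` to `salePrice`; both names are illustrative only.

```python
import json

# Field names seen across (hypothetical) API versions for the same value.
PRICE_FIELDS = ("price", "salePrice")
SUPPORTED_VERSIONS = (1, 2)


def extract_price(comp):
    """Read a sale price under any known field name, or fail with a clear error."""
    for key in PRICE_FIELDS:
        if key in comp:
            return comp[key]
    raise KeyError(f"no known price field in {sorted(comp)}")


def parse_response(body):
    """Parse a comps response, refusing versions this code was not written for."""
    data = json.loads(body)
    version = data.get("version", 1)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version {version}")
    return [extract_price(c) for c in data.get("comps", [])]


print(parse_response('{"version": 1, "comps": [{"price": 305000}]}'))
print(parse_response('{"version": 2, "comps": [{"salePrice": 298500}]}'))
```

A loud, specific error is easier for the programmer you keep on call to diagnose than a spreadsheet that quietly fills up with blanks.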
Questions about APIs, widgets, or screen scraping? Ask away in the comments!
Geordie Romer
Posted at 17:07h, 30 November
You couldn't pay me to use sold/comparable data from those sites. Who in the real estate industry would find that data helpful? To me, data from county records merged with MLS data would be more interesting, which certainly isn't what any of those vendors is providing in my area. Wouldn't an agent or brokerage be better off fighting for their MLS to get on the RPR list?
Murali Vasudevan
Posted at 07:33h, 01 December
Geordie, you may be making a good point. However, if you are not a realtor and have no access to MLS data, these providers may still be your best option, especially if you don't want to pay for it. The data is free and easy to access. Its accuracy has always been debatable, but that is not going to stop folks from using it. I still get inquiries quite frequently for data from these providers.
mrspeeb
Posted at 14:29h, 03 December
Isn't screen scraping (crawling) what Google, Yahoo, Bing, etc. do for search engine data?
Murali Vasudevan
Posted at 07:15h, 04 December
mrspeeb,
Good question. It is the purpose of scraping or crawling, and what you do with the data, that makes the difference.
Technically, search engines (robots) read all the accessible pages of your web site and store (index) key information, such as the title and keywords, in their databases so the public can search it later.
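A minimal Python sketch of that indexing step, using only the standard library's `html.parser` on a hard-coded page: it records the page's title and `<meta name="keywords">` content in a dictionary keyed by URL. The dictionary is an assumed stand-in for a search engine's database, and the page and URL are made up; real crawlers also fetch pages, follow links and index the body text.

```python
from html.parser import HTMLParser


class IndexParser(HTMLParser):
    """Collect the <title> text and the meta keywords of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def index_page(url, html, index):
    """Parse one page and store its title and keywords under its URL."""
    parser = IndexParser()
    parser.feed(html)
    index[url] = {"title": parser.title.strip(), "keywords": parser.keywords}
    return index


page = ('<html><head><title>Oak St Homes</title>'
        '<meta name="keywords" content="homes, Oak St, sales"></head>'
        '<body><p>Listings</p></body></html>')
index = index_page("https://example.com/", page, {})
print(index)
```

Note that the index keeps only a summary that points searchers back to the original page, which is the opposite of lifting a provider's proprietary data for reuse.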
In fact, this is what everyone wants: to be picked up by the search engines and be visible to the public, a kind of free advertising.
On the other hand, screen scraping is used to take specific (mostly proprietary) information from a content provider and use it without giving credit to the actual owner, which is a kind of stealing.
It's almost like getting a free ride at someone's expense.