Meta Tag and WHOIS Scraper
We're looking to do some research on a list of domain names. For each domain name we want to know the following: Domain, Company, Industry, Country, State, City, Zip/Postal, title tag, meta description, meta keywords. That's it.

The input will be the list of domain names pasted into a text area on a web page. The output should be a downloadable CSV or TAB-delimited file I can load into Excel. There should also be visible output on the web page while the script is running so we can see progress.

Your script will have a list of lists that contains the industry information, formatted as specified below or in whatever way is easiest for you (though we should be able to add, edit, and delete entries in this list as much as we want; hardcoding it within the program is OK). The INDUSTRY "list of lists" could look like this:

Industry,tag1,tag2,tag3,...

Basically, if the domain name, home page title, meta description, or meta keywords ("the data fields") contain either the industry name or any of its tags, then that is the industry the domain should be assigned. Here's an example of what the industry lists might look like, but you can format them any way that works best for you.

$legalwords = array("legal", "law", "lawyer", "attorney", "advoca");
$consultantwords = array("consultant", "consult", "advisor");
$medicalwords = array("medical", "medicine", "doctor", "surgic", "stem cell", "scienc", "research", "laborat");
$contractorwords = array("contractor", "construction");

Since it is possible for a company to be in more than one of the industries, I'd like some logic that determines the most appropriate industry, maybe by counting how many matches there are for each category in the data we are looking at.

The location information should come from the WHOIS database.

We need error checking built into the program so that if a domain no longer exists, or if it redirects elsewhere, the script does not crash but continues to the next URL.
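To illustrate the industry matching described above, here is a minimal sketch of the "count the matches per category" idea. It is only one possible reading of the spec: the `$industries` array shape and the `pick_industry` function name are hypothetical, matching is assumed to be a case-insensitive substring count, and ties are assumed to go to the industry listed first.

```php
<?php
// Hypothetical industry list: industry name => tags.
// The tags are the examples from the spec; the array shape is an assumption.
$industries = array(
    "legal"      => array("legal", "law", "lawyer", "attorney", "advoca"),
    "consultant" => array("consultant", "consult", "advisor"),
    "medical"    => array("medical", "medicine", "doctor", "surgic",
                          "stem cell", "scienc", "research", "laborat"),
    "contractor" => array("contractor", "construction"),
);

// Hypothetical helper: returns the industry with the most matches across
// the data fields (domain, title, meta description, meta keywords),
// or an empty string when nothing matches.
function pick_industry(array $fields, array $industries)
{
    // Join all data fields into one lowercase string to search.
    $text = strtolower(implode(" ", $fields));

    $best = "";
    $bestScore = 0;
    foreach ($industries as $name => $tags) {
        // The industry name itself counts as a tag; avoid counting it twice.
        $terms = array_unique(array_merge(array($name), $tags));

        $score = 0;
        foreach ($terms as $term) {
            // Count every occurrence of the term as a substring.
            $score += substr_count($text, strtolower($term));
        }

        // Strictly greater: on a tie, the earlier industry wins (assumption).
        if ($score > $bestScore) {
            $best = $name;
            $bestScore = $score;
        }
    }
    return $best;
}

// Example call with made-up data fields for one domain.
$fields = array(
    "smithlaw.com",                // domain
    "Smith & Jones Attorneys",     // title tag
    "Legal advice and consulting", // meta description
    "",                            // meta keywords
);
echo pick_industry($fields, $industries), "\n";
```

Because this counts substrings, overlapping tags such as "law" and "lawyer" both score on the word "lawyer", so industries with nested tags get extra weight. Whether that is acceptable, or each field should count at most once per tag, is a design choice the spec leaves open.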
The output should be a TAB-delimited file that we can easily load into Excel to do some analysis.

That's the whole project. Once the project is awarded to you, I will send you a list of sample domains and a more complete list of industries and tags.

When you reply, put the word "orange" in the subject line of your PM or BID. If you don't do that, I'll know you didn't read this spec completely, and I won't read your bid or PM.

I need this done in the next 12 hours, but that should be easy as it's an extremely small and simple project for someone who knows PHP even reasonably well. And if you're an expert, this is probably an hour or less.

Thanks.
Mark

Keywords: Data, Web, PHP, Page, Tag, Meta, Scraping
| Expired |




