Python Web Parser
Job Description

Hi there,

We have been maintaining an updated listing of thousands of domain names and we simply need to scale up. The current system parses remote websites for simple things such as outbound links, title tags, and meta descriptions, along with some underlying DNS information. Currently we are using asynchronous cURL in PHP with a queue-like system, but this will not suffice on a larger scale, where we likely need a language that is better able to multi-thread our requests for quicker throughput.

If you choose to accept this project, we can forward our current parser (PHP) so you can get a better feel for what we need parsed. Essentially, we need the following information given just the domain name (e.g. google.com):

- Server IP address
- Domain nameservers

From the parsed HTML we would need the following, for which we already have the regex:

- Title tag
- Meta keywords/description
- Outbound links

While we will include the working PHP, we encourage you to be creative in reworking the code to speed up the parsing process. Feel free to ask any additional questions.

Best,
Bryan

Keywords: Web Programming, expression, python, multithreading, regular, php, curl, data, scraping, web, dn
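For illustration, the requirements in the job description (IP lookup, title tag, meta keywords/description, outbound links, threaded requests) could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the poster's parser: it uses the standard library's `html.parser` in place of the existing regexes, the names `PageParser`, `parse_html`, `fetch_domain`, and `crawl` are hypothetical, and the worker count of 32 is an arbitrary starting point. Nameserver lookup is left out because the standard library has no NS query; a third-party resolver such as dnspython would be needed for that part.

```python
import http.client
import socket
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageParser(HTMLParser):
    """Collects the title, meta keywords/description, and outbound links."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain.lower()
        self.title = ""
        self.meta = {}
        self.outbound = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""
        elif tag == "a":
            href = attrs.get("href") or ""
            try:
                host = (urlparse(href).hostname or "").lower()
            except ValueError:  # malformed URL, skip it
                return
            # Outbound: the link names a host outside this domain.
            same_site = host == self.domain or host.endswith("." + self.domain)
            if host and not same_site:
                self.outbound.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_html(html, domain):
    """Parse one page and return the fields listed in the posting."""
    parser = PageParser(domain)
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "outbound_links": parser.outbound,
    }


def fetch_domain(domain, timeout=10):
    """Resolve the domain, fetch its home page, and parse it."""
    record = {"domain": domain}
    try:
        record["ip"] = socket.gethostbyname(domain)
        with urllib.request.urlopen("http://" + domain, timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        record.update(parse_html(html, domain))
    except (OSError, http.client.HTTPException) as exc:
        # DNS failures, timeouts, and HTTP errors are recorded, not raised.
        record["error"] = str(exc)
    return record


def crawl(domains, fetch=fetch_domain, workers=32):
    """Run fetch over all domains on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, domains))
```

Calling `crawl(["google.com", "example.com"])` would return one dictionary per domain, in input order. Threads suit this workload because each request spends most of its time waiting on the network; the `fetch` parameter exists so the network step can be swapped out, for example in tests.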
| Expired |




