ruby - getaddrinfo error with Mechanize
I wrote a script that goes through every record in our database, verifies that the record's website URL works, and looks for a Twitter link on its homepage. We have a little over 10,000 URLs to verify. After a fraction of the URLs have been checked, we start getting getaddrinfo errors for every remaining URL.
Here is a copy of the code that scrapes a single URL:
```ruby
def scrape_url(url)
  url_found = false
  twitter_name = nil

  begin
    agent = Mechanize.new do |a|
      a.follow_meta_refresh = true
    end
    agent.get(normalize_url(url)) do |page|
      url_found = true
      twitter_name = find_twitter_name(page)
    end
    @err << "[#{@current_record}] SUCCESS\n"
  rescue Exception => e
    @err << "[#{@current_record}] ERROR (#{url}): "
    @err << e.message
    @err << "\n"
  end
end
```

Note: I have also run a version of this code that creates a single Mechanize instance shared across all calls to scrape_url. It failed in exactly the same way. When I run the script on EC2, it gets through about 1,000 URLs, then returns this error for the remaining 9,000+:
```
getaddrinfo: Temporary failure in name resolution
```

Note that I tried using both Amazon's DNS servers and Google's DNS servers, thinking it might be a legitimate DNS problem. I got the same results in both cases.
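One way to rule out one-off DNS hiccups while diagnosing is to wrap each scrape in a bounded retry on `SocketError`. This is a minimal stdlib-only sketch, not part of the original script; the helper name, retry count, and wait interval are all illustrative assumptions:

```ruby
# Retry the given block a few times when name resolution fails transiently.
# Re-raises immediately for non-getaddrinfo errors, and after the last attempt.
# max_attempts and wait are arbitrary illustrative defaults.
def with_dns_retries(max_attempts: 3, wait: 0.5)
  attempts = 0
  begin
    yield
  rescue SocketError => e
    attempts += 1
    raise unless e.message.include?("getaddrinfo") && attempts < max_attempts
    sleep(wait)
    retry
  end
end
```

Usage would look like `with_dns_retries { agent.get(normalize_url(url)) }`. If the errors persist across retries, the failure is systemic rather than transient, which points at the resource-exhaustion explanation given in the answer.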
Then I tried running it on my local MacBook Pro. It only got through about 250 records before returning this error for the rest:
```
getaddrinfo: nodename nor servname provided, or not known
```

Does anyone know how I can get the script through all the records?

Answer:

I found the solution. Mechanize was leaving connections open and relying on GC to clean them up. After a certain point, there were enough open connections that no additional outbound connection could be established to do a DNS lookup. Here's the code that made it work:
```ruby
agent = Mechanize.new do |a|
  a.follow_meta_refresh = true
  a.keep_alive = false
end
```

By setting keep_alive to false, each connection is closed and cleaned up immediately instead of being left open.
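The diagnosis can be checked directly: each leaked keep-alive connection holds one file descriptor, and once the process's soft descriptor limit is reached, even the UDP socket needed for a DNS lookup cannot be opened. A stdlib-only sketch for MRI (the helper name `fd_headroom` is hypothetical, and counting `IO` objects via ObjectSpace is an approximation):

```ruby
# Estimate how close the current process is to its open-file limit.
# Process.getrlimit(:NOFILE) returns [soft, hard] descriptor limits;
# live, unclosed IO objects approximate the descriptors in use (MRI only).
def fd_headroom
  soft, _hard = Process.getrlimit(:NOFILE)
  open_count = ObjectSpace.each_object(IO).reject(&:closed?).count
  { limit: soft, open: open_count, remaining: soft - open_count }
end
```

Logging `fd_headroom` every few hundred URLs would show `remaining` steadily shrinking with keep-alive connections leaking, and holding steady once `keep_alive = false` is set.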