ruby - getaddrinfo error with Mechanize -


I wrote a script that goes through all the records in our database, verifies that each website URL works, and tries to find a Twitter link on the homepage. We have more than 10,000 URLs to verify. After a fraction of the URLs have been verified, we start getting getaddrinfo errors for every URL.

Here is a copy of the code that scrapes a single URL:

  def scrape_url(url)
    url_found = false
    twitter_name = nil

    begin
      agent = Mechanize.new do |a|
        a.follow_meta_refresh = true
      end

      agent.get(normalize_url(url)) do |page|
        url_found = true
        twitter_name = find_twitter_name(page)
      end

      @err << "[#{@current_record}] SUCCESS\n"
    rescue Exception => e
      @err << "[#{@current_record}] ERROR (#{url}): "
      @err << e.message
      @err << "\n"
    end
  end

Note: I have also run a version of this code that creates a single Mechanize instance which is shared across all calls to scrape_url. It failed in exactly the same way.

When I run it on EC2, it gets through about 1,000 URLs, then returns this error for the remaining 9,000+:

  getaddrinfo: Temporary failure in name resolution

Note that I tried using both Amazon's DNS server and Google's DNS server, thinking it might be a legitimate DNS problem. I got the same results in both cases.

Then I tried running it on my local MacBook Pro. It only got through about 250 URLs before returning this error for the rest of the records:

  getaddrinfo: nodename nor servname provided, or not known

Does anyone know how I can get the script to make it through all the records?

I found the solution. Mechanize was leaving connections open and relying on GC to clean them up. After a certain point, there were enough open connections that an additional outbound connection could not be established to do a DNS lookup. Here's the code that made it work:

  agent = Mechanize.new do |a|
    a.follow_meta_refresh = true
    a.keep_alive = false
  end

By setting keep_alive to false, each connection is closed and cleaned up immediately.
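
One way to check whether a script is leaking connections like this is to compare the number of file descriptors the process has open against its limit while it runs. The sketch below uses only the Ruby standard library. The helper names `fd_soft_limit` and `open_fd_count` are made up for illustration, and it assumes a system that exposes `/proc/self/fd` or `/dev/fd` (typical on Linux and macOS). As an aside that the original post does not state: the failure points reported above (about 1,000 on EC2, about 250 on the MacBook) are close to the common default soft limits of 1024 and 256 open descriptors, which would be consistent with this explanation.

```ruby
# Soft limit on open file descriptors for the current process.
def fd_soft_limit
  soft, _hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  soft
end

# Number of descriptors currently open in this process, or nil when
# neither descriptor directory exists (assumption: Linux or macOS layout).
def open_fd_count
  dir = ["/proc/self/fd", "/dev/fd"].find { |d| File.directory?(d) }
  return nil unless dir

  Dir.children(dir).size
end

# Example: log this every few hundred URLs inside the scraping loop.
puts "open descriptors: #{open_fd_count.inspect} / soft limit: #{fd_soft_limit}"
```

If the open count climbs steadily toward the limit as URLs are processed, connections are not being released; with keep_alive disabled it should stay roughly flat.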
