hadoop - Difference between Nutch crawl giving depth='N' and crawling in loop N times with depth='1'
Background of my problem: I'm running Nutch 1.4 on Hadoop 0.20.203. I run a series of MapReduce jobs on the Nutch segments to get the final output. But waiting for the whole crawl to finish before launching MapReduce makes the solution run for a long time. So now I trigger the MapReduce jobs on the segments as soon as they are dumped, by running the crawl in a loop ('N = depth' times) with depth=1 on each iteration. The problem: I lose some URLs when I crawl in a loop N times with depth 1, compared to crawling once with depth N.
Please find the pseudo code below:
Case 1: Nutch crawl on Hadoop giving depth=3

    // Create a list object to store the arguments we are going to pass to Nutch
    List<String> nutchArgsList = new ArrayList<String>();
    nutchArgsList.add("-depth");
    nutchArgsList.add(Integer.toString(3));
    <...other nutch args...>
    ToolRunner.run(nutchConf, new Crawl(), nutchArgsList.toArray(new String[nutchArgsList.size()]));

Case 2: Crawling in a loop 3 times with depth='1'

    for (int depthRun = 0; depthRun < 3; depthRun++) {
        // Create a list object to store the arguments we are going to pass to Nutch
        List<String> nutchArgsList = new ArrayList<String>();
        nutchArgsList.add("-depth");
        nutchArgsList.add(Integer.toString(1)); // NOTE: depth is given as 1 here
        <...other nutch args...>
        ToolRunner.run(nutchConf, new Crawl(), nutchArgsList.toArray(new String[nutchArgsList.size()]));
    }

I am losing URLs (db_unfetched) when I crawl in a loop as many times as the depth value. I have tried this on standalone Nutch, running once with depth 3 versus running 3 times over the same URLs with depth 1. When I compare the crawldbs, the difference is only 12 URLs. But when I do the same on Hadoop using ToolRunner, I get around 1000 URLs as db_unfetched.

As far as I have understood, Nutch internally triggers the crawl in a loop as many times as the depth value, so the two cases should behave the same. Please suggest what I am missing. Also, please let me know why the difference is so large when I run this on Hadoop via ToolRunner compared to standalone Nutch. Thanks in advance.

Answer: I have found that Nutch behaves differently when running standalone (directly against the local hard disk) and when running on a Hadoop cluster. The generator's score filtering appears to be much more aggressive on a Hadoop cluster, so the "-topN" setting needs to be high enough. I recommend running your crawl with a high "-topN" (at least 1000). After doing this, my crawls on standalone Nutch and on HDFS started to behave alike.
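Following that suggestion, here is a minimal sketch of how the Case 2 loop might pass "-topN" explicitly. This is an illustration only: the class name and the "urls" seed and "crawl" output directories are hypothetical placeholders, not taken from the question.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.nutch.crawl.Crawl;
    import org.apache.nutch.util.NutchConfiguration;

    public class LoopedCrawl {
        public static void main(String[] args) throws Exception {
            Configuration nutchConf = NutchConfiguration.create();
            for (int depthRun = 0; depthRun < 3; depthRun++) {
                List<String> nutchArgsList = new ArrayList<String>();
                nutchArgsList.add("urls");                  // hypothetical seed URL directory
                nutchArgsList.add("-dir");
                nutchArgsList.add("crawl");                 // hypothetical crawl output directory
                nutchArgsList.add("-depth");
                nutchArgsList.add(Integer.toString(1));     // one generate/fetch/update round per iteration
                nutchArgsList.add("-topN");
                nutchArgsList.add(Integer.toString(1000));  // high topN so the generator keeps low-scoring URLs
                ToolRunner.run(nutchConf, new Crawl(),
                        nutchArgsList.toArray(new String[nutchArgsList.size()]));
            }
        }
    }

After each run, the two crawldbs can be compared with "bin/nutch readdb <crawldb> -stats", which prints the per-status counts (including db_unfetched) that the question refers to.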