java - Accessing files from other filesystems along with HDFS files in a Hadoop MapReduce application


I know that we can write a MapReduce job as a normal Java application. In my case, the job needs to work with files on HDFS as well as files on other filesystems. Is it possible to use files from other filesystems while simultaneously using files on HDFS?

So basically my intention is this: I have a large file that I want to put in HDFS for parallel computation, and I then want to compare the blocks of this file against some other files, which I do not want to put in HDFS because each of them needs to be read as a single, complete file.

You can use files from the local file system in your job. To distribute them, add them to the DistributedCache, then open and read each file in the configure() method (do not read them in map(), because map() is called for every record and the file would be read many times).

Edit

To let your map/reduce tasks access files from the local file system, add those files to the DistributedCache when you set up your job configuration:

  JobConf job = new JobConf();
  DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);

The MapReduce framework will ensure that those files are accessible to your mappers:

  public void configure(JobConf job) {
      // retrieve the locally cached files
      Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
      // open, read and store their contents for use in the map phase
  }

And delete the files when you are done.
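The "open, read and store" step inside configure() is plain Java I/O, independent of Hadoop. As a minimal sketch, assuming the cached lookup.dat contains tab-separated key/value pairs (that format is an assumption for illustration, not stated in the original post), loading it into an in-memory map might look like:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LookupLoader {
    // Load a tab-separated lookup file (e.g. the cached lookup.dat)
    // into a HashMap, so map() can do fast in-memory lookups instead
    // of re-reading the file for every input record.
    public static Map<String, String> load(String path) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // split into at most two fields: key and value
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        }
        return lookup;
    }
}
```

You would call LookupLoader.load("lookup.dat") once from configure(), keep the returned map in a field of the mapper class, and consult it from map().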
