@InterfaceAudience.Public @InterfaceStability.Stable public interface Mapper<K1,V1,K2,V2> extends JobConfigurable, org.apache.hadoop.io.Closeable
Maps are the individual tasks which transform input records into a intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
The Hadoop Map-Reduce framework spawns one map task for each 
 InputSplit generated by the InputFormat for the job.
 Mapper implementations can access the JobConf for the 
 job via the JobConfigurable.configure(JobConf) and initialize
 themselves. Similarly they can use the Closeable.close() method for
 de-initialization.
The framework then calls 
 map(Object, Object, OutputCollector, Reporter) 
 for each key/value pair in the InputSplit for that task.
All intermediate values associated with a given output key are 
 subsequently grouped by the framework, and passed to a Reducer to  
 determine the final output. Users can control the grouping by specifying
 a Comparator via 
 JobConf.setOutputKeyComparatorClass(Class).
The grouped Mapper outputs are partitioned per 
 Reducer. Users can control which keys (and hence records) go to 
 which Reducer by implementing a custom Partitioner.
 
 
Users can optionally specify a combiner, via 
 JobConf.setCombinerClass(Class), to perform local aggregation of the 
 intermediate outputs, which helps to cut down the amount of data transferred 
 from the Mapper to the Reducer.
 
 
The intermediate, grouped outputs are always stored in 
 SequenceFiles. Applications can specify if and how the intermediate
 outputs are to be compressed and which CompressionCodecs are to be
 used via the JobConf.
If the job has 
 zero
 reduces then the output of the Mapper is directly written
 to the FileSystem without grouping by keys.
Example:
     public class MyMapper<K extends WritableComparable, V extends Writable> 
     extends MapReduceBase implements Mapper<K, V, K, V> {
     
       static enum MyCounters { NUM_RECORDS }
       
       private String mapTaskId;
       private String inputFile;
       private int noRecords = 0;
       
       public void configure(JobConf job) {
         mapTaskId = job.get(JobContext.TASK_ATTEMPT_ID);
         inputFile = job.get(JobContext.MAP_INPUT_FILE);
       }
       
       public void map(K key, V val,
                       OutputCollector<K, V> output, Reporter reporter)
       throws IOException {
         // Process the <key, value> pair (assume this takes a while)
         // ...
         // ...
         
         // Let the framework know that we are alive, and kicking!
         // reporter.progress();
         
         // Process some more
         // ...
         // ...
         
         // Increment the no. of <key, value> pairs processed
         ++noRecords;
         // Increment counters
         reporter.incrCounter(NUM_RECORDS, 1);
        
         // Every 100 records update application-level status
         if ((noRecords%100) == 0) {
           reporter.setStatus(mapTaskId + " processed " + noRecords + 
                              " from input-file: " + inputFile); 
         }
         
         // Output the result
         output.collect(key, val);
       }
     }
 Applications may write a custom MapRunnable to exert greater
 control on map processing e.g. multi-threaded Mappers etc.
JobConf, 
InputFormat, 
Partitioner, 
Reducer, 
MapReduceBase, 
MapRunnable, 
SequenceFile| Modifier and Type | Method and Description | 
|---|---|
| void | map(K1 key,
   V1 value,
   OutputCollector<K2,V2> output,
   Reporter reporter)Maps a single input key/value pair into an intermediate key/value pair. | 
configurevoid map(K1 key, V1 value, OutputCollector<K2,V2> output, Reporter reporter) throws IOException
Output pairs need not be of the same types as input pairs.  A given 
 input pair may map to zero or many output pairs.  Output pairs are 
 collected with calls to 
 OutputCollector.collect(Object,Object).
Applications can use the Reporter provided to report progress 
 or just indicate that they are alive. In scenarios where the application 
 takes significant amount of time to process individual key/value
 pairs, this is crucial since the framework might assume that the task has 
 timed-out and kill that task. The other way of avoiding this is to set 
 
 mapreduce.task.timeout to a high-enough value (or even zero for no 
 time-outs).
key - the input key.value - the input value.output - collects mapped keys and values.reporter - facility to report progress.IOExceptionCopyright © 2022 Apache Software Foundation. All rights reserved.