NO.1 You have just executed a MapReduce job.
Where is intermediate data written to after being emitted from the Mapper's map method?
A. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker
node running the Reducer
B. Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are
written into HDFS.
C. Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are
written into HDFS.
D. Into in-memory buffers that spill over to the local file system of the TaskTracker node running the Mapper
E. Intermediate data is streamed across the network from the Mapper to the Reducer and is never
written to disk.
The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each
individual mapper node. This is typically a temporary directory whose location can be configured
by the Hadoop administrator. The intermediate data is cleaned up after the Hadoop job completes.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, Where is the
Mapper Output (intermediate key-value data) stored?
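As the explanation notes, the spill directory is configurable. In classic MRv1 this is done through the mapred.local.dir property in mapred-site.xml; a minimal sketch (the directory paths below are illustrative, not defaults):

```xml
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local</value>
  <description>Comma-separated list of local (non-HDFS) directories
  where intermediate map output spill files are written.</description>
</property>
```

Listing several directories on different physical disks spreads spill I/O across spindles.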
NO.2 Which best describes how TextInputFormat processes input files and line breaks?
A. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
B. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of
both splits containing the broken line.
C. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of
the split that contains the beginning of the broken line.
D. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of
the split that contains the end of the broken line.
E. Input file splits may cross line breaks. A line that crosses file splits is ignored.
Reference: How Map and Reduce operations are actually carried out
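The split-boundary rule — a RecordReader skips a leading partial line (the previous split owns it) and reads past its split's end to finish the last line it started — can be sketched with a small simulation. This is a toy model of the behavior, not Hadoop's actual LineRecordReader code:

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Return the lines 'owned' by the byte range [start, end):
    every line whose first byte falls at or before `end`, minus the
    leading partial line when the split does not begin the file."""
    pos = start
    if start > 0:
        # Skip up to and including the first newline: that (possibly
        # partial) line is read by the split containing its beginning.
        nl = data.find(b"\n", pos)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            lines.append(data[pos:])  # last line, no trailing newline
            pos = len(data)
        else:
            lines.append(data[pos:nl])
            pos = nl + 1  # may run past `end` to finish a broken line
    return lines
```

Splitting sample data mid-line shows each broken line is read exactly once, by the split that contains its beginning (option C's behavior).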
NO.3 You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses
TextInputFormat: the mapper applies a regular expression over input values and emits key-value
pairs with the key consisting of the matching text and the value containing the filename and byte
offset. Determine the difference between setting the number of reducers to one and setting the
number of reducers to zero.
A. With zero reducers, all instances of matching patterns are gathered together in one file on HDFS.
With one reducer, instances of matching patterns are stored in multiple files on HDFS.
B. There is no difference in output between the two settings.
C. With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one
reducer, all instances of matching patterns are gathered together in one file on HDFS.
D. With zero reducers, no reducer runs and the job throws an exception. With one reducer, instances
of matching patterns are stored in a single file on HDFS.
* It is legal to set the number of reduce-tasks to zero if no reduction is desired.
In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by
setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the FileSystem.
* Often, you may want to process input data using a map function only. To do this, simply set
mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks.
Rather, the outputs of the mapper tasks will be the final output of the job.
In this phase the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method
is called for each <key, (list of values)> pair in the grouped inputs.
The output of the reduce task is typically written to the FileSystem via
OutputCollector.collect(WritableComparable, Writable).
Applications can use the Reporter to report progress, set application-level status messages
and update Counters, or just indicate that they are alive.
The output of the Reducer is not sorted.
NO.4 MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate
daemons? Select two.
A. Resource management
B. MapReduce metric reporting
C. Managing file system metadata
D. Job coordination between the ResourceManager and NodeManager
E. Health status checks (heartbeats)
F. Launching tasks
G. Managing tasks
H. Job scheduling/monitoring
The fundamental idea of MRv2 is to split up the two major functionalities of
the JobTracker, resource management and job scheduling/monitoring, into separate
daemons. The idea is to have a global ResourceManager (RM) and per-application
ApplicationMaster (AM). An application is either a single job in the classical sense of Map-
Reduce jobs or a DAG of jobs.
The central goal of YARN is to cleanly separate two things that are conflated in current Hadoop,
specifically in the JobTracker:
/ Monitoring the status of the cluster with respect to which nodes have which resources
available. Under YARN, this will be global.
/ Managing the parallel execution of any specific job. Under YARN, this will be done separately
for each job.
Reference: Apache Hadoop YARN - Concepts & Applications
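The division of responsibility described above can be sketched as a toy model: a single global ResourceManager that tracks and hands out cluster resources, and one ApplicationMaster per job that coordinates that job's tasks. All class and method names here are illustrative, not the YARN API:

```python
class ResourceManager:
    """Global daemon: cluster-wide resource management and scheduling."""

    def __init__(self):
        self.free = {}  # node name -> number of free containers

    def register_node(self, node, containers):
        self.free[node] = containers

    def allocate(self, n):
        """Grant up to n containers from whatever nodes have capacity."""
        grant = []
        for node, avail in self.free.items():
            while avail > 0 and len(grant) < n:
                avail -= 1
                grant.append(node)
            self.free[node] = avail
        return grant


class ApplicationMaster:
    """Per-application daemon: negotiates resources from the RM and
    manages the parallel execution of one job's tasks."""

    def __init__(self, rm, num_tasks):
        self.containers = rm.allocate(num_tasks)

    def status(self):
        return "%d tasks launched" % len(self.containers)
```

Note the RM never sees individual tasks: job-level coordination lives entirely in the per-job ApplicationMaster, which is exactly the split MRv2 makes (options A and H).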
Exam subject: Hortonworks Data Platform Certified Developer
Questions and answers: 110 in total