PRISM: ALLOCATION OF RESOURCES IN PHASE-LEVEL

International Journal of Research In Science & Engineering
Volume: 1 Special Issue: 2
e-ISSN: 2394-8299
p-ISSN: 2394-8280
PRISM: ALLOCATION OF RESOURCES IN PHASE-LEVEL USING
MAP-REDUCE IN HADOOP
Ms. Savitri D. H. (1), Narayana H. M. (2)
(1) PG Student, Department of CSE, M.S. Engineering College, savitrih24@gmail.com
(2) Associate Professor, Department of CSE, M.S. Engineering College, narayana.hm@gmail.com
ABSTRACT
MapReduce is a programming model for Hadoop clusters. MapReduce can allocate resources at two
levels: the task level and the phase level, and the choice of level affects the performance of each job.
Allocating resources at the task level has limitations, and it affects the data locality of a job. We
present an algorithm called PRISM, which works at the phase level; this is called phase-level scheduling.
At the phase level, a job is scheduled according to the varying resource requirements of its phases. We find
that PRISM achieves data locality in a variety of clusters. The algorithm improves execution parallelism,
that is, the concurrent execution of work across the many nodes connected to one server, and it also improves
resource utilization over time. The algorithm applies only at the run time of Hadoop schedulers. With PRISM,
jobs run 1.3 times faster than under the current Hadoop scheduler.
Keywords: Cloud Computing, MapReduce, Hadoop, Scheduling, Resource Allocation
1. INTRODUCTION
Nowadays, business and computer applications rely on internet services with many users. The large
volumes of data handled by these services have shifted them towards data-driven processing. Examples are
Yahoo, Facebook, and Rackspace. Cluster computing systems like MapReduce were generally optimized for batch
jobs. Internet services use MapReduce to process petabytes of data every day. A job scheduler is a computer
application that controls unattended background program execution. Synonyms are batch system, distributed
resource management system, and distributed resource manager.
MapReduce jobs often work on the same data set and run side by side on the same physical hardware. We
call such clusters "multi-tenant" clusters[8]. It is important to control the amount of resources
assigned to each computing framework. Otherwise, MapReduce jobs suffer from conflicting resource demands,
leading to poor performance. Conversely, running too few tasks on a single machine causes poor resource
utilization. In MapReduce, if the map tasks have homogeneous resource requirements, the job scheduling
problem is easy to solve, and the same holds for the reduce tasks. If the run-time resource requirements vary
from task to task, however, performance is lower. A task has many phases with different procedures, and each
phase can be characterized by homogeneous resource usage[5]. If the phases of a task have heterogeneous
resource requirements, then job scheduling at the task level leads to either resource conflicts or low
utilization.
To overcome this, we present an algorithm called "PRISM". In this paper, we perform resource allocation
at the level of task phases. While scheduling a job, we find many variations in resource requirements at
run time, and these variations cause resource conflicts. Phase-level scheduling in MapReduce handles these
varying requirements to achieve a higher degree of parallelism and better performance.
2. LITERATURE SURVEY
This section describes the existing system, how the proposed system overcomes its limitations, and the
components of the proposed system.
2.1 Existing System:
The original MapReduce schedules work at the level of tasks. In MapReduce, a job is a collection of tasks
that can be scheduled concurrently on multiple machines, which reduces job running time.
Many companies, such as Google, Facebook, and Yahoo, use MapReduce to process large volumes of
data, but they schedule this processing at the task level. At the task level, performance and effectiveness become
critical to day-to-day operation. At the task level, a job runs in two phases: the mapper phase and the
reducer phase. In the mapper phase, the task reads data blocks from the Hadoop Distributed File System (HDFS),
maps and merges the data, and stores the result in multiple files. In the reducer phase, the task fetches the
mapper output, then shuffles and sorts the data in a serialized manner.
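The mapper and reducer phases described above can be sketched in a few lines of Python. This is only an illustrative, single-process word count, not Hadoop code; the function names (`map_phase`, `merge_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the use of in-memory strings in place of HDFS data blocks are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one input block."""
    return [(word, 1) for word in block.split()]


def merge_phase(pairs):
    """Merge: sort one mapper's output by key, standing in for the local merge."""
    return sorted(pairs)


def shuffle_phase(mapper_outputs):
    """Shuffle: fetch every mapper's output and group the values by key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper_outputs):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sort the keys and reduce each group of values to a single count."""
    return [(key, sum(values)) for key, values in sorted(groups.items())]


def run_job(blocks):
    """Run the mapper phases on every block, then the reducer phases."""
    mapper_outputs = [merge_phase(map_phase(block)) for block in blocks]
    return reduce_phase(shuffle_phase(mapper_outputs))


if __name__ == "__main__":
    # Two hypothetical "data blocks" stand in for HDFS blocks.
    print(run_job(["a b a", "b c"]))
```

Each function corresponds to one phase of a task, which is the unit that phase-level scheduling treats separately.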
2.1.1 Disadvantages of Existing System
Varying resource requirements at the task level lead to lower performance.
It is difficult for a task-level scheduler to exploit run-time resource usage, so job execution time suffers.
2.2 Proposed System:
The main contribution of this paper is to demonstrate the importance of phase-level scheduling. At the phase
level, a task is processed as a sequence of phases with heterogeneous resource requirements. Our phase-level
scheduling algorithm improves execution parallelism and task performance, and phase-level scheduling performs
well on both of these measures. We therefore present PRISM, a Phase and Resource Information-aware Scheduler
for MapReduce that works at the phase level. The run-time resource requirements of a task vary over its
lifetime. When scheduling a job, PRISM offers a higher degree of parallelism than the current Hadoop
schedulers, because it works at the phase level to improve resource utilization and performance.
3. SYSTEM ARCHITECTURE
We present PRISM, which allocates fine-grained resources at the phase level to perform job
scheduling. PRISM consists of three main components: a phase-based scheduler at the master node, a local
node manager that coordinates phase transitions with the scheduler, and a job progress monitor that captures
phase-level information.
Fig: System Architecture of Phase-Level (logical deployment diagram showing the phase-based scheduler, the job
scheduling algorithm, YARN, PRISM, the report log, HDFS, and the selection of phase)
To make these three components work together, PRISM uses a phase-level scheduling mechanism. When a task
needs to be scheduled on a node manager, the scheduler replies with a task scheduling request. The node manager
then launches the task. After a phase of the task finishes executing, the next phase is launched. Between
phases, the task may pause for some time to remove resource conflicts.
During phase-level scheduling, the phase-based scheduler exchanges messages with the node manager. Upon
receiving a heartbeat message from a node manager reporting resource availability on its node, the scheduler
must select which phase should be scheduled on that node. Each job J consists of two types of tasks: map tasks M
and reduce tasks R. Let $\tau(t) \in \{M, R\}$ denote the type of task t. We define the utility function of
assigning phase i to machine n as shown in the equation.
In terms of utilization, PRISM achieves shorter job running times while maintaining high resource
utilization for large workloads that contain a mixture of jobs sharing the same cluster.
Fig: Utilization Using PRISM
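$U(i, n) = U_{\text{fairness}}(i, n) + \alpha \cdot U_{\text{pref}}(i, n)$
where $U_{\text{fairness}}$ and $U_{\text{pref}}$ are the utilities of improving fairness and job performance,
respectively, and $\alpha$ is the adjustable weight factor. If $\alpha$ is zero, the utility reduces to
$U_{\text{fairness}}$ alone, so phases are selected only for their improvement in fairness.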
Fig: Phase-Level Scheduling Mechanism (message sequence between the phase-based scheduler, the node manager,
and a task: numbered messages 1 to 8, process execution, and the pause duration)
YARN is the resource-management layer of Hadoop on which MapReduce runs. Here the initial procedure of
MapReduce takes place in Hadoop: the tasks map and shuffle the data, and then merge it in a single serialized
manner.
Fig: YARN in MapReduce
4. MODULES IN PHASE LEVEL:
MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of
nodes, collectively referred to as a cluster or a grid. Processing can occur on data stored in either an
unstructured or a structured manner. MapReduce usually takes place in three steps: 1. Map step 2. Shuffle step
3. Reduce step. In a Hadoop MapReduce cluster, these steps proceed through master and slave nodes.
We consider three modules for processing tasks in MapReduce:
1. Hadoop MapReduce
2. PRISM
3. Design Rationale
4.1 Hadoop MapReduce
Hadoop MapReduce uses a simple slot-based allocation scheme. It does not take run-time resource usage
into account when executing tasks. A Hadoop cluster consists of one large machine acting as the master node,
connected to many slave nodes. The master node is responsible for scheduling jobs onto the slave nodes. In
this module, the tasks execute simple mapper and reducer functions. The Hadoop Distributed File System
provides data blocks to all map and reduce tasks.
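A minimal sketch of slot-based allocation, under stated assumptions, is given below. It is not Hadoop's scheduler code: the `SlaveNode` class, the `assign` function, and the slot counts are hypothetical. The point it shows is that a task is placed wherever a slot of the right kind is free, with no reference to how much CPU or memory the task actually uses at run time.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveNode:
    """A slave node with a fixed number of map slots and reduce slots."""
    name: str
    map_slots: int
    reduce_slots: int
    running: list = field(default_factory=list)  # (kind, task_id) pairs

    def free_slots(self, kind):
        """Number of unused slots of the given kind ("map" or "reduce")."""
        total = self.map_slots if kind == "map" else self.reduce_slots
        used = sum(1 for k, _ in self.running if k == kind)
        return total - used


def assign(nodes, tasks):
    """Master node: place each task on the first node with a free slot.

    Returns the tasks that could not be placed because every slot of
    their kind is occupied.
    """
    pending = []
    for kind, task_id in tasks:
        node = next((n for n in nodes if n.free_slots(kind) > 0), None)
        if node is None:
            pending.append((kind, task_id))
        else:
            node.running.append((kind, task_id))
    return pending


if __name__ == "__main__":
    nodes = [SlaveNode("s1", map_slots=2, reduce_slots=1),
             SlaveNode("s2", map_slots=1, reduce_slots=1)]
    tasks = [("map", "m1"), ("map", "m2"), ("map", "m3"),
             ("map", "m4"), ("reduce", "r1")]
    print(assign(nodes, tasks))
```

Because a slot is held for the whole lifetime of a task, a task that is lightly loaded in one of its phases still occupies the full slot.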
4.2 PRISM
The resources allocated to a task may be in use at some points of its run time and idle at others. If
allocated resources are idle, the allocation is wasted. This variation in run-time resource usage motivates
fine-grained resource allocation at the phase level, so that a single machine can process different volumes
of data and resource utilization improves compared with task-level allocation.
The key issue is that when one phase of a task completes, the subsequent phase of the task is not scheduled
immediately. The task "pauses" for some time to remove resource conflicts and then proceeds to the next phase.
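The pause between phases can be illustrated with a small discrete-time simulation of one node. This is a simplified sketch, not the PRISM implementation: it assumes a single abstract resource with integer capacity, phases described by a hypothetical `(demand, duration)` pair, and a first-come order among waiting tasks.

```python
def simulate(capacity, tasks):
    """Simulate phase-level scheduling on one node.

    tasks maps a task id to a list of (demand, duration) phases, with
    duration >= 1. Returns (total_ticks, waited), where waited[task][k]
    is the number of ticks the task paused before starting phase k.
    """
    for phases in tasks.values():
        for demand, duration in phases:
            if demand > capacity or duration < 1:
                raise ValueError("phase can never run on this node")

    free = capacity
    next_phase = {t: 0 for t in tasks}   # index of the next phase to start
    running = {}                         # task id -> (demand, ticks left)
    waited = {t: [0] * len(p) for t, p in tasks.items()}
    tick = 0

    while running or any(next_phase[t] < len(tasks[t]) for t in tasks):
        # Start every waiting phase that fits; otherwise the task pauses.
        for t in tasks:
            if t in running or next_phase[t] >= len(tasks[t]):
                continue
            demand, duration = tasks[t][next_phase[t]]
            if demand <= free:
                free -= demand
                running[t] = (demand, duration)
            else:
                waited[t][next_phase[t]] += 1
        # Advance time by one tick and release finished phases.
        tick += 1
        for t in list(running):
            demand, left = running[t]
            if left == 1:
                free += demand
                next_phase[t] += 1
                del running[t]
            else:
                running[t] = (demand, left - 1)
    return tick, waited


if __name__ == "__main__":
    tasks = {"t1": [(1, 1), (3, 2)], "t2": [(2, 1), (3, 1)]}
    print(simulate(capacity=4, tasks=tasks))
```

In this run both first phases fit on the node together, but the second phase of `t2` has to pause until the second phase of `t1` releases its resources, which is the conflict-avoiding pause described above.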
4.3 Design Rationale
The responsibility of a MapReduce scheduler is to assign tasks with consideration of efficiency and
fairness. It must maintain high resource utilization in the cluster and keep job running time, that is, job
execution time, low.
5. CONCLUSION
MapReduce is a programming model for performing data-intensive computing on clusters. In this paper we
demonstrate that, although resource allocation usually focuses on the task level, the execution of each task
can be divided into many phases. MapReduce breaks a job down into map and reduce tasks and executes them in
parallel across a large number of machines, which reduces the running time of data-intensive jobs. We
therefore perform resource allocation at the phase level.
We introduced PRISM, which works at the phase level. PRISM demonstrates how run-time resource usage
varies over the lifetime of a task and how this variation can be used. PRISM improves job execution and
resource performance without introducing stragglers.
REFERENCES
[1] Hadoop MapReduce distribution. http://hadoop.apache.org.
[2] Hadoop Capacity Scheduler. http://hadoop.apache.org/docs/stable/capacity_scheduler.html/.
[3] Hadoop Fair Scheduler. http://hadoop.apache.org/docs/r0.20.2/fair_scheduler.html.
[4] Hadoop Distributed File System. hadoop.apache.org/docs/hdfs/current/
[5] GridMix benchmark for Hadoop clusters. http://hadoop.apache.org/docs/mapreduce/current/gridmix.html.