Improving Batch Application Service Through Tuning and Parallelism

Transcription

Improving Batch Application Service Through Tuning and Parallelism
Improving Batch
Application Service
Through Tuning and
Parallelism
Dan Squillace
Mainframe Support Manager
SAS Institute
Cary, NC USA
Dan.Squillace@sas.com
Copyright © 2005, SAS Institute Inc. All rights reserved.
Some Business Drivers for Performance
Improvement Acidities
 Increasing data volumes
• More customers
• More data about each customer needed for increasingly
sophisticated analytics which aid better and more timely
decision-making.
 Decreasing processing window
• Improve BI application availability by shortening ETL
elapsed time.
 Increasing pressure to reduce costs
• Lower resource requirements
• Improve competetive position
Copyright © 2005, SAS Institute Inc. All rights reserved.
Session Overview
 This session focuses on processing
improvements beneficial to handling large data
volumes.
 Performance improvement areas
• CPU optimization
• Reducing I/O
• Improved overlap and parallelism
• Elapsed time optimization (Not the same
 Focus Areas
• DATA STEP tuning
• New SAS9 features
Copyright © 2005, SAS Institute Inc. All rights reserved.
Session Outline
 Don’t forget the basics! - A ShortTuning Case
Study






DATA Step Views
PROC SUMMARY w/DATA Step View
DATA Step hash table functions
SAS Parallel Data Engine (SPDE)
SAS/Connect Pipes
Wrap-up
Copyright © 2005, SAS Institute Inc. All rights reserved.
Back to Basics:
High-Volume DATA Step Optimization
 Before implementing parallel operations, make
sure basic processing flow is efficient
 When processing high volumes of data, even
apparently small changes can have a large effect
 The following customer case study illustrates
several points.
Copyright © 2005, SAS Institute Inc. All rights reserved.
Program processes 36 million MXG TYPE74
records (436 CPU seconds 9672 G6)
DATA FILE.A;
SET INFILE1.TYPE74;
KOUNT = 1 ;
IF VOLSER = '.' OR VOLSER = ' '
THEN DELETE ;
IF SYSTEM = '888K' OR SYSTEM = '888Z' OR
SYSTEM = '888Q' OR SYSTEM = '888V' OR
SYSTEM = '888P' THEN DO ;
IF DATEPART(SYNCTIME) < '03APR04'D THEN
SYNCTIME = SYNCTIME - '06:00:00.00'T ;
IF DATEPART(SYNCTIME) > '02APR04'D THEN
SYNCTIME = SYNCTIME - '05:00:00.00'T ;
END ;
SYMNUM = 0 ;
IF DATEPART(SYNCTIME) < '17MAY04'D THEN DO ;
IF DEVNR > 58FFX AND DEVNR < 5FFFX THEN SYMNUM
= 111;
IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM
= 456;
IF DEVNR > 7FFFX
THEN SYMNUM = 234;
IF DEVNR => 5000X AND DEVNR < 5200X THEN SYMNUM
= 234;
IF DEVNR => 5FFFX AND DEVNR < 7000X THEN SYMNUM
= 876;
END;
IF DATEPART(SYNCTIME) > '17MAY04'D THEN DO ;
IF DEVNR > 4FFFX AND DEVNR < 7000X THEN SYMNUM
= 223;
IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM
= 456;
IF DEVNR > 7FFFX
THEN SYMNUM = 234;
END;
Copyright © 2005, SAS Institute Inc. All rights reserved.
TIPPCT = (IORATE * (AVGCONMS +AVGDISMS))/10 ;
FORMAT TIPPCT 5.2 ;
IF SYMNUM = 0 THEN DELETE ;
IO_1111 = 0 ; IO_4563 = 0 ; IO_234 = 0 ; IO_8765 = 0 ;
IO_22355 = 0 ;
IF SYMNUM = 1111 THEN IO_1111 = IORATE ;
IF SYMNUM = 4563 THEN IO_4563 = IORATE ;
IF SYMNUM = 234 THEN IO_234 = IORATE ;
IF SYMNUM = 8765 THEN IO_8765 = IORATE ;
IF SYMNUM = 22355 THEN IO_22355 = IORATE ;
DATE = DATEPART(SYNCTIME) ; FORMAT DATE DATE7. ;
INTE = TIMEPART(SYNCTIME) ; FORMAT INTE TIME19.2 ;
EMCTYPE = 'ESCON' ;
IF SYMNUM = 22355 THEN EMCTYPE = 'FICON' ;
IF IORATE < 10
THEN DELETE ;
KEEP VOLSER DEVNR TIPPCT DATE INTE
SYMNUM IO_1111 IO_4563 IO_234 IO_8765
SYNCTIME IO_22355 EMCTYPE
IORATE
AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS AVGDISMS
AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR
PCTDVCON PCTDVUSE KOUNT ;
Do filtering as early as possible
TIPPCT = (IORATE * (AVGCONMS +AVGDISMS))/10 ;
FORMAT TIPPCT 5.2 ;
IF SYMNUM = 0 THEN DELETE ;
IO_1111 = 0 ; IO_4563 = 0 ; IO_234 = 0 ; IO_8765 = 0 ;
IO_22355 = 0 ;
IF SYMNUM = 1111 THEN IO_1111 = IORATE ;
IF SYMNUM = 4563 THEN IO_4563 = IORATE ;
IF SYMNUM = 234 THEN IO_234 = IORATE ;
IF SYMNUM = 8765 THEN IO_8765 = IORATE ;
IF SYMNUM = 22355 THEN IO_22355 = IORATE ;
DATE = DATEPART(SYNCTIME) ; FORMAT DATE DATE7. ;
INTE = TIMEPART(SYNCTIME) ; FORMAT INTE TIME19.2 ;
EMCTYPE = 'ESCON' ;
IF SYMNUM = 22355 THEN EMCTYPE = 'FICON' ;
IF IORATE < 10
THEN DELETE;
KEEP VOLSER DEVNR TIPPCT DATE INTE
SYMNUM IO_1111 IO_4563 IO_234 IO_8765
SYNCTIME IO_22355 EMCTYPE
IORATE
AVGRSPMS AVGIOQMS AVGPNDMS AVGCONMS AVGDISMS
AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR
PCTDVCON PCTDVUSE KOUNT ;
Copyright © 2005, SAS Institute Inc. All rights reserved.
Move to top of DATA
Step
 CPU Time reduction
67%
Additional Steps
 Put KEEP= as DATA SET option to bring in fewer
variables into the DATA step. Note: This
decreases CPU time, but not I/O time.
 Use IF-THEN-ELSE or SELECT instead of just
IF-THEN.
 Eliminated redundant DATEPART function calls.
 Cumulative CPU time reduction:
80%
Copyright © 2005, SAS Institute Inc. All rights reserved.
Final Step
 Move filtering of blank VOLSER and IORATE
<10 to WHERE clause DATA SET option.
 Total cumulative CPU time reduction:
86%
Net savings of 368 CPU
seconds
Copyright © 2005, SAS Institute Inc. All rights reserved.
The Value of CPU Time Reduction
 Always important on the mainframe because it is
inherently a multi-workload beast.
 Often considered unimportant (or less so
anyway) on Windows and UNIX platforms
because of dedicated nature of systems.
Elapsed time is often more important.
 Changing with increasing use of server
virtualization. Affects how many virtual servers
can run on a physical platform.
• Logical Partitions or Domains on UNIX systems
• Virtual Machines on Windows and Linux systems
Copyright © 2005, SAS Institute Inc. All rights reserved.
Some General Strategies for Improving
Processing of Large Data Volumes
 Reduce volume of data passed (e.g. keep only
required variables in intermediate files)
 Reduce number of data basses
 Eliminate or reduce use of non-linearly scalable
techniques such as sorting.
 Exploit memory
 Exploit processing overlap and parallelism
Copyright © 2005, SAS Institute Inc. All rights reserved.
Exploiting New SAS Features
 We’ll use two scenarios from common
processing challenges encountered when
processing transaction data for performance and
service level reporting purposes.
 The improvements made to the processing
strategy for these scenarios …..
• Reduce number of data basses
• Eliminate or reduce use of non-linearly scalable
techniques such as sorting.
• Exploit memory
• Exploit processing overlap and parallelism
Copyright © 2005, SAS Institute Inc. All rights reserved.
General Scenario Chrematistics
 Very high data volumes (millions of records, tens
or hundreds of Gigabytes
 Multiple summarizations desired
 Detail records retained only for exceptional
cases.
Copyright © 2005, SAS Institute Inc. All rights reserved.
Scenario One
 High-volume transaction data, say from web log,
CICS, DB2, SAP
 Desired summarized file for service level
management, accounting, performance and
capacity management.
 Not interested in keeping every detail transaction
record.
Copyright © 2005, SAS Institute Inc. All rights reserved.
DATA Step Views
 Can be used to eliminate
a data passes
 Runs two tasks in parallel,
but does not multiprocess
 In this case, eliminates
one pass of the data.
Copyright © 2005, SAS Institute Inc. All rights reserved.
data lib.a / view=lib.a;
infile ……;
input x ……;
run;
proc sort data=lib.a; by x;
run;
SAS DATA Step View caveats
 Can inhibit use of indexed I/O; Data Set Option
WHERE clause cannot use index with a DATA
Step view.
 DATA Step views are sensitive not only to SAS
release and version levels, but sometimes to
maintenance levels.
Copyright © 2005, SAS Institute Inc. All rights reserved.
DATA Step Views with Proc Summary
 Eliminate data passes
and saves disk space.
 Eliminate sort
 Can produce multiple
summarization data sets
in one pass
 Benefits from large region
size (enough to hold
crossings)
 SUMMARY in SAS 9.1
• Multithreaded
• Does not keep n-way in
memory unless needed.
Copyright © 2005, SAS Institute Inc. All rights reserved.
data lib.a / view=lib.a;
infile ……;
input a b x y……;
run;
proc summary data=lib.a;
CLASS statement;
TYPES statement;
OUTPUT statement(s);
run;
SAS9 Threaded Procedures






SORT
SUMMARY/MEANS
TABULATE
REPORT
SQL
REG, GLM, LOESS, DMREG,DMINE
Copyright © 2005, SAS Institute Inc. All rights reserved.
Scenario Two
 High Volume Event data (time-oriented (e.g.
ARM log)
 Transactions must be constructed from multiple
event records
• Type S – transaction start ( ID, start time, code, )
• Type E – transaction end ( ID, end time, CPU time)
Copyright © 2005, SAS Institute Inc. All rights reserved.
Data arrival pattern
Start 1
Start 2
End 1
(write out 1)
Start 3
End 2
(write out 2)
Start 4
Start 5
End 4
(write out 4)
End 5
(write out 5)
End 3 (write out 3)
Copyright © 2005, SAS Institute Inc. All rights reserved.
DATA Step Hash Table Support
(New in SAS9)
 Can replace lookup formats
 Can have entries dynamically added, modified,
and removed
 For this Scenario, use a Hash Table to
accumulate transaction records from start and
end events.
Copyright © 2005, SAS Institute Inc. All rights reserved.
data transactions view=transactions;
declare hash transactions();
transactions.defineKey("tr_id");
transactions.defineData("tr_start", "tr_code“);
transactions.defineDone();
input type @;
if type = 'S' then do;
input tr_id tr_code tr_start;
rc=transactions.add();
end;
else if type='E' then do;
input tr_id tr_end tr_cpu;
rc = transactions.find();
response = tr_end - tr_start;
output;
rc = transactions.remove();
end;
Copyright © 2005, SAS Institute Inc. All rights reserved.
The Scalable Parallel Data Engine (SPDE)
 New in SAS 9.1
 Included with BASE
 Available on all 9.1 platforms
 Advantages
• Parallel data loading and index creation
• Parallel reads and searches
• Uses multiple indices to resolve a search
Copyright © 2005, SAS Institute Inc. All rights reserved.
SPDE – Scalable Performance Data Engine
SAS® System
data
Scalable Performance Data Engine
metadata
data1
data2
data3
data4
index
Hybrid index
Bitmap/B-tree
Bitmap/B-tree
Copyright © 2005, SAS Institute Inc. All rights reserved.
SAS SPDE implementation on z/OS
 USS thread services
 USS directory-based file systems
• zFS
• hFS
• NFS file systems
 Exploitation
• Define file system
• Change LIBNAME engine specification
Copyright © 2005, SAS Institute Inc. All rights reserved.
SPDE data set allocation on z/OS
 NFS – follow same guidelines as for Open
Systems
 HFS – Use separate HFS file systems for DATA
and INDEX components; perhaps multiple for
DATA. Spread HFS’s across Shark (ESS 2105)
loops.
 zFS - No special considerations! Use multivolume zFS particularly if
• Storage system has Parallel Access Volumes (PAV)
• ESS 2105-800 has Arrays Across Loops feature
Copyright © 2005, SAS Institute Inc. All rights reserved.
Scalability – SAS 9.1
SAS Scalable Architecture in SAS
Foundation
Scalable Performance Data Access
SAS
Teradata Sybase
DB2
Oracle
Scalable SAS/ACCESS
Piping
Piping
SAS
SAS
SAS
CONNECT
CONNECT
CONNECT
CPU 1
CPU 2
Threaded Procedures
THREAD 1
THREAD 2
THREAD N…
Copyright © 2005, SAS Institute Inc. All rights reserved.
Remote
Host
MP Connect Pipes




New in SAS9
Uses TCP/IP socket engine
Superior to DATA Step View approach
Provides true multi-processing
Copyright © 2005, SAS Institute Inc. All rights reserved.
/* ----- DATA STEP - PROCESS P1 ------ */
/* ---- SUMMARY - PROCESS P2 ----- */
SIGNON P1 SASCMD='!SASCMD';
RSUBMIT P1 WAIT=NO;
LIBNAME OUTLIB SASESOCK ":PIPE1";
SIGNON P2 SASCMD='!SASCMD';
RSUBMIT P2 WAIT=NO;
LIBNAME INLIB SASESOCK ":PIPE1";
data outlib.transactions;
declare hash transactions();
transactions.defineKey("tr_id");
transactions.defineData("tr_start", "tr_code“);
transactions.defineDone();
proc summary data=inlib.transactions;
CLASS statement;
TYPES statement;
OUTPUT statement(s);
run;
input type @;
if type = 'S' then do;
input tr_id tr_code tr_start;
rc=transactions.add();
end;
PROC PRINT;RUN;
ENDRSUBMIT;
WAITFOR _ALL_ P1 P2;
else if type='E' then do;
input tr_id tr_end tr_cpu;
rc = transactions.find();
response = tr_end - tr_start;
output;
rc = transactions.remove();
end;
ENDRSUBMIT;
Copyright © 2005, SAS Institute Inc. All rights reserved.
In Summary……
 Remember the importance of basic SAS
program tuning skills which have been wellknown for years.
 Take advantage of the significant SAS9 features
which can help you
• Improve response and turnaround times
• Improve availability times for BI applications by
shortening the batch window.
• Reduce costs by cutting resource consumption and
utilizing the most effective combination of CPU,
memory, and I/O resources
Copyright © 2005, SAS Institute Inc. All rights reserved.
Copyright © 2005, SAS Institute Inc. All rights reserved.
31