Batch Processing How-To
Stefan Rufer, Netcetera
Matthias Markwalder, SIX Card Solutions
Or: "The Single Threaded Batch Processing Paradigm"

6840

Speakers
> Stefan Rufer
  – Studied business IT at the University of Applied Sciences in Bern
  – Senior Software Engineer at Netcetera
  – Main interest: Server side application development using JEE
> Matthias Markwalder
  – Graduated from ETH Zurich
  – Senior Developer + Framework Responsible at SIX Card Solutions
  – Main interest: High performance and quality batch processing

Why are we here?
> Let's learn how to bake an omelet.

AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A

What do we do
> Credit / debit card transaction processing
> Backoffice batch processing application 24x7x365
> 1.7 million card transactions a day
> Volume will double by end of 2010: be ready…
> Migrated from Forté UDS to JEE
> More agile code base now

How do we do it
> Transactional integrity at any time
> Custom batch processing framework (not Spring Batch)
> 1 controller builds the jobs
> 35 workers process the steps of jobs (or as many as you want and your system can take)
> 1 application server (12 cores)
> 1 database server (12 cores, 1.5TB SAN)

Batch Processing Basics
> It's simple, but parallel:
  – Read file(s)
  – Process a bit
  – Write file(s)
> Terminology from Spring Batch

AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A

Bake an omelet
> 200g flour, 3 eggs, 2 dl milk, 2 dl water, ½ tablespoon salt
> Stir well, wait 30min
> Stir again
> Put a little butter in a heated pan
> Add 1dl dough
> Bake until slightly brown, flip over, bake again half as long
> Put cheese / marmalade / applesauce / ... on top, fold
> Enjoy

Jobs run in parallel
Motivation
> Load balancing
Example
> Complete yesterday's reports while doing today's business
How to achieve
> Use a batch scheduling application that controls your entire processing.
> Read / modify categorization of jobs

Load limitations
Motivation
> Load balancing
Example
> Generate 70 reports, but max 20 in parallel
How to achieve
> Number of workers one job can use
> Priorities of the steps of a job

Decouple controller + workers
Motivation
> Scalability
Example
> SETI@home

Step trees, Sequential, Fail on Exception
Motivation
> Avoid structuring steps in code
Example
> Collect data, afterwards write a file.
How to achieve
> Sequential execution
> Fail on exception (rollback entire step)

Step trees, Parallel, Continue on Exception
Motivation
> Minimize work left
Example
> Process 30'000 transactions in 3 steps.
How to achieve
> Parallel execution
> Continue on exception (still rollback entire step)

Parallelize reading
Motivation
> Speedup
Example
> A file of 200'000 credit card authorisations and transactions has to be read into the database.
How to achieve
> Cut input file in pieces of 10'000 lines each.
  – btw: perl, sort are unbeaten for this...
> Process each piece in a parallel step.

Parallelize processing
Motivation
> Speedup
Example
> Summarize accounting data and store result in database again.
How to achieve
> Group data in chunks of 10'000 and process each chunk in a parallel step.
> Choose grouping criteria carefully:
  – No overlapping data areas
  – Pass along data that you had to read for the grouping process

Parallelize processing – how to group
Motivation
> Structuring your data in parallelizable chunks
> Load balancing
Example
> Parallelize processing by client as data is distinct by design.
How to achieve
> Group by client
> Group by keys: Ranges or ids
  – Ranges (1..5) can grow very large
  – Keys (1, 2, 3, 4, 5) can become very many

Parallelize writing
Motivation
> Transactional integrity while writing files.
> Easy recovery while writing files.
Example
> Collect data for the payment file.
How to achieve
> Collect data in parallel and write to a staging table.
> Staging table content very close to target file format.
> In a last step dump entire content of staging table to file.

Different processes write in parallel
Motivation
> Don't lock out each other
Example
> Account information changes while account balance grows.
How to achieve
> No optimistic locking
> Modify deltas on sums and counters
> Keep distinct fields for different parallel jobs
> Be aware of deadlock potential

Avoid insert and update in same table and step
Motivation
> Speedup
> Avoid DB locks
Example
> Summary rows in same table as the raw data.
How to achieve
> Normalize your database.

Let the database work for you
Motivation
> Simple code
> Speedup
Example
> Sorting or joining arrays in memory.
How to achieve
> Code review.
> Book SQL course.

Read long, write short
Motivation
> Keep lock contention on database minimal
> Keep transactional DB overhead minimal
Example
> Fully process the whole batch of 1'000 records before starting to write to DB.
How to achieve
> 1 (one) "writing" database transaction per step.

    interface IModifyingStepRunner {
        void prepareData();  // read and process the whole batch, no DB writes yet
        void writeData();    // the single short "writing" transaction of the step
    }

This omelet did not taste like grandma's!
> Despite following the recipe, there are hidden corners
> Let's have a look at some pitfalls

Don't forget to catch Error
Motivation
> Application integrity delegated to DB
Example
> OutOfMemoryError caused half of a batch to be committed. Fatal, as a rerun cannot fix the inconsistency.
How to fix

    try {
        result = action.doInTransaction(status);
    } catch (Throwable err) {
        // Throwable, not just Exception: an Error must also trigger the rollback
        transactionManager.rollback(status);
        throw err;
    }
    transactionManager.commit(status);

Use BufferedReader / BufferedWriter
Motivation
> Speedup (file reading time cut in half)
Example
> Forgot to use BufferedReader in file reading framework.
How to fix
> Code review.
> Profile if performance "feels not right".
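
A minimal sketch of what buffered file access can look like, assuming Java 7 or later and the standard java.nio.file.Files helpers. The class name BufferedFileSketch, its methods and the record format are hypothetical and only serve the illustration; this is not the speakers' file reading framework.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only, not part of the framework from the talk.
final class BufferedFileSketch {

    // Writes the given number of made-up records to a temp file through a BufferedWriter.
    static Path writeSampleFile(int lines) {
        try {
            Path file = Files.createTempFile("authorisations", ".txt");
            file.toFile().deleteOnExit();
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (int i = 0; i < lines; i++) {
                    out.write("record-" + i);
                    out.newLine();
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file line by line through a BufferedReader, so the underlying
    // stream is read in larger blocks instead of once per character.
    static int countLines(Path file) {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSampleFile(10_000);
        System.out.println(countLines(file) + " lines read");
    }
}
```

Both streams are opened in try-with-resources blocks, so they are closed (and the writer flushed) even if processing fails halfway.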

Use 1 thread only
Motivation
> Simplicity for the programmer
> Safety (no concurrent access)
Example
> Singleton, synchronized blocks, static variables, stateful step runners – we had it all...
How to achieve
> Configure framework to use one JVM per worker.

Cache wisely
Motivation
> Speedup
> Limit memory use
Example
> Tax rates do not change during a processing day, cache it long.
> Customer data will be reused if processing transaction of same customer – cache it short.
How to achieve
> Cache per worker
> Cache lifetimes: Worker / step / on demand

Support JDBC batch operations
Motivation
> Speedup
Example

    List<Booking> bookings = new ArrayList<Booking>();
    ...
    bookingDao.update(bookings);

How to achieve
> Enhance your database layer with a built-in JDBC batch facility.
> Execute batch after 1000 items added.
> Automatically re-run failed batch using single JDBC statements

Structured patching
Motivation
> Risk management
> Stay agile in production
Example
> Bug found, fixed and unit tested. Deploy to production asap.
How to achieve
> Eclipse wizard to create patch (all files involved to fix a bug)
> Patch script that applies .class file / SQL script / whatever...

Never, ever, update primary keys
Motivation
> Good database design
> Speedup
Example
> Homemade library always wrote entire row to database.
How to fix
> Only write changed fields (dirty flags).
> Make primary keys immutable on your objects.

AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A

Future
> Scalability is an issue with a single database server.
  – Partitioning options used, but not to the end.
  – Will Moore's law save us again?
> Processing double the volume still to be proven...

If you remember just three things...
Java batch processing works and is cool :-)
Trade-offs:
> Do not stock the work, start.
> Single threaded, many JVMs.
> Designing for scalability, stability needs experts.

http://www.google.ch/search?q=how+to+flip+an+omelet

Stefan Rufer
Netcetera AG
stefan.rufer@netcetera.ch
www.netcetera.ch

Matthias Markwalder
SIX Card Solutions
matthias.markwalder@six-group.com
www.six-group.com

Links / References
> http://en.wikipedia.org/wiki/Batch_processing
> http://static.springframework.org/spring-batch/
> http://www.bmc.com/products/offering/control-m.html
> http://www.javaspecialists.eu/
And to really learn how to bake fine omelets, buy a book:
> http://de.wikipedia.org/wiki/Marianne_Kaltenbach
> http://www.oreilly.de/catalog/geeksckbkger/

Other batch processing frameworks (public only)
> http://www.bmap4j.org/
> http://freshmeat.net/projects/jppf
> http://hadoop.apache.org/