ElasticSearch Log3
Experiences in Traffic Logging and Visualization with ELK and D3.js
Surasak Sanguanpong
Department of Computer Engineering, Faculty of Engineering, Kasetsart University
Tech Talk Session, WUNCA 33rd, Chulalongkorn University, July 14, 2016
[Image: U-Bahn Station Candidplatz, Munich, Germany]

In This Talk
• About Traffic Log
• Search Platform with ELK
• Real Time Visualization with D3.js
• Lessons Learnt

Log Monitoring
• Collecting
• Processing
• Analysing
• Visualising
Image: https://www.flickr.com/photos/sbeebe/4772418919

At What Scale?
Hmm.. Large..
Image: http://www.24hourcampfire.com/ubbthreads/ubbthreads.php/topics/5976731/all/That_s_a_load_of_l

Traffic Logging Solution
• Splunk? Great, but commercial and proprietary
• Graylog? Excellent, but too automatic
• Elasticsearch, Logstash, Kibana, D3: that's it! A lot of fun to play with

Chapter I
Log Architecture and Raw Log Management: A Case Study

Evolution of KU Traffic Logging Design
• 2008-2015: Simple GUI / MySQL / Raw Log
• 2015-: Kibana/D3 / Elasticsearch / Raw Log

Logging Architecture
[Diagram components: Network (mirror packets), Login/Logout, Logging Engine, Login Log, Web Log, Packet Log, Search GUI]

Login Log Format
Date   Time      Action   IP           UserName        LogServer
Jul 1  10:04:57  login    158.108.X.X  XXXXX@ku.ac.th  192.168.1.1
Jul 1  10:04:58  logout   158.108.X.X  YYYYY@ku.ac.th  192.168.1.2
Jul 1  10:04:59  timeout  158.108.X.X  ZZZZZ@ku.ac.th  192.168.1.2

Web Log Format
UnixTime SrcIPv4 SrcIPv6 DstIPv4 DstIPv6 SrcPort DstPort URL Referer/HTTPS
20151103010000 192.55.X.X - 158.108.X.X - 17490 80 mirror1.ku.ac.th/fedoraepel/6/i386/jday-devel-2.4-5.el6.i686.rpm http://mirror1.ku.ac.th/fedoraepel/6/i386/
20151103010000 10.X.X.X - 203.104.175.X - 62635 80 sg-nvapis.line.me/ping?&msgpad=1446487199964&md=9LMRXqv1Nb8P07aj0Vo%3D -
20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX 61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS
20151103010000 - 2406:3100:1018:1::XX - 2a03:2880:f002:105:fa:b0:0:YYXX 59960 443 edge-mqtt.facebook.com HTTPS

Packet Log Format (Header Log)
TCP/UDP records:
TimeStamp                   SrcIP        DstIP          Size  Proto  SrcPort  DstPort  [Flag]
2009-07-16 17:53:59.999206  208.117.8.X  158.108.234.X  1514  TCP    80       1371     0x10
2009-07-16 17:53:59.999209  158.108.2.X  202.143.136.X  90    UDP    123      123
ICMP records:
TimeStamp                   SrcIP          DstIP         Proto  Code
2009-07-16 17:53:59.999210  158.108.184.X  218.164.54.X  ICMP   168

Example of Log Folder
Time-based hierarchical folder: Year / Month / Day / Hour, with one file per minute.
• 2015 / 01 / 01 / 00: 201501010000.txt, 201501010001.txt, ..., 201501010059.txt
• 2015 / 01 / 01 / 23: 201501012300.txt, 201501012301.txt, ..., 201501012359.txt

Minutely HTTP Log
[Chart: 11 days of per-minute data (11 x 24 x 60 = 15,840 data points)]

Request Rate and Log Sizing
[Chart]

Accumulated Log Request and Size
Requests  Size      #Files
20M       2.04 GB   120
14.1B     2.57 TB   172,800
3.27T     28.03 TB  172,800

Log Processing and Search Services
• On-the-fly converter from text-based logs to MySQL
• Slow processing and searching time
• Simple search

Chapter II
ELK Stack Testbed

What is Elasticsearch?
• Real-time search/analytic engine software
• Document-oriented
• REST API & JSON
• Java/Lucene based
• Distributed, scalable
• Plugin architecture
• Open source, Apache 2 License
REST: Representational State Transfer
JSON: JavaScript Object Notation

What does Elasticsearch offer?
• Full-text search
• Very fast
• Fault tolerance
• High availability

How is the world using Elasticsearch?
• Full-text search with highlighted search snippets
• Providing search across GitHub's code
• Analytics solution on 40 million documents per day to deliver real-time visibility
• Full-text search to find related questions and answers

Elasticsearch and Big Data
ES-Hadoop: connects Hadoop's big data analytics with the real-time search of Elasticsearch.
https://www.elastic.co/products/hadoop

ELK stack from Elastic
• Logstash: log transport and processing daemon
• Elasticsearch: high-performance scalable search engine
• Kibana: visualisation dashboard

Logstash
• Log aggregator and parser
• Transfers parsed data to Elasticsearch
• Configuration file specifies input, filtering (parsing) and output

    input { stdin { } }
    filter {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }
    output {
      elasticsearch { hosts => ["localhost:9200"] }
      stdout { codec => rubydebug }
    }

Kibana
• General-purpose query UI
• Includes many widgets
• Query Elasticsearch without coding

Alternative Stack
• ELK
• EFK

Elasticsearch Indexing Performance
• Single Dell R220
• Xeon E3-1271v3 3.6 GHz 4C/8T
• 32 GB RAM
• 2x6 TB NL-SAS
• Elasticsearch 2.3.2
• 10 shards / 0 replicas
• Hyper-threading off
• Web log indexing
[Chart: Daily Indexing Performance; #Records (millions) and Records/s (thousands) per day]

Search Performance
• Search keyword: "face" against each daily log
• Not yet optimized
[Chart: Search Performance and Hits; search time (ms) and hits per day]

Kibana screenshots
• Kibana: Main Dashboard
• Kibana: per-IP Log
• Kibana: Login Profile
• Kibana: Concurrent Login View

Chapter III
Playing with D3.js

Real Time Visualization with D3.js
• Data-Driven Documents (D3)
• JavaScript library for manipulating documents based on data
• Developed by Mike Bostock
https://d3js.org/

D3 Architecture
• Input data to build visualizations (JSON, CSV, ...)
• Dynamic manipulation of HTML elements with JavaScript
• node.js, socket.io

Sample Gallery
[Images]

Real-time makes an impression
Norse Live Attack Map: http://map.norsecorp.com/#/

D3 visualizations
• Concurrent Login
• IP Matrix Occupied
• Tree Map Web Access
• Traffic Connectivity

Chapter IV
New Log Design

New Logging Architecture
[Diagram components: Network (mirror packets), Login/out events (DHCP, RADIUS), Logging Engine, Session Tracking & Accounting, Login Log, Web Log, Flow Log, Elasticsearch real-time indexing, Elasticsearch GUI/Analytics]

Logging Redesign
• User identification
• Legal logging
• Real-time accounting
• User session control
• Traffic analytics
• SIEM support
• Performance management

New Login Log Format
• Real-time logging, one file per day
• Fields: login_session_id user login_timestamp logout_timestamp mac_address ipv4 ipv6 agent_ip agent_type via_ip ipv4_byte_in ipv4_byte_out ipv4_pkt_in ipv4_pkt_out ipv6_byte_in ipv6_byte_out ipv6_pkt_in ipv6_pkt_out
• Sample Log
67686345 user1@domain.com 1467551484.163681 0 001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login - 0 0 0 0 0 0 0 0
67686346 user2@domain.com 1467551490.524125 0 - 192.0.5.5 - 203.0.113.1 login - 0 0 0 0 0 0 0 0
67686345 user1@domain.com 1467551484.163681 1467551833.754636 001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login - 234342 423442 5522 6622 233456 22334 445 665

New Web Log Format
• Real-time logging, one file per minute
• Fields: request_timestamp {flow link fields} {login link fields} {ip info fields} {tcp info fields} method host path referrer agent
• Sample Log
554455 1467551484.180000 67686345 user1@domain.com 1467551484.163681 4 192.0.2.1 198.51.100.1 tcp 5566 80 GET www.domain.com /index.html - "Linux"

Traffic Flow Log
• Logs are committed periodically (configurable interval, 1 minute to 1 hour)
• Fields: flow_id flow_start_timestamp {segment info fields} {login link fields} {ip info fields} {tcp info fields} {tcp additional info fields} {tcp stat fields}
• Sample Log
554455 1467551484.180000 1467551484.180000 1467551492.954258 18 20 1628 25456 223344 f 67686345 user1@domain.com 1467551484.163681 4 192.0.2.1 198.51.100.1 tcp 5566 80 1 - 1428 1428 864 24522 3 17 2 2 0 30000 0 30000

Chapter V
Lessons Learned

Lessons Learned
• Elasticsearch offers a very fast full-text search service
• The index may be 3x to 5x larger than the source data
• Use Elasticsearch for search services, not for data archiving
• Logstash: a powerful tool for manipulating logs
• Kibana: simple and useful for visualizing data
• D3 pros: flexible, fascinating visualization
• D3 cons: low level, steep learning curve, CPU intensive
• The new design combines lawful logging, security information and event management (SIEM) and accounting

Thank you for your attention
• Kasom Koth-Arsa: Core Log Design and Development
• Jautuporn Chuchuay, Peerapol Boonthaganon: Web GUI Development
• Sataporn Techaaramwong: Web/Elasticsearch Development
• Peerapong Thongpubeth, Jiradech Sirijantadilok: Kibana Development
• Poomipat Thongudom, Nichapat Nattee: D3 Development
• Surachai Chitpinijyol: Project Coordinator
• Surasak Sanguanpong: Project Director
Special thanks to Kasetsart Office of Computer Services for supporting traffic data
[Image: Sunset at Narita Airport]

Q & A
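
The time-based folder layout from Chapter I (Year / Month / Day / Hour, one file per minute) can be sketched in Python as follows. This is only an illustration, not the project's code: the root directory name "logs", the function names, and the zero-padded two-digit folder names are assumptions read off the example file names on the slide.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath


def minute_log_path(ts: datetime, root: str = "logs") -> PurePosixPath:
    """Return the per-minute log file path for a timestamp.

    Layout assumed from the slide: root/YYYY/MM/DD/HH/YYYYMMDDhhmm.txt
    ("logs" is a hypothetical root directory).
    """
    return PurePosixPath(
        root,
        f"{ts:%Y}",
        f"{ts:%m}",
        f"{ts:%d}",
        f"{ts:%H}",
        f"{ts:%Y%m%d%H%M}.txt",
    )


def minute_paths(start: datetime, days: int, root: str = "logs") -> list:
    """List every per-minute file path for `days` whole days from `start`."""
    total_minutes = days * 24 * 60
    return [
        minute_log_path(start + timedelta(minutes=i), root)
        for i in range(total_minutes)
    ]


if __name__ == "__main__":
    print(minute_log_path(datetime(2015, 1, 1, 23, 59)))
    print(len(minute_paths(datetime(2015, 1, 1), days=11)))
```

Eleven days of per-minute files give 11 x 24 x 60 paths, which is the number of data points in the Minutely HTTP Log chart.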
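
A record in the new login log format from Chapter IV can be split into named fields with a few lines of Python. The sketch below is a hedged illustration, not the logging engine's parser: the field names come from the slide, but the whitespace separator, the treatment of "-" as an empty value, and the reading of a logout_timestamp of 0 as "session still open" are assumptions. The sample line is reassembled from the values shown on the slide.

```python
LOGIN_FIELDS = [
    "login_session_id", "user", "login_timestamp", "logout_timestamp",
    "mac_address", "ipv4", "ipv6", "agent_ip", "agent_type", "via_ip",
    "ipv4_byte_in", "ipv4_byte_out", "ipv4_pkt_in", "ipv4_pkt_out",
    "ipv6_byte_in", "ipv6_byte_out", "ipv6_pkt_in", "ipv6_pkt_out",
]


def parse_login_record(line: str) -> dict:
    """Split one whitespace-separated login log line into named fields.

    Assumption: a lone "-" (or en dash) marks an empty value and becomes None.
    """
    values = line.split()
    if len(values) != len(LOGIN_FIELDS):
        raise ValueError(
            f"expected {len(LOGIN_FIELDS)} fields, got {len(values)}"
        )
    cleaned = (None if v in ("-", "\u2013") else v for v in values)
    return dict(zip(LOGIN_FIELDS, cleaned))


def session_seconds(record: dict):
    """Return the session length in seconds, or None while it is open.

    Assumption: logout_timestamp == 0 means the user has not logged out yet.
    """
    login = float(record["login_timestamp"])
    logout = float(record["logout_timestamp"])
    return None if logout == 0 else logout - login


if __name__ == "__main__":
    sample = (
        "67686345 user1@domain.com 1467551484.163681 1467551833.754636 "
        "001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login - "
        "234342 423442 5522 6622 233456 22334 445 665"
    )
    rec = parse_login_record(sample)
    print(rec["user"], session_seconds(rec))
```

Keeping the field list in one place means the same names can be reused as JSON keys when the record is indexed into Elasticsearch.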