Data Security - Big Data Everywhere
Data Security as a Business Enabler – Not a Ball & Chain
Big Data Everywhere, May 21, 2015
Les McMonagle, Protegrity - Director, Data Security Solutions

Les has over twenty years of experience in information security. He has held the position of Chief Information Security Officer (CISO) for a credit card company and ILC bank, founded a computer training and IT outsourcing company in Europe and helped several security technology firms develop their initial product strategy. Les founded and managed Teradata’s Information Security, Data Privacy and Regulatory Compliance Center of Excellence and is currently Director of Data Security Solutions for Protegrity. Les holds a BS in MIS, CISSP, CISA, ITIL and other relevant industry certifications.

Les McMonagle (CISSP, CISA, ITIL)
Mobile: (617) 501-7144
Email: Les.McMonagle@Protegrity.com

The Problem . . .
The cost of cybercrime is staggering:
• The annual cost to the global economy is in excess of $400 billion/year.
• Businesses that are victims of cybercrime need an average of 18 days to resolve the problem and suffer average costs of over $400K.
• The tangible and intangible costs associated with some of the recent high-profile cases exceed $400M.
• Traditional network security, firewalls, IDS, SIEM, AV and monitoring solutions do not offer the comprehensive security needed to protect the target data against current, new and evolving threats.

Typical Phases of an Attack
http://eval.symantec.com/mktginfo/enterprise/white_papers/b-anatomy_of_a_data_breach_WP_20049424-1.en-us.pdf

Factors to Consider
• Bad guys search for the easy targets
  • Large repositories of valuable, unprotected data
  • Systems with weaker controls and/or more access paths
  • Financial Data or Personally Identifiable Information (PII)
• Blurring of Network Boundaries
  • Where does your company network end and another begin?
  • BYOD
  • Cloud
  • IoT (Internet of Things)
• Insider threats remain the biggest threat
• Advanced Persistent Threats (APTs)
  • Coordinated, comprehensive attack strategies

Types of Sensitive Data Potentially Stored in Hadoop
Credit Card PAN, SSN, DOB, Bank Account Numbers, Customer Lists, PIN, Best Practices, Pending Patents, Trade Secrets, Health History, Order History, Health Records, Accounts Receivable, Accounts Payable, Production Planning, Prescriptions, Employee Personnel Records, Home Addresses, Location Data, Passwords, Sales Forecasts, Payroll Data, R&D, Customer Contact Information, Income Data, Salary Data, Project Plans

What to do about it
• Engage Information Security
• Work with Legal and Compliance
• Establish a Good Data Governance Program
• Adhere to generally accepted privacy principles *
• Apply consistent protection throughout the data flow
• Limit access on a Need-to-Know basis
• Protect the actual data itself (regardless of where it is)
• De-Identify data without losing analytics value
* See reference slide(s) at end of presentation

Engage InfoSec, Legal, Compliance, Privacy
• Engage Information Security – rather than avoid them
• CISOs and InfoSec ultimately have the same goals
• Will help fund and implement effective data protection
• Legal, Privacy and Compliance
  • Identify/interpret regulatory and compliance requirements
  • Help protect the business by identifying risks to consider
  • Incorporate generally accepted Privacy Principles*
* See reference slide(s) at end of presentation

Data Governance Program
• Establish a good data governance program
  • Identified Data Owners
  • Identified Data Stewards
  • Identified Data Custodians
  • RACI – Roles and Responsibilities
• Data Governance subject areas
  • Data Ownership
  • Data Quality
  • Data Integration
  • Metadata Management
  • Master Data Management
  • Data Architecture
  • Data Security & Privacy

Protect sensitive data consistently wherever it goes
• At Rest
• In Transit
• In Use
Ideally with a single, centralized
enterprise solution

What Data to Tokenize or Encrypt?
Important Data Security Architecture Questions
• Important questions to ask . . .
  • What policy and regulatory compliance requirements apply?
  • What risks must be mitigated?
  • How/Why are protected columns accessed/used?
  • What other mitigating controls are available?
  • What is the appropriate balance between business needs and data privacy/security?
  • When is Tokenization or Encryption most appropriate?
• Utilization and access control limitations of Hadoop / Hive
• Alternative protection options to consider
  • Full Disk Encryption (FDE)

To Encrypt or Tokenize . . . This is the Question
[Chart: a spectrum running from Tokenization to Encryption, with an arrow labeled "Increasing Data Sensitivity". Scales as labeled on the chart: "Large – Field Size relative to width of lookup table – Small"; "More – Structured – Less"; "More – Logic in portions of the data element – Less"; "Less – Percent of Access Requiring Clear Text – More". Example data elements plotted: SSN, CC-PAN, Patient ID #, Bank Acct No., Customer ID #, DOB, PIN, CID, CV2, Password, X-Ray, Cat Scan, Healthcare Records, HIV-Pos*, Diagnosis report.]
* With Initialization Vector (IV)

Potential Additional Controls to Consider
• Tokenization or Encryption farther upstream in Data Flow
• Do not load unnecessary regulated data to Hadoop
• Access Hadoop Hive Tables through Teradata (QueryGrid)
• HDFS file-level access control
• Accumulo cell level access control (Row/Column intersection)
• Knox Gateway (authentication for multiple Hadoop clusters)
• Coarse grained HDFS File Encryption
• XASecure (now HDP Advanced Security)
• Ambari (Hadoop Cluster Management)
• Kerberos (Authentication) – all or nothing
Piecemeal independent security tools for Hadoop

Reduce your Exposure and Risk
[Diagram labels: "Population of users who have access to SSN today"; "SSN Token"; "Population of users who can perform their job function with only the last 4 digits of the SSN"]
Vaultless Tokenization is a form of data protection that converts sensitive data into fake data. The real data can be retrieved only by authorized users.
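The tokenization idea described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Protegrity's algorithm: a secret seed generates a static substitution table for the first five SSN digits, and the last four digits stay in the clear. The names `build_tables`, `tokenize_ssn`, `detokenize_ssn` and the seed value are hypothetical, and Python's `random` module is not cryptographically secure.

```python
import random


def build_tables(secret_seed, digits=5):
    """Build a static substitution table and its inverse from a secret seed.

    Toy only: a real product would use a vetted, secure construction.
    """
    size = 10 ** digits
    forward = list(range(size))
    random.Random(secret_seed).shuffle(forward)  # same seed -> same table
    inverse = [0] * size
    for clear, token in enumerate(forward):
        inverse[token] = clear
    return forward, inverse


# Hypothetical secret; in practice this would be managed like a key.
FORWARD, INVERSE = build_tables(secret_seed=2015)


def _split(ssn):
    """Return the nine digits of an SSN written as 123-45-6789."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected an SSN like 123-45-6789")
    return digits


def tokenize_ssn(ssn):
    """Replace the first five digits with fake digits; keep the last four."""
    digits = _split(ssn)
    head = f"{FORWARD[int(digits[:5])]:05d}"
    return f"{head[:3]}-{head[3:]}-{digits[5:]}"


def detokenize_ssn(token, authorized):
    """Recover the real SSN, but only for an authorized caller."""
    if not authorized:
        raise PermissionError("caller may only see the token")
    digits = _split(token)
    head = f"{INVERSE[int(digits[:5])]:05d}"
    return f"{head[:3]}-{head[3:]}-{digits[5:]}"
```

The token keeps the SSN's length and format, so it fits existing columns, and a user who only needs the last four digits can work from the token without ever seeing the real value.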
Tokenization is often a more usable form of protection than encryption.
[Diagram labels: "SSN Last 4 Digits"; "SSN Full"; "Population of users who need access to the full SSN to perform their job function"]
Improve Security Posture Without Impacting Analytics Value

What to look for in a good Enterprise Solution
Critical core requirements:
• A single solution that works across all core platforms
• Scalable, centralized enterprise class solution
• Segregation of duties between DBA and Security Admin
• Good Encryption Key or Token Lookup Table management
• Data layer solution
• Tamper-proof audit trail
• Transparent (as possible) to authorized end-users
• High Availability (HA)
• Optional in-database versus ex-database encryption/tokenization

Other "nice to have" features
• Flexible protection options (Encrypt, Tokenize, DTP/FPE, Masking)
• Broadest possible support for a range of data types
• Built in DR, Dual Active, Key and system recovery capability
• Minimal performance impact to applications/end users
• Optimized operations to minimize CPU utilization
• Proven Implementation methodology
• PCI-DSS compliant solution (meeting all relevant requirements)
• Deep partnership with Teradata and other database providers
• Minimal impact on system upgrades
• Maintain consistent referential integrity and indexing capability
• Low Total Cost of Ownership (TCO)

What to look for in a good solution for Hadoop
• Coarse Grained and Fine Grained Protection Capability
  • HDFS File Encryption, Multi-Tenant File Encryption, HDFS FP (HDFS Codec)
  • Column/Field Level “Fine Grained” Protection
• Multi-Tenant Row Level Protection
  • Allow authorized users access to specific rows only
  • Unprotect columns for authorized users only
• Heterogeneous Protection Capabilities
  • Protect Upstream sources of data and Downstream targets of data
  • Vaultless Tokenization, often less intrusive than encryption, reversible protection
  • Reversible – where masking is not
  • Deployed on the (Data) Nodes
  • Leverage MPP architecture of
    Hadoop
  • Avoid Appliance based solutions that can slow down Hadoop
• Tokenization capability for Hive access to HDFS Files/Tables
  • Hive does not support VarByte data type (Encryption = Binary Ciphertext)

Hadoop security controls are playing catch-up
[Diagram comparing security layers. Labels: Traditional RDBMS; Hadoop (Fewer Layers); Firewalls, IDS/IPS; Authentication (Kerberos); Authorization; Future ?; RBAC (Accumulo, Knox); RLS; CLS; Audit; Hive; HDFS; RDBMS; Encrypt; Tokenize; Tokenize Only]
Heavier reliance on Tokenization with Hadoop

Granularity of Protecting Sensitive Data
Coarse Grained Protection (File/Volume)
• Methods: File or Volume encryption
• “All or nothing” approach
• Does NOT secure file contents in use
• OS File System Encryption
• HDFS Encryption
• Secures data at rest and in transit
Fine Grained Protection (Data/Field)
• Operates at the individual field level
• Fine Grained Protection Methods:
  • Vaultless Tokenization
  • Masking
  • Encryption (Strong, Format Preserving)
• Data is protected in use and wherever it goes
• Business logic can be retained

Data Security Platform
[Diagram labels: Enterprise Security Administrator; Policy; Audit Log; Applications; RDBMS; EDW; Big Data; File and Cloud Gateway Servers; IBM Mainframe Protector; Netezza; File Servers; Protection Servers]
Protegrity Confidential

Protegrity’s Big Data Protector for Hadoop
[Diagram labels: Hadoop Cluster; Hadoop Node; Hive; Pig; Other; MapReduce; YARN; HBase; HDFS; OS File System; Policy; Audit]
• Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability.
• All nodes are managed by the Enterprise Security Administrator that delivers policy and accepts audit logs.
• Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data.
• Policy is enforced at different levels of protection in Hadoop.
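A minimal Python sketch of what such a policy could look like, assuming a simple role-based model: each data element has a protection method, a set of roles that may see the clear value, and a set of roles that may see only the last four characters. Every read is recorded for the audit trail. The names `ElementRule`, `POLICY`, `AUDIT_LOG`, `read_element` and the role names are hypothetical and do not reflect Protegrity's actual policy format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRule:
    """How one data element is protected and who may see what."""
    method: str                            # e.g. "tokenize", "encrypt", "mask"
    clear_roles: frozenset = frozenset()   # roles allowed the full clear value
    partial_roles: frozenset = frozenset() # roles allowed the last 4 only


# Hypothetical policy for a single element.
POLICY = {
    "ssn": ElementRule(
        method="tokenize",
        clear_roles=frozenset({"fraud_analyst"}),
        partial_roles=frozenset({"call_center"}),
    ),
}

# Stand-in for the audit logs a central administrator would collect.
AUDIT_LOG = []


def mask_all_but_last4(value):
    """Replace every digit except the last four characters with '*'."""
    head = "".join("*" if ch.isdigit() else ch for ch in value[:-4])
    return head + value[-4:]


def read_element(name, clear_value, protected_value, role):
    """Return the view of a data element that the policy grants to a role."""
    rule = POLICY[name]
    if role in rule.clear_roles:
        result, outcome = clear_value, "clear"
    elif role in rule.partial_roles:
        result, outcome = mask_all_but_last4(clear_value), "last4"
    else:
        result, outcome = protected_value, "protected"
    AUDIT_LOG.append((role, name, outcome))
    return result
```

With this policy, `read_element("ssn", "123-45-6789", "907-31-6789", "call_center")` yields `***-**-6789`, while a role that appears in neither set only ever receives the protected value. To keep the sketch short, the caller passes in both the clear and the protected value; an enforcement point on a Hadoop node would hold only the protected value and unprotect it when the policy allows.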

Rich Security Layer over the Hadoop Ecosystem
• UDF Support for Pig
• UDF Support for Hive
• Hive - Tokenization
• Java API Support for MapReduce
• HBase - Coprocessor support via UDFs
• Cassandra – UDT
• HDFS Encryption through the HDFS Codec
• HDFS Commands Extended for Security Functions
• HDFS Interface for Java Programs
• De-identify before Ingestion into HDFS
• OS File System Encryption; Folder/File or Volume
[Diagram labels: Pig / Hive; MapReduce; YARN; HBase; HDFS; File System]

Coarse Grained Protection: File / Volume Encryption
[Diagram labels: Pig / Hive; MapReduce; YARN; HBase; HDFS; File System; "All fields are in the clear" (shown twice); "File with identifiable data elements"; "Entire File is Encrypted"]
Volume encryption option will encrypt the entire volume versus the files themselves.

Coarse Grained with HDFS Staging Area
[Diagram labels: Pig / Hive; MapReduce Jobs; MapReduce; YARN; HBase; HDFS; Ingest into HDFS; Staging Area; File System]

Coarse Grained Multi-Tenant Protection
[Diagram labels: Pig / Hive; MapReduce; YARN; HBase; HDFS; Ingest into HDFS; T1, T2, T3; T1 folder; T2 folder; T3 folder; clear folder; Key 1; Key 2; Key 3; File System]

Fine Grained Protection
Production Systems
Encryption
• Reversible
• Policy Control (Authorized / Unauthorized Access)
• Lacks Integration Transparency
• Not searchable or sortable
• Complex Key Management
• Example: !@#$%a^.,mhu7///&*B()_+!@
Vaultless Tokenization / Pseudonymization
• Reversible
• Policy Control (Authorized / Unauthorized Access)
or
• Not Reversible
• No Complex Key Management
In either case
• Integrates Transparently
• Searchable and sortable
• Business Intelligence: 0389 3778 3652 0038
Non-Production Systems
Masking
• Not reversible
• No Policy, Everyone Can Access the Data
• Integrates Transparently
• No Complex Key Management
• Example: Date of Birth 2/15/1967 masked as xx/xx/1967

Enterprise-wide Protection
[Diagram labels: Source Systems (Internal / External); Consumption; BI Systems; Target Systems (Internal / External); Downstream Systems; Input File Source (shown twice); FPG; Node (shown three times); Ecosystem Components; Pig; ETL; Hive; MapReduce; YARN; HBase; HDFS; OS FS; Sqoop; Database Server; Database Protector; Database; Edge Node; Java Program; File Protector; Application Protector; ESA; Policy Deployment; Audit Collection]
If Edge Node is a Hadoop Node, Hadoop resources can be used.

Traditional IT Environment: Protegrity Protection
Typical Enterprise Today
[Diagram labels: Internet; Inside the Firewall; Apps; EDW; Files; DBs; Hadoop]

Today’s IT Environment: Protegrity Protection
Typical Enterprise Today
[Diagram labels, in extraction order: Internet; Inside the Firewall; Apps; Cloud Protector Gateway; Files; Files; DBs; File Protector Gateway; EDW; Apps; ESA; Hadoop; HG; Apps]

Summarize what to do
• Establish Good Data Governance
• Protect the actual data itself
• Maintain referential integrity
• De-Identify data while maintaining analytics capability
• Apply consistent protection throughout the data flow
• Engage Information Security, Legal and Compliance
Build security in rather than bolt it on later

Sign Up for a Free, Half-Day Risk Assessment Workshop
Protegrity is proud to offer free, half-day risk assessment workshops designed to help companies evaluate their security posture. This is a no-obligation offer. These workshops are a unique, low-cost opportunity to gain valuable insight into where you stand from a risk management perspective relative to your peers. For more information or to schedule a free half-day workshop, please email: info@protegrity.com

The End . . .
Q&A

Convergence of Data Privacy Regulations
• Government and industry groups are regularly releasing new data privacy laws, requirements, recommendations
• Each leverages the best of previous privacy laws and discards what has proven not to work
• New regulations and standards are converging on a standard set of data privacy principles
• The International Security, Trust and Privacy Alliance (ISTPA) has published a comparison of leading privacy

Privacy Principles – One (1/2)
• Accountability – requires that the entity define, document, communicate, and assign accountability for its privacy policies and procedures and be accountable for PII under its control.
• Notice – requires that the entity provide notice about its privacy policies and procedures and identify the purpose for which personal information is collected, used, retained, and disclosed.
• Choice and Consent – requires that the entity describe the choices available to the individual and obtain implicit or explicit consent with respect to the collection, use, and disclosure of personal information.
• Collection Limitation – requires that the entity collect personal information only for the purposes identified in the notice.
• Use Limitation – requires that the entity limit the use of personal information to the purpose identified in the notice and for which the individual has provided implicit or explicit consent.
Comparable lists from:
International Security, Trust and Privacy Alliance (ISTPA)
Association of Insurance Compliance Professionals (AICP)

Privacy Principles – Two (2/2)
• Access – requires that the entity provide individuals with access to their personal information for review and update.
• Disclosure – requires that the entity disclose personal information to third parties only for the purposes identified in the notice and only with the implicit or explicit consent of the individual.
• Security – requires that the entity protect personal information against unauthorized access or alteration (both physical & logical).
• Data Quality – requires that the entity maintain accurate, complete, and relevant personal information for the purposes identified in the notice.
• Enforcement – requires that the entity monitor compliance with its privacy policies and procedures and have procedures to address privacy-related inquiries and disputes.
These must be captured in business/technical requirements

Plethora of Global Privacy Regulations
Legislation and Regulations
European Union – 95/46/EC Directive on Data Privacy
Germany – Federal Data Protection Act
Sweden – Personal Data Act
United Kingdom – Data Protection Act
Australia – Privacy Act
Japan – Personal Information Protection Act
United States – SOX, GLBA, HIPAA, COPPA, SB 1386