PHEMI Central Datasheet
DATA SHEET

PHEMI Central Big Data Warehouse

PHEMI Central™ is a big data warehouse that takes advantage of the power, scalability, and flexibility of Hadoop while providing fully integrated privacy, security, governance, and data management, all built right in. Drive discovery and fuel innovation with big data economics while meeting compliance and governance objectives.

Take Advantage of Enterprise-Grade Hadoop to Unlock the Value in Your Data

PHEMI Central provides:
• The ability to collect, curate, and consume any volume and variety of data
• High-speed data ingestion and processing to support real-time operations and business intelligence applications
• Full data control and lifecycle management
• Built-in Privacy by Design to enable collaboration while protecting sensitive and private information
• The ability to work with big data technologies without an army of Hadoop programmers

PHEMI Central leverages three unique innovations:
• Metadata Framework: Extensible, descriptive end-to-end metadata enables field-level access control and data management.
• DPF Framework: Custom and standard Data Processing Functions (code libraries) can parse, recognize, extract, cleanse, standardize, encrypt, mask, or redact selected fields.
• Policy Enforcement Engine: All user requests for data are filtered through a rules engine, ensuring that data remains governance-compliant at all times.

At a glance:
• Collect: Consolidate all of your data, structured and unstructured
• Curate: Automatically index and catalog data for sub-second lookups
• Consume: Make data easily accessible to all of your end users
• Privacy, Security, and Governance: Automatically enforce data sharing, consent, and privacy policies
• Data Management: Gain full control with enterprise-grade data management

[Figure: PHEMI Central architecture. Data sources (database systems, text, images, sensors, genomics) flow through Collect (ingest any raw data type and tag it with metadata), Curate (use data processing functions to transform and catalog data into analytics-ready assets), and Consume (generate datasets on demand; use any third-party apps), wrapped by privacy, security, and governance (field-level protection and rightful access at scale), data management (down to the field level), and system management (enterprise-grade reliability, availability, and scalability with cluster economics), and serve applications and users (reports and analytics, business intelligence, custom applications, spreadsheets, third-party applications).]

Collect
Ingest and tag all types and any size of data

PHEMI Central ingests data from multiple, disparate sources. Files can range from a few kilobytes to terabytes. Schemaless ingestion is fast. You can:
• Stream data from machine-to-machine data sources through the PHEMI REST API
• Push data directly from data sources and ETL tools using either JDBC or the PHEMI REST API
• Deploy a custom connector based on the PHEMI REST API to allow PHEMI Central to fetch data from data sources
• Upload data manually using a standard web browser window

On ingest, data is tagged with descriptive metadata that immediately enforces privacy policies and data sharing agreements and controls the data lifecycle.

Curate
Extract the greatest possible value from your data with processing, indexing, cataloging, linking, and metadata

PHEMI Central uses a flexible key-value store. Data is automatically indexed and cataloged as it is stored, making it immediately findable and retrievable.
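To make the tag-on-ingest and index-on-write idea concrete, here is a minimal Python sketch of a key-value store that stores payloads untouched and indexes their metadata tags at write time, so a lookup by tag is a set intersection rather than a scan. The `TaggedStore` class, its `ingest`/`find`/`get` methods, and the tag names are hypothetical; this is not the PHEMI REST API or PHEMI Central's internal storage design.

```python
from collections import defaultdict


class TaggedStore:
    """Hypothetical key-value store that indexes metadata tags at write time.

    Illustrative only: this is not PHEMI Central's storage engine or REST API.
    """

    def __init__(self):
        self._data = {}                 # key -> original payload, stored as-is
        self._meta = {}                 # key -> metadata tags supplied on ingest
        self._index = defaultdict(set)  # (tag, value) -> keys carrying that tag

    def ingest(self, key, payload, **tags):
        """Store a payload unchanged (schemaless) and index its tags immediately."""
        if key in self._data:
            # Originals are treated as immutable, so re-ingesting a key is refused.
            raise ValueError(f"{key!r} already ingested")
        self._data[key] = payload
        self._meta[key] = dict(tags)
        for tag, value in tags.items():
            self._index[(tag, value)].add(key)

    def find(self, **criteria):
        """Return keys whose tags match every criterion (an AND over tags)."""
        if not criteria:
            return set(self._data)
        postings = [self._index.get(item, set()) for item in criteria.items()]
        return set.intersection(*postings)

    def get(self, key):
        """Return (payload, tags) for one key."""
        return self._data[key], self._meta[key]


if __name__ == "__main__":
    store = TaggedStore()
    store.ingest("lab-001.csv", b"id,sex\n1,M\n", source="lab", sensitivity="phi")
    store.ingest("sensor-17.json", b"{}", source="sensor", sensitivity="public")
    print(store.find(source="lab"))
```

Because the index is updated inside `ingest`, a record is findable as soon as the write returns, which mirrors the "indexed and cataloged as it is stored" behavior described above.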
Sophisticated metadata tagging describes, manages, and governs the data that PHEMI Central stores.

“For the first time, organizations can take advantage of big data while retaining the governance and data management of a traditional enterprise data warehouse.”

Data Linking
After cataloging and indexing, data can be linked based on keywords, graph relationships, and geospatial attributes. Data linking expands the kinds of connections you can make between data items, promotes discovery, and gives you a more complete picture of your data.

Data Processing Function Framework
PHEMI Central lets you develop custom programs, called Data Processing Functions (DPFs), to:
• Parse ingested data; extract or cleanse data; and encrypt, mask, or anonymize selected information
• Provide enhanced or deeper indexing and cataloging
• Map data into standardized ontologies
• Analyze streams of machine data to find patterns and exceptions, calculate aggregates, or convert streaming data into an analytics-ready state for trending and predictive analysis

As the organization’s needs evolve and knowledge advances, you can simply develop new DPFs and re-execute them on your data. DPFs can be developed in modern programming languages such as Java, Python, and C++. No specialized expertise in big data technologies such as MapReduce or YARN is required. DPFs can be developed by PHEMI, by your in-house programmers, or by a third party.

Data Dictionary
Conventional big data systems store big data but struggle to catalog or track diverse data types. With PHEMI Central, you can use DPFs to build data dictionaries that identify and save a common interpretation for fields that occur frequently but are named differently or use different format conventions (such as “M/F” vs. “Male/Female”), or that convert between imperial and metric measurement systems. Data dictionaries greatly simplify queries and analysis.
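As an illustration of the data dictionary idea, the sketch below shows one standardization step a DPF might perform in Python: differently named fields are mapped onto one canonical name, “M/F” and “Male/Female” codes are unified, and pounds are converted to kilograms. The alias table, the "U" code for unrecognized values, and the `standardize` function are hypothetical; the datasheet does not document the actual DPF interface.

```python
# Hypothetical data dictionary: canonical field name -> source spellings seen so far.
FIELD_ALIASES = {
    "sex": {"sex", "gender", "patient_sex"},
    "weight_kg": {"weight_kg", "weight_lb", "wt_lbs"},
}
POUND_FIELDS = {"weight_lb", "wt_lbs"}  # aliases whose values are in pounds

# Common interpretation for coded values ("M/F" vs. "Male/Female").
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

LB_TO_KG = 0.45359237  # exact, by definition of the international pound


def standardize(record):
    """Hypothetical DPF step: map one record onto the data dictionary.

    Field names are matched case-insensitively; unrecognized sex codes become
    "U" (unknown), and weights are converted to kilograms.
    """
    out = {}
    for name, value in record.items():
        field = name.strip().lower()
        if field in FIELD_ALIASES["sex"]:
            out["sex"] = SEX_CODES.get(str(value).strip().lower(), "U")
        elif field in FIELD_ALIASES["weight_kg"]:
            kg = float(value) * (LB_TO_KG if field in POUND_FIELDS else 1.0)
            out["weight_kg"] = round(kg, 1)
        else:
            out[field] = value  # other fields pass through under a lower-cased name
    return out


if __name__ == "__main__":
    print(standardize({"Gender": "Female", "wt_lbs": "154", "ID": 7}))
```

Queries can then filter on `sex == "F"` or `weight_kg` without knowing how each source spelled or measured those fields, which is the simplification the paragraph above describes.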
Consume
Access your datasets on demand at sub-second speeds, even with petabytes of data

Because information is described with metadata tags, users and applications can query data based on its properties instead of navigating complex directories or schemas. Multiple users can interact with the system, accessing datasets through SQL, data exports, and custom applications built on the PHEMI API. Above all, information in PHEMI Central is findable and searchable by both users and applications.
• Break down costly data silos by aggregating data from multiple, disparate sources and then constructing datasets across them
• Reduce data sprawl by creating virtual datasets only on export
• Improve consumption speeds with digital assets that are cataloged and indexed in advance
• Ensure rightful access at all times, with every data request automatically mediated by the PHEMI policy enforcement engine

Privacy, Security, and Governance
Automatically de-identify, encrypt, or mask personal information

PHEMI Central provides an industry-pioneering set of capabilities for governing sensitive data, enforced end to end and throughout the data lifecycle. PHEMI Central uses one coordinated framework, based on Privacy by Design principles, to define, manage, and enforce data sharing agreements and privacy policies across an entire organization or set of organizations.

Data is tagged with attributes that describe its level of sensitivity. Users are tagged with attributes that describe their level of authorization. Simple, powerful access rules define the relationship between data visibility and user authorization. Datasets can be associated with access policies that are independent of the policies attached to the source data collections, but rightful access to data is always enforced.
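The following minimal sketch shows how attribute-based rules like these can filter a request at the field level before any data leaves the system. The four-level scale, the `Level` and `enforce` names, the deny-by-default treatment of untagged fields, and the "stricter policy wins" rule for combining a dataset policy with source-field sensitivity are all assumptions for illustration; the datasheet does not describe PHEMI's actual rule language.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered scale shared by data sensitivity and user authorization."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def enforce(record, field_levels, user_level, dataset_level=Level.PUBLIC):
    """Return only the fields of `record` that a user at `user_level` may see.

    A field is released only if the user's authorization meets both the field's
    own sensitivity (inherited from the source collection) and the dataset's
    policy, so the stricter of the two always applies.
    """
    visible = {}
    for field, value in record.items():
        # Deny by default: a field with no sensitivity tag is treated as RESTRICTED.
        required = max(field_levels.get(field, Level.RESTRICTED), dataset_level)
        if user_level >= required:
            visible[field] = value
    return visible


if __name__ == "__main__":
    record = {"id": 1, "dx": "J45", "name": "Ann Lee"}
    levels = {"id": Level.PUBLIC, "dx": Level.CONFIDENTIAL, "name": Level.RESTRICTED}
    print(enforce(record, levels, Level.CONFIDENTIAL))
```

Putting the check in one function that every read passes through, rather than in each consuming application, reflects the single point of enforcement the datasheet emphasizes.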
PHEMI Central keeps your data secure:
• User roles determine what operations a user can perform
• The system maintains a complete, tamperproof audit log of operations and data access
• Communication links from data sources or to consuming systems can be encrypted using Secure Sockets Layer (SSL) or Transport Layer Security (TLS)
• Data fields can be individually selected for encryption at rest
• Because privacy and security are enforced at the data level, it’s easier and faster to prototype, test, and deploy new applications

Privacy by Design
A Privacy by Design (PbD) approach requires you to take into account seven foundational principles throughout your system. But how do you know whether your system implements PbD principles? Here’s a checklist:
1. Metadata. All data should be tagged on ingest with enough descriptive information to allow adequate privacy, sharing, consent, and lifecycle management, plus compliance with any other governance requirements.
2. Role-based access control. User and application access to functionality and operations is adequately restricted by system roles.
3. Policy-based data access. Access to and visibility of data is restricted by permissions and authorizations, and controlled by access policies.
4. Automatic policy enforcement. The system automatically enforces policies and governance; manual intervention is not required. Enforcement is not relegated to applications built on top of the repository. There’s a single point of management to ensure policy enforcement.
5. Transparency. Data stewards and privacy officers can directly view and verify the system implementation of governance policies.
6. Auditability. The system automatically tracks system activity and maintains a detailed, tamperproof audit log of data access and system operations.
7. Data immutability. Data in the repository remains available in its original form, regardless of what digital assets are derived from the original through transformation.
8. Ability to anonymize. The system should be able to de-identify, encrypt, mask, obfuscate, or redact personal information, and allow the data steward or privacy officer to choose which version of data appears to which users.

Privacy by Design is recognized as the global privacy standard in a landmark resolution by the International Conference of Data Protection and Privacy Commissioners. Visit privacybydesign.ca.

Specifications

On-Premises Deployment*
• 4 Cluster Nodes. Each: 8xCore (2.2 GHz), 64 GB RAM, 12 TB Direct Attached Storage
• 2 Management Nodes. Each: 4xCore (2.2 GHz), 64 GB RAM, 2 TB RAID1 Storage
• 1 Front-End Node: 4xCore (2.2 GHz), 64 GB RAM, 2 TB RAID1 Storage
• 10 Gigabit Ethernet Network

Cloud Deployment*
• Subscribe to PHEMI Central as a managed service running on Amazon Web Services.
• Cloud service grows from 1 TB storage capacity.

Data Management
Use a powerful metadata framework to manage digital assets at the field level

Field-level metadata contains the rules and policies governing the data at the field level. Data retention policies and data sharing agreements are automatically enforced. Data in the system is immutable: the original data cannot be modified, and data is purged from the system only according to the configured retention policy. Robust version control and rollback capabilities mean that data is never lost, corrupted, or overwritten.

System Management
Get cluster reliability and economics at scale

PHEMI Central can be deployed on customer premises, as a managed service, or as a cloud-based service. The system uses low-cost commodity hardware components and Direct Attached Storage (DAS) disk drives to lower the cost of ownership compared with traditional enterprise data warehouse systems. Storage and compute resources scale linearly from terabytes to petabytes.
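A quick sizing estimate helps translate the reference on-premises configuration into usable space. The datasheet lists four cluster nodes with 12 TB of DAS each and states that all data is replicated three times, so 4 × 12 TB = 48 TB raw corresponds to roughly 48 / 3 = 16 TB of unique data. The Python helper below is a hypothetical back-of-the-envelope calculator, not a PHEMI sizing tool; the optional `reserve` fraction for working space is an assumption added for illustration.

```python
def usable_capacity_tb(cluster_nodes, tb_per_node=12.0, replication=3, reserve=0.0):
    """Back-of-the-envelope capacity for unique data, in TB.

    Assumptions: only cluster-node DAS holds data (management and front-end
    RAID1 storage is excluded), every block is stored `replication` times, and
    `reserve` (a fraction from 0 up to but not including 1) is kept free for
    working space. Real filesystems add further overhead not modeled here.
    """
    if cluster_nodes < 1 or replication < 1 or not 0.0 <= reserve < 1.0:
        raise ValueError("invalid cluster parameters")
    raw_tb = cluster_nodes * tb_per_node
    return raw_tb * (1.0 - reserve) / replication


if __name__ == "__main__":
    for nodes in (4, 8, 16):
        print(nodes, "nodes:", usable_capacity_tb(nodes), "TB usable")
```

Under these assumptions each additional 12 TB cluster node adds about 4 TB of unique-data capacity, so capacity grows in proportion to node count, consistent with the linear scaling described above.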
All data in the system is replicated three times to ensure availability and resiliency. DAS drives can be hot-swapped without impacting performance or data availability. Larger or faster DAS drives and nodes are absorbed into the system and load-balanced automatically.

The system provides clear visibility into system health, diagnostics, troubleshooting, capacity, and digital assets under management. System management capabilities can also be integrated with existing tools.

Data Ingest Protocols
• SFTP File Transfer
• HTTP/HTTPS Manual Upload
• REST Web Services API
• ODBC/JDBC SQL Interface
• CCDA HL7 Interface

Data Export Protocols
• Excel/CSV/TSV Download
• REST Web Services API
• ODBC/JDBC SQL Interface

Data Processing Functions
• Excel Reader
• CSV Reader
• Variant Call Format (VCF) Reader
• JSON Reader
• XML Reader
• Custom DPFs

Analytics Tools
Supports leading analytics tools, including:
• R
• SAP
• SAS
• SPSS
• QlikView
• Stata
• Tableau
• Netezza
• MySQL

*All our deployments align with appropriate privacy and security requirements, including the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act, as well as Canadian federal and provincial legislation.

Ease Your Entry into Big Data
PHEMI Central makes it easy to break into big data. The software is fully integrated and enterprise-ready, so you don’t need to hire a team of Hadoop engineers to build and maintain your system. And you can start small and expand incrementally. Use PHEMI Central to offload your existing data warehouse or to capture new data types or sources. Keep your existing systems and tools and let PHEMI Central feed data into them. You can move into big data as you become ready, at your own speed.

Visit www.phemi.com for more information.

www.phemi.com
info@phemi.com
twitter.com/PHEMIsystems
linkedin.com/company/phemi

Copyright © 2015, PHEMI and/or its affiliates. All rights reserved.
Affiliate names may be trademarks of their respective owners. This document contains forward-looking features. May 2015