Dirk deRoos Presentation - Dama-NY

Transcription

Big Data and the Cloud
Dirk deRoos
dderoos@ca.ibm.com
@Dirk_deRoos
IBM World-Wide Technical Sales Leader, Big Data
© 2013 IBM Corporation
The Economics of Growth Have Changed
• Land
• Labor
• Capital
• Cloud
• Analytics
• Data
Need to Agree on Definitions: Cloud
 On-demand
– Users can sign up for the service and use it immediately
 Self-service
– Users can use the service at any time
 Scalable
– Users can scale up the service at any time, without waiting for the provider to add more capacity
 Measurable
– Users can access measurable data to determine the status of the service
 Coined by Dave Nielsen, CloudCamp Founder
Source: Dave Nielsen, CloudCamp
Cloud Computing Service Models
 Software as a Service (SaaS)
– Computing capacity
– Middleware
– Applications
 Platform as a Service (PaaS)
– Raw computing capacity
– Middleware
 Infrastructure as a Service (IaaS)
– Raw computing capacity
Source: NIST Definition of Cloud Computing v15
“Consumerization” of IT
 IT departments are not seen as a source of innovation
 Home and web-based experiences are driving IT expectations in the enterprise
– Self-service provisioning
– Time-to-value measured in minutes
 Enterprise lines of business consume services, bypassing the IT department
– IT departments respond by adopting newer technologies and evolving traditional capabilities
Deployment Models
 Private: IT capabilities are provided “as a service,” over an intranet, within the enterprise and behind the firewall
 Hybrid: internal and external service delivery methods are integrated
 Public: IT activities/functions are provided “as a service,” over the Internet
 Delivery spectrum, from on-premise (enterprise data center) to third-party operated: Private Cloud, Managed Private Cloud, Hosted Private Cloud, Member Cloud Services, Public Cloud Services
Movement from Traditional Environments to Cloud
Many clients are already on the way to cloud with consolidation and virtualization efforts. The progression from traditional IT:
 CONSOLIDATE – physical infrastructure
 VIRTUALIZE – increase utilization
 STANDARDIZE – operational efficiency
 AUTOMATE – flexible delivery & self-service
 SHARED RESOURCES – common workload profiles
 CLOUD – dynamic provisioning for workloads
Leon Katsnelson (leon@ca.ibm.com)
Some Workloads Better than Others for Cloud
[Chart: workloads plotted by gain from external clouds versus pain of cloud delivery]
 Higher gain from external clouds – idealized workloads: Collaboration, Discovery, Application Development, On-Line Storage, SMB ERP, Web Scale Analytics
 Higher pain to cloud delivery: Enterprise Data, DB Migration, Situational Apps, Projects, Transactional Content, Large Enterprise ERP
 Lower gain from external clouds / lower pain to cloud delivery: Dep’t. BI, Application Test, Web 2.0 Data, Archive
 Architecture styles spanned: “Loosely Coupled” Architecture, “Content-Centric” Architecture, “DB-Centric” Architecture, Storage and Data Integration Architecture
Dev/Test Environments: Challenges/Observations
 30% to 50% of all servers within a typical IT environment are dedicated to test
 Most test servers run at less than 10% utilization, if at all!
 IT staff report that a top challenge is finding available resources to perform tests in order to move new applications into production
 30% of all defects are caused by badly configured test environments
 The testing backlog is often very long and is the single largest factor in delaying new application deployments
 Test environments are seen as expensive and providing little real business value
* “Industry Developments and Models – Global Testing Services: Coming of Age,” IDC, 2008 and IBM Internal Reports
Development/Test Environment - Perfect for Cloud
 Quick ROI
– 30% to 50% of all servers within a typical IT environment are dedicated to test
– Most test servers run at less than 10% utilization, if they are running at all!
 Low risk
– Low risk in terms of business and overall IT operations
– Security/compliance concerns easily mitigated
 Excellent return on automation
– Agility
– Consistent dev/test environments mean fewer errors
– Self-service
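The self-service and consistency points above are exactly what scripted provisioning delivers. Below is a minimal sketch, assuming an AWS account with boto3 installed and credentials configured (the presentation does not name a specific provider); the AMI ID, instance type, and tags are placeholders.

```python
# Sketch: self-service provisioning of a disposable, consistent test environment.
# Assumes boto3 is installed and AWS credentials are configured; the AMI ID,
# instance type, and tag values below are placeholders, not from the slides.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def provision_test_env(name, count=1):
    """Launch identical test instances from one 'golden' image so every
    tester gets the same configuration (fewer badly configured test envs)."""
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder golden image
        InstanceType="t3.micro",           # placeholder size
        MinCount=count,
        MaxCount=count,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": name},
                     {"Key": "Purpose", "Value": "test"}],
        }],
    )
    return [i["InstanceId"] for i in response["Instances"]]

def teardown_test_env(instance_ids):
    """Release the capacity as soon as the test run finishes."""
    ec2.terminate_instances(InstanceIds=instance_ids)

if __name__ == "__main__":
    ids = provision_test_env("nightly-regression", count=2)
    print("Provisioned:", ids)
```

Because environments are created from one image and torn down when the run ends, utilization problems and configuration drift both shrink, which is where the quick ROI comes from.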
Need to Agree on Definitions: Big Data
 Information management challenges that can’t be dealt with using traditional tools and approaches
– Volume: cost-efficiently processing the growing volume of data (roughly 50x growth from 2010 to 2020, reaching an estimated 35 ZB)
– Variety: collectively analyzing the broadening variety of data (80% of the world’s data is unstructured)
– Velocity: responding to the increasing velocity of data (30 billion RFID sensors and counting)
– Other “V”s often cited: Veracity, Variability, Value, Valence, Viability, Viscosity
The Big Data Conundrum
 The percentage of available data an enterprise can analyze is decreasing in proportion to the amount of data available to that enterprise
 Quite simply, this means that as enterprises, we are getting “more naive” over time
[Chart: the data AVAILABLE to an organization is growing far faster than the data an organization can PROCESS]
Traditional Enterprise Data and Analytics
[Diagram: structured and operational data sources feed, via information movement & transformation, a staging area inside an expanded EDW; the EDW feeds data marts, an archive, BI & performance management, and predictive analytics & modeling to produce actionable insights]
 Put the staging area in the EDW
+ In-database transformations (ELT is faster than ETL)
+ Provides some structure, enabling queries
– Adds significant cost and overhead to the EDW
 Traditional data mining and exploratory analysis
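To make the ELT point concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the warehouse (the slides do not prescribe a database engine); the table and column names are invented for illustration. Raw records are loaded into a staging table first, and the transformation then runs as SQL inside the database rather than in an external ETL tool.

```python
# Sketch of ELT: land raw data in a staging table, then transform in-database.
# sqlite3 stands in for the EDW here; tables and columns are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Extract + Load: copy raw records into a staging table as-is.
cur.execute("CREATE TABLE staging_orders (order_id TEXT, amount TEXT, ts TEXT)")
raw_rows = [("A-1", "19.99", "2013-05-01T10:00:00"),
            ("A-2", "5.00",  "2013-05-01T11:30:00")]
cur.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

# 2. Transform inside the database (the 'T' happens after the load),
#    using the warehouse engine instead of a separate ETL server.
cur.execute("""
    CREATE TABLE orders AS
    SELECT order_id,
           CAST(amount AS REAL) AS amount,
           DATE(ts)             AS order_date
    FROM staging_orders
""")

for row in cur.execute("SELECT * FROM orders"):
    print(row)
conn.close()
```

Because the heavy lifting runs where the data already sits, the transformation can exploit the warehouse's own engine, which is the "ELT faster than ETL" argument; the trade-off is the extra storage and workload it puts on the EDW, as noted above.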
Warehouse Modernization Has Two Themes
 Traditional Analytics: structured & repeatable – structure built to store data
– IT team builds a system to answer known questions; business users determine the questions
– Analyzed information is a capacity-constrained down-sampling of the available information
– A small set of information is carefully cleansed before any analysis
 Big Data Analytics: iterative & exploratory – data is the structure
– IT team delivers data on a flexible platform; business users explore and ask any question
– Analyze ALL available information: whole-population analytics connects the dots
– Analyze information as-is and cleanse as needed
Warehouse Modernization Has Two Themes (continued)
 Traditional Analytics: structured & repeatable – structure built to store data
– Start with a hypothesis and a question
– Test against selected data
– Analyze after landing the data
– Outcome: an answer
 Big Data Analytics: iterative & exploratory – data is the structure
– Data leads the way: explore all information and identify correlations
– Analyze data in motion
– Outcome: actionable insight
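A minimal sketch of the "data leads the way" style, assuming pandas is available (the slides name no specific tooling); the clickstream file and its fields are hypothetical. All records are loaded as-is, the structure is discovered from the data, and correlations are computed across the whole set rather than a pre-modeled sample.

```python
# Sketch of exploratory, "data is the structure" analysis with pandas.
# The JSON-lines file and its fields are hypothetical examples.
import pandas as pd

# Load semi-structured records as-is; no upfront schema or cleansing pass.
events = pd.read_json("clickstream_events.jsonl", lines=True)

# Explore: what fields did we actually get, and how complete are they?
print(events.dtypes)
print(events.isna().mean())          # share of missing values per column

# Identify correlations across all numeric fields of the whole population.
numeric = events.select_dtypes("number")
print(numeric.corr())

# Cleanse only what the current question needs, only when it is needed.
if "session_seconds" in events.columns:
    events = events[events["session_seconds"] >= 0]
```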
Next Generation Information Management Architecture
[Diagram: a Big Data Platform sits between the data sources and the actionable insights]
 Data sources: streaming, sensor, geospatial, time series, structured, operational, unstructured, external, social
 Big Data Platform: landing, exploration & archive zone; real-time analytics; analytic appliances; enterprise warehouse; data marts; exploration & discovery
 Consumers: predictive analytics & modeling; BI & performance management; actionable insights
 Cross-cutting layers: information movement, matching & transformation; security, governance and business continuity
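As a sketch of how the landing/exploration zone can feed the curated warehouse layer, assuming a PySpark environment (the slides name the platform tiers, not the tooling); the paths and fields are illustrative.

```python
# Sketch: land raw data, explore it in place, then publish a curated subset.
# Assumes PySpark is installed; all paths and fields are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("landing-zone-sketch").getOrCreate()

# Landing & exploration zone: raw files kept in their original form.
raw = spark.read.json("/data/landing/sensor_readings/")
raw.printSchema()                       # discover the structure that arrived

# Light, targeted shaping for downstream analytics (warehouse / marts).
curated = (raw
           .filter(F.col("reading").isNotNull())
           .withColumn("reading_date", F.to_date("timestamp")))

# Publish to the curated zone that the warehouse and BI tools consume.
curated.write.mode("overwrite").parquet("/data/curated/sensor_readings/")
spark.stop()
```

Keeping the raw files untouched in the landing zone preserves the option to re-explore or re-derive later, while the curated output is what the warehouse, marts, and BI tools consume.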
Hadoop and the Cloud: Considerations
 Hadoop was designed for bare metal
– Hadoop runs best with locally attached storage and dedicated networking
– Rack awareness breaks in many cloud deployments (see the topology-script sketch after this list)
– Hadoop will still run in virtualized environments, but data processing will not perform as well as on bare metal
• Large amount of network traffic
 Hadoop has sweet spots
– Large-scale batch analysis
– Data flexibility
 Data governance requirements
– Privacy
– Security
– Regulatory requirements
– Metadata management
– Data access interfaces
– …
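To illustrate the rack-awareness point above: Hadoop learns the network topology by running a script configured under net.topology.script.file.name in core-site.xml, which receives node addresses as arguments and prints one rack path per node. A minimal sketch follows; the subnet-to-rack mapping is invented, and in many cloud deployments there is no stable mapping like this, which is why rack awareness breaks.

```python
#!/usr/bin/env python
# Sketch of a Hadoop rack-awareness topology script. Hadoop invokes the script
# configured under net.topology.script.file.name with node addresses as
# arguments and expects one rack path per node on stdout.
# The subnet-to-rack mapping here is a made-up example.
import sys

RACK_BY_SUBNET = {
    "10.0.1": "/dc1/rack1",
    "10.0.2": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"

for node in sys.argv[1:]:
    subnet = ".".join(node.split(".")[:3])
    print(RACK_BY_SUBNET.get(subnet, DEFAULT_RACK))
```

On bare metal the subnet-to-rack mapping is stable; cloud instances move between hosts and subnets, so the locality hints stop matching physical reality.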
Conclusions
 Cloud infrastructure has many benefits for Big Data analytics
– Inexpensive storage
– Inexpensive processing (short term)
– Flexible (scale in/out) architecture
 Ideal workloads: Ad-hoc analysis
– Performance is of secondary concern
– Ability to flexibly pull in many different data sets
 Longer term applications are more costly on public clouds
– Private clouds are an interesting option for internal Hadoop deployments
– Ideal for short-term ad-hoc projects
• Flexible, inexpensive
 Consider governance issues!!!
– Private clouds may be necessary
– Governance tools are available for Hadoop and the cloud
• Hint, hint… IBM
THINK