NoSQL? No Worries: DynamoDB and ElastiCache

Dan Zamansky, Sr. Product Manager, AWS
Siva Raghupathy, Principal Solutions Architect, AWS
Agenda
• NoSQL
• Why managed database service?
• DynamoDB
• ElastiCache
• Takeaways
NoSQL
Benefits
• Schemaless
• Highly scalable
– Size
– Throughput
• Highly available
Constraints
• No cross-table/item transactions
• No complex queries or joins
NoSQL available on AWS
Managed
• Amazon DynamoDB
• Amazon ElastiCache
– Memcached
– Redis
Unmanaged
• Apache Cassandra
• MongoDB
• CouchDB
• Riak
• ….
Why managed database services?
If you host your databases on-premises
You manage:
• App optimization
• Scaling
• High availability
• Database backups
• DB s/w patches
• DB s/w installs
• OS patches
• OS installation
• Server maintenance
• Rack & stack
• Power, HVAC, net
If you host your databases on Amazon EC2
You manage:
• App optimization
• Scaling
• High availability
• Database backups
• DB s/w patches
• DB s/w installs
• OS patches
AWS manages:
• OS installation
• Server maintenance
• Rack & stack
• Power, HVAC, net
If you choose a managed DB service
You manage:
• App optimization
AWS manages:
• Scaling
• High availability
• Database backups
• DB s/w patches
• DB s/w installs
• OS patches
• OS installation
• Server maintenance
• Rack & stack
• Power, HVAC, net
Who uses AWS Managed Database Services?
Amazon DynamoDB
• Managed NoSQL database service
• Accessible via simple and powerful APIs
• Supports both document and key-value data models
• Highly scalable
• Consistent, single-digit millisecond latency at any scale
• Highly durable & available - 3x replication
• No table size or throughput limits
Table
• A table is a collection of items; each item is a collection of attributes
• Hash key (mandatory)
– Key-value access pattern
– Determines data distribution
• Range key (optional)
– Models 1:N relationships
– Enables rich query capabilities
• Query capabilities
– All items for a hash key
– ==, <, >, >=, <=
– “begins with”
– “between”
– sorted results
– counts
– top/bottom N values
– paged responses
Provisioned Throughput Model
• Throughput is provisioned at the table level
– One write capacity unit (WCU) is one write per second for an item of up to 1 KB
– One read capacity unit (RCU) is one strongly consistent read per second for an item of up to 4 KB
• Eventually consistent reads cost 1/2 of strongly consistent reads
• WCU and RCU are provisioned independently
• Consumed capacity is measured per operation
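The arithmetic behind the capacity units can be sketched in a few lines. This is a rough estimate under stated assumptions, not an official sizing tool: the function names are hypothetical, item sizes are rounded up to the next 1 KB (writes) or 4 KB (reads) block, and the halved cost for eventually consistent reads is rounded up on the total.

```python
import math

WCU_BLOCK_BYTES = 1024      # one WCU covers a write of up to 1 KB
RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers a strongly consistent read of up to 4 KB


def write_capacity_units(item_bytes, writes_per_second):
    """Estimate the WCUs needed for a steady write rate (hypothetical helper)."""
    return math.ceil(item_bytes / WCU_BLOCK_BYTES) * writes_per_second


def read_capacity_units(item_bytes, reads_per_second, eventually_consistent=False):
    """Estimate the RCUs needed for a steady read rate (hypothetical helper)."""
    units = math.ceil(item_bytes / RCU_BLOCK_BYTES) * reads_per_second
    if eventually_consistent:
        # Eventually consistent reads cost half as much as strongly consistent ones.
        units = math.ceil(units / 2)
    return units
```

For example, `write_capacity_units(1536, 10)` treats each 1.5 KB item as two 1 KB blocks, so ten writes per second need 20 WCUs.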
Scaling
• Scaling is achieved through partitioning
• Tables are partitioned for
– Throughput
• Provision any amount of throughput to a table
– Size
• Add any number of items to a table
[Diagram: a table split into Partition 1, Partition 2, Partition 3, Partition 4, ... Partition N]
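To make the idea of hash-based partitioning concrete, here is a minimal sketch. It is an illustration only: DynamoDB's real partitioning scheme is internal to the service, and the MD5-plus-modulo mapping and the `partition_for` name are assumptions chosen for simplicity.

```python
import hashlib
from collections import Counter


def partition_for(hash_key, partition_count):
    """Map a hash key to a partition number in [0, partition_count).

    Illustrative only: MD5 plus modulo is an assumption, not DynamoDB's
    actual placement algorithm.
    """
    digest = hashlib.md5(str(hash_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Many distinct hash keys spread across all partitions...
spread = Counter(partition_for(f"device-{n}", 4) for n in range(10_000))

# ...while every request for one hash key lands on the same partition.
hot = Counter(partition_for("device-1", 4) for _ in range(10_000))
```

The second counter shows why a single heavily used hash key cannot benefit from additional partitions: all of its traffic maps to one of them.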
Indexing
• Local Secondary Index
– Local to a hash key
– Alternate range key
• Global Secondary Index
– Across all hash keys
– Alternate hash (+range) key
User-files-table
User (hash) | File (range) | Date | Shared | Size
File-size-LSI
User (hash) | Size (range) | File (table key) | Date (projected)
Shared-files-GSI
Shared (hash) | User (table key) | File (table key) | Date (projected)
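The example table and its two indexes could be declared roughly as follows. This sketch only builds the keyword arguments in the shape that boto3's DynamoDB client `create_table` call accepts; it does not contact AWS. The attribute types (strings for User, File, and Shared, a number for Size) and the throughput values of 5 are assumptions, since the slide does not state them.

```python
def user_files_table_definition():
    """Build create_table arguments for the example table (assumed types)."""
    date_projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["Date"]}
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}  # placeholder values
    return {
        "TableName": "User-files-table",
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "User", "AttributeType": "S"},
            {"AttributeName": "File", "AttributeType": "S"},
            {"AttributeName": "Size", "AttributeType": "N"},
            {"AttributeName": "Shared", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "User", "KeyType": "HASH"},
            {"AttributeName": "File", "KeyType": "RANGE"},
        ],
        # LSI: same hash key as the table, alternate range key.
        "LocalSecondaryIndexes": [{
            "IndexName": "File-size-LSI",
            "KeySchema": [
                {"AttributeName": "User", "KeyType": "HASH"},
                {"AttributeName": "Size", "KeyType": "RANGE"},
            ],
            "Projection": date_projection,
        }],
        # GSI: alternate hash key, with its own provisioned throughput.
        "GlobalSecondaryIndexes": [{
            "IndexName": "Shared-files-GSI",
            "KeySchema": [{"AttributeName": "Shared", "KeyType": "HASH"}],
            "Projection": date_projection,
            "ProvisionedThroughput": throughput,
        }],
        "ProvisionedThroughput": throughput,
    }
```

With credentials configured, the result could be passed along as `boto3.client("dynamodb").create_table(**user_files_table_definition())`. The table's own key attributes are projected into every index automatically, which matches the "(table key)" columns in the index layouts.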
Data types
• String (S)
• Number (N)
• Binary (B)
• String Set (SS)
• Number Set (NS)
• Binary Set (BS)
• Boolean (BOOL)
• Null (NULL)
• List (L)
• Map (M)
BOOL, NULL, L, and M are used for storing nested JSON documents
DynamoDB Table and Item API
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables
• GetItem
• Query
• Scan
• BatchGetItem
• PutItem
• UpdateItem
• DeleteItem
• BatchWriteItem
DynamoDB Streams API
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords
DynamoDB Streams
• Stream of updates to a table
• Asynchronous
• Exactly once
• Strictly ordered
– Per item
• Highly durable
• Scale with table
• 24-hour lifetime
• Sub-second latency
DynamoDB Streams and AWS Lambda
Cross-region replication
[Diagram: a table in US East (N. Virginia) emits updates through DynamoDB Streams; the open source Cross-Region Replication Library applies them to replicas in Asia Pacific (Sydney) and EU (Ireland)]
Data & Access Modeling
Store data based on how you will access it!
1:1 relationships or key-values
• Use a table or GSI with a hash key
• Use GetItem or BatchGetItem API
Example: Given a user or email, get attributes
Users Table
Hash key | Attributes
UserId = bob | Email = bob@gmail.com, JoinDate = 2011-11-15
UserId = fred | Email = fred@yahoo.com, JoinDate = 2011-12-01
Users-Email-GSI
Hash key | Attributes
Email = bob@gmail.com | UserId = bob, JoinDate = 2011-11-15
Email = fred@yahoo.com | UserId = fred, JoinDate = 2011-12-01
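The two lookups in this example might be expressed as request parameters like the ones below. This is a sketch that only builds dictionaries in the shape used by boto3's low-level DynamoDB client; the table name `Users` and the helper names are assumptions. One detail worth noting: GetItem works against the base table only, so the lookup by email goes through Query with an `IndexName`.

```python
def get_user_request(user_id):
    """Parameters for a GetItem call on the base table (hash key lookup)."""
    return {
        "TableName": "Users",  # assumed table name
        "Key": {"UserId": {"S": user_id}},
    }


def get_user_by_email_request(email):
    """Parameters for a Query call on the GSI keyed by email."""
    return {
        "TableName": "Users",  # assumed table name
        "IndexName": "Users-Email-GSI",
        "KeyConditionExpression": "Email = :email",
        "ExpressionAttributeValues": {":email": {"S": email}},
    }
```

With a configured client these would be used as `client.get_item(**get_user_request("bob"))` and `client.query(**get_user_by_email_request("bob@gmail.com"))`.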
1:N relationships or parent-children
• Use a table or GSI with hash and range key
• Use Query API
Example:
– One device has many readings
– For DeviceId = 1, find all readings where epoch >= 1435457946
Device-measurements
Hash key | Range key | Attributes
DeviceId = 1 | epoch = 1435457946 | Temperature = 30, pressure = 90
DeviceId = 1 | epoch = 1435457960 | Temperature = 32, pressure = 91
DeviceId = 2 | epoch = 1435458028 | Temperature = 32, pressure = 91
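The semantics of a hash-plus-range Query can be modeled in memory to show why this layout answers the question cheaply. The class below is a toy stand-in, not the DynamoDB API: items that share a hash key are kept sorted by range key, so "all readings where epoch >= X" is a binary search followed by a slice. `HashRangeTable` and its methods are hypothetical names.

```python
from bisect import bisect_left
from collections import defaultdict


class HashRangeTable:
    """Toy in-memory model of a table with a hash key and a range key."""

    def __init__(self):
        # hash key -> list of (range key, attributes), sorted by range key
        self._partitions = defaultdict(list)

    def put_item(self, hash_key, range_key, attributes):
        rows = self._partitions[hash_key]
        index = bisect_left(rows, (range_key,))
        if index < len(rows) and rows[index][0] == range_key:
            rows[index] = (range_key, attributes)  # same key: overwrite
        else:
            rows.insert(index, (range_key, attributes))

    def query_ge(self, hash_key, min_range_key):
        """Return items for one hash key with range key >= min_range_key."""
        rows = self._partitions.get(hash_key, [])
        return rows[bisect_left(rows, (min_range_key,)):]


table = HashRangeTable()
table.put_item(1, 1435457960, {"Temperature": 32, "pressure": 91})
table.put_item(1, 1435457946, {"Temperature": 30, "pressure": 90})
table.put_item(2, 1435458028, {"Temperature": 32, "pressure": 91})

readings = table.query_ge(1, 1435457946)
```

Here `readings` holds both measurements for device 1 in ascending epoch order, regardless of the order in which they were written.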
N:M relationships
• Use a table and GSI with hash and range key elements switched
• Use Query API
Example: Given a user, find all games. Or given a game, find all users.
User-Games-Table
Hash key | Range key
UserId = bob | GameId = Game1
UserId = fred | GameId = Game2
UserId = bob | GameId = Game3
Game-Users-GSI
Hash key | Range key
GameId = Game1 | UserId = bob
GameId = Game2 | UserId = fred
GameId = Game3 | UserId = bob
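The switched-key idea can be sketched with plain dictionaries: the same set of (user, game) pairs is indexed twice, once per direction. This is an in-memory illustration of the data layout only, and `build_index` is a hypothetical helper rather than anything DynamoDB provides.

```python
from collections import defaultdict

# The three items from the example table, as (UserId, GameId) pairs.
user_games = [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]


def build_index(items, hash_position, range_position):
    """Group items by one element and keep the other element sorted."""
    index = defaultdict(list)
    for item in items:
        index[item[hash_position]].append(item[range_position])
    for range_keys in index.values():
        range_keys.sort()
    return dict(index)


games_by_user = build_index(user_games, 0, 1)  # like User-Games-Table
users_by_game = build_index(user_games, 1, 0)  # like Game-Users-GSI
```

Each direction of the relationship then becomes a single keyed lookup, such as `games_by_user["bob"]` or `users_by_game["Game2"]`.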
Documents (JSON)
• New data types (M, L, BOOL, NULL) introduced to support JSON
• Document SDKs
– Simple programming model
– Conversion to/from JSON
– Java, JavaScript, Ruby, .NET
• Cannot index (S, N) elements of a JSON object stored in M
– They need to be modeled as top-level table attributes to be used in LSIs or GSIs
JavaScript | DynamoDB
string | S
number | N
boolean | BOOL
null | NULL
array | L
object | M
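The type mapping above can be sketched as a small recursive converter from parsed JSON to DynamoDB's attribute-value notation. The document SDKs mentioned earlier perform this conversion for you; the function below is a hand-written illustration with a hypothetical name, and it covers only the six JSON types in the table.

```python
import json


def to_attribute_value(value):
    """Convert a parsed JSON value into DynamoDB attribute-value notation."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check before numbers: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings on the wire
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


document = json.loads(
    '{"name": "bob", "age": 30, "active": true, '
    '"nickname": null, "games": ["Game1", "Game3"]}'
)
item = to_attribute_value(document)["M"]
```

The top-level object becomes the item itself, which is why the example unwraps the outer `M`. Nested objects stay inside `M` values and, as noted above, their elements cannot be indexed.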
DynamoDB use cases - IoT
case class CameraRecord(
  cameraId: Int, // hash key
  ownerId: Int,
  subscribers: Set[Int],
  hoursOfRecording: Int,
  ...
)
case class Cuepoint(
  cameraId: Int, // hash key
  timestamp: Long, // range key
  `type`: String, // backticks needed: type is a reserved word in Scala
  ...
)
Video:
https://youtu.be/-0FtKBgYiik?t=79
DynamoDB use cases - AdTech
Requirements:
– Low (<5 ms) response time
– 1,000,000+ global requests/second
– 100B items
DynamoDB table
HashKey | RangeKey | Value
Key | Segment | 1234554343254
Key | Segment1 | 1231231433235
Video:
https://youtu.be/qV7yAwcMtYE?t=598
DynamoDB use cases - Retail
Video:
https://youtu.be/AHk3RhrETi4?t=1616
Amazon DynamoDB Best Practices
• Keep item size small
– Compress large items
– Store metadata in Amazon DynamoDB and large blobs in Amazon S3
• Use a table per day, week, month, etc. for storing time series data
• Use conditional updates for de-duping & versioning
• Avoid hot keys and hot partitions
[Diagram: one large table, Events_table_2012, replaced by weekly tables Events_table_2012_05_week1, Events_table_2012_05_week2, and Events_table_2012_05_week3; every table has the same layout]
Event_id (hash key) | Timestamp (range key) | Attribute1 | …. | Attribute N
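Routing a write to the right time-series table comes down to deriving the table name from the event timestamp. The sketch below follows the weekly naming pattern shown in the table names above; the week-of-month rule (days 1 to 7 are week 1, days 8 to 14 are week 2, and so on), the use of UTC, and the function name are assumptions.

```python
from datetime import datetime, timezone


def events_table_name(epoch_seconds):
    """Return the weekly table name for an event timestamp (UTC)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Assumed rule: days 1-7 -> week1, 8-14 -> week2, and so on.
    week_of_month = (moment.day - 1) // 7 + 1
    return f"Events_table_{moment.year}_{moment.month:02d}_week{week_of_month}"
```

Because old data lives in its own tables, throughput on past weeks can be lowered or whole tables deleted without touching the table that receives current writes.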
Amazon ElastiCache
Why In-Memory?
[Figure: latency scale, ms vs. μs]
• Everything is connected: phones, tablets, cars, air conditioners, toasters
• Demand for real-time performance: online games, AdTech, eCommerce, social apps, etc.
• Load is spiky and unpredictable
• DB performance is often the bottleneck
Amazon ElastiCache
• AWS managed service that lets you easily create, use, and scale in-memory key-value stores in the cloud
• It comes in two flavors: Memcached and Redis
Memcached
• Insanely fast!
• Patterns for sharding
• No persistence
• Very established
• In-memory key-value datastore
• Slab allocator
• Supports strings, objects
• Multi-threaded
Redis
• In-memory key-value datastore
• Ridiculously fast!
• More like a NoSQL db
• Pub/sub functionality
• Persistence: snapshots or append-only log
• Supports data types: strings, lists, hashes, sets, sorted sets, bitmaps & HyperLogLogs
• Read replicas
• Single-threaded
• Atomic operations: supports transactions, has ACID properties
• Command reference: http://redis.io/commands
Memcached or Redis?
Capability | Memcached | Redis
Simple caching to offload DB burden | Yes | Yes
Ability to scale horizontally | Yes | Yes with Redis 3.0
Multithreaded performance | Yes | No
Advanced data types | No | Yes
Sorting/Ranking data sets | No | Yes
Pub/Sub capability | No | Yes
HA through replication | No | Yes
Persistence | No | Yes
How can I leverage In-Memory? Key Use Cases
Caching
[Diagram: Clients connect through Elastic Load Balancing to EC2 app instances. App reads and cache updates go to ElastiCache; database reads and database writes go to Amazon RDS or DynamoDB.]
Be Lazy
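# Python pseudocode: lazy loading
# `cache` and `db` stand for already-configured cache and database clients
def get_user(user_id):
    # Check the cache
    record = cache.get(user_id)
    if record is None:
        # Cache miss: run a DB query
        record = db.query("select * from users where id = ?", user_id)
        # Populate the cache for the next read
        cache.set(user_id, record)
    return record

# App code
user = get_user(17)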
Write-through Caching
# Python pseudocode
# `cache` and `db` stand for already-configured cache and database clients
def save_user(user_id, values):
    # Save to DB
    record = db.query("update users ... where id = ?", user_id, values)
    # Push into cache
    cache.set(user_id, record)
    return record

# App code
user = save_user(17, {"name": "Nate Dogg"})
Leaderboards - Redis
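• Easy to implement using Sorted Sets
• Simultaneously guarantees:
– uniqueness and ordering
# `redis` stands for a connected redis-py client (3.0+ zadd signature)
def save_score(user, score):
    redis.zadd("leaderboard", {user: score})

def get_rank(user):
    # zrevrank is 0-based, with the highest score first
    return redis.zrevrank("leaderboard", user) + 1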
Example
ZADD "leaderboard" 1201 "Gollum"
ZADD "leaderboard" 963 "Sauron"
ZADD "leaderboard" 1092 "Bilbo"
ZADD "leaderboard" 1383 "Frodo"
ZREVRANGE "leaderboard" 0 -1
1) "Frodo"
2) "Gollum"
3) "Bilbo"
4) "Sauron"
ZREVRANK "leaderboard" "Sauron"
(integer) 3
Customer Example – Globo App
https://www.youtube.com/watch?v=F34SszLGH6A
Recommendation Engines
Use Redis to store data for recommendation algorithms such as Slope One.
INCR "item:38923:likes"
HSET "item:38923:ratings" "Susan" 1
INCR "item:38923:dislikes"
HSET "item:38923:ratings" "Tommy" -1
Redis counters are used to increment/decrement the number of likes or dislikes for an item.
Redis hashes maintain a list of everyone who liked or disliked an item.
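As a rough idea of what a recommendation algorithm does with such per-item rating hashes, here is a minimal weighted Slope One sketch. It is an illustration under assumptions: the ratings live in a plain dictionary shaped like the Redis hashes (item to user-rating pairs) rather than in Redis, the sample data is made up, and the function name is hypothetical.

```python
def slope_one_predict(ratings_by_item, user, target_item):
    """Predict a user's rating for target_item with weighted Slope One."""
    target_ratings = ratings_by_item[target_item]
    numerator = 0.0
    total_count = 0
    for item, item_ratings in ratings_by_item.items():
        if item == target_item or user not in item_ratings:
            continue
        # Users who rated both this item and the target item.
        common_users = [u for u in item_ratings if u in target_ratings]
        if not common_users:
            continue
        # Average difference between the target item and this item.
        deviation = sum(
            target_ratings[u] - item_ratings[u] for u in common_users
        ) / len(common_users)
        numerator += (deviation + item_ratings[user]) * len(common_users)
        total_count += len(common_users)
    return numerator / total_count if total_count else None


# Made-up sample data: item -> {user: rating}
ratings_by_item = {
    "A": {"John": 5, "Mark": 3},
    "B": {"John": 3, "Mark": 4, "Lucy": 2},
    "C": {"John": 2, "Lucy": 5},
}
prediction = slope_one_predict(ratings_by_item, "Lucy", "A")
```

Each item pair contributes its average rating difference, weighted by how many users rated both items. The same calculation applies to the +1/-1 like and dislike values stored in the hashes above.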
Task queue (Redis backed)
• Ruby based: Resque, http://github.com/resque
• Python based: Redis-Queue, http://python-rq.org
• Basically, anything that can be done asynchronously outside of the immediate user experience:
– sending email
– image or video processing
– converting or watermarking docs
– generating (large) reports
– priming or cleaning caches
– interacting with external APIs
– search indexing
Chat and Messaging - Redis
PUBLISH and SUBSCRIBE Redis commands
SUBSCRIBE "chat:114"
PUBLISH "chat:114" "Hello all"
["message", "chat:114", "Hello all"]
UNSUBSCRIBE "chat:114"
Redis HA on ElastiCache
• Auto-Failover
– Goes to replica with lowest replication lag
– No changes in DNS
• Asynchronous replication
• Writes use the "Primary Endpoint" from the Node Group
• Reads use 'replica' endpoints from the Node Group (*can use 'primary' also)
[Diagram: Node Group spanning Availability Zone #1 and Availability Zone #2]
Takeaways
• Define your access patterns and needs
• Use AWS managed services to offload the undifferentiated heavy lifting of database management
• Pick the right NoSQL tool:
– ElastiCache-Memcached for key-value caching
– ElastiCache-Redis for key-value caching and in-memory data structures
– DynamoDB for storing & indexing key-values and documents