Not Only SQL
Table of Contents
- Background and history
- Used applications
- What is Cassandra? Overview
- Replication & Consistency
- Writing, Reading, Querying and Sorting
- APIs & Installation
- World Database in Cassandra
- Using Hector API
- Administration tools

Background
- Influential technologies:
  - Dynamo: fully distributed design (infrastructure)
  - BigTable: sparse data model

Other NoSQL databases
[Diagram of NoSQL / Big Data systems: MongoDB, Hypertable, Neo4J, Cassandra, HyperGra…, Riak, Memcach…, Voldemort, Tokyo Ca…, HBase, Redis, CouchDB]

Bigtable / Dynamo
- Bigtable: HBase, Hypertable
- Dynamo: Riak, Voldemort
- Cassandra: combination of both

CAP Theorem
- Consistency
- Availability
- Partition tolerance

Applications
- Facebook
- Google Code
- Apache
- Digg
- Twitter
- Rackspace
- Others…

What Is Cassandra?
- O(1) node lookup
- Key-value store
- Column-based data store
- Highly distributed: decentralized (no master/slave)
- Elasticity
- Durable, fault-tolerant: replication
- Sparse
- ACID
- NoSQL!

Overview: Data Model
- Keyspace
  - Uppermost namespace
  - Typically one per application
- Column
  - Basic unit of storage: name, value and timestamp
- ColumnFamily
  - Associates records of a similar kind
  - Record-level atomicity
  - Indexed
- SuperColumn
  - Columns whose values are columns
  - Array of columns
- SuperColumnFamily
  - ColumnFamily whose values are only SuperColumns

Examples
Column, City: ORANJESTAD
    {"id": 1, "name": "ORANJESTAD", "population": 33000, "capital": true}
SuperColumns, Country: Aruba
    {"id": "aa", "name": "Aruba", "fullName": "Aruba",
     "location": "Caribbean, island in the Caribbean Sea, north of Venezuela",
     "coordinates": {"latitudeType": "N", "latitude": 12.5,
                     "longitudeType": "W", "longitude": 69.96667},
     …

Replication & Consistency
- Consistency Level is based on the Replication Factor (N), not the number of nodes in the system.
- There are a few options to set:
  - How many replicas must respond to declare success
  - Query all replicas on every read
- Every column has a value and a timestamp: the latest timestamp wins
- Read repair: read the data from one replica and check the checksum/timestamp of the others to verify it
- R (number of replicas to read from) + W (number of replicas to write to) > N (replication factor)

The Ring: Partitioning
- Each node has a single, unique token
- Each node claims the range of the ring between its neighbor's token and its own
- Partitioning: map from key space to token; can be random or order-preserving
- Snitching: map from nodes to physical location

Writing
- No locks
- Append support without read ahead
- Atomicity guarantee for a key (in a ColumnFamily)
- Always writable!
- SSTables: key/data; an SSTable file for each column family
- Fast

Reading
- Wait for R responses
- Wait for N - R responses in the background and perform read repair
- Read multiple SSTables
- Slower than writes (but still fast)

Compare with MySQL (RDBMS)
Compare a 50GB database:
                Write       Read
    MySQL       ~300ms      ~350ms
    Cassandra   ~0.12ms     ~15ms

Queries
- Single column
- Slice
  - Set of names / range of names
  - Simple slice -> columns
  - Super slice -> supercolumns
- Key range

Sorting
- Sorting is set on writing
- Sorting is set by the type of the Column/SuperColumn keys
- Sorting/key types: Bytes, UTF8, Ascii, LexicalUUID, TimeUUID

Drawbacks
- No joins (for speed)
- Not able to sort at query time
- No real SQL support (although some APIs support a very small portion of it)

APIs
- Many APIs for a large number of languages, including C++, Java, Python, PHP, Ruby, Erlang, Haskell, C#, Javascript and more…
- Thrift interface: driver-level interface; hard to use.
- Hector: a Java Cassandra client; a simple column-based client that does what Cassandra is intended to do.
- Kundera: a JPA-supporting Java client; tries to translate JPA classes and attributes to Cassandra; good on inserts, still hard and problematic with queries.
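The R + W > N rule from the Replication & Consistency slide comes down to simple arithmetic, so a small worked example may help. This is a sketch in plain Java, not Cassandra code: the class `QuorumSketch` and its methods are hypothetical names chosen for illustration. It assumes N is the replication factor, takes a majority quorum to be N/2 + 1, and resolves two copies of a column by "latest timestamp wins".

```java
class QuorumSketch {

    /** One replica's copy of a column: a value plus its write timestamp. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Majority of n replicas, where n is the replication factor. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** True when any r replicas read must include at least one of the w replicas written. */
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    /** Latest timestamp wins; on a tie this sketch keeps the first argument (arbitrary choice). */
    static Versioned reconcile(Versioned a, Versioned b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        int n = 3;          // replication factor
        int q = quorum(n);  // 2 of 3 replicas

        System.out.println("quorum for N=3: " + q);
        System.out.println("R=2, W=2 overlap: " + readsSeeLatestWrite(q, q, n));
        System.out.println("R=1, W=1 overlap: " + readsSeeLatestWrite(1, 1, n));

        // Two replicas disagree; the copy with the newer timestamp is returned.
        Versioned stale = new Versioned("old value", 100L);
        Versioned fresh = new Versioned("new value", 200L);
        System.out.println("winner: " + reconcile(stale, fresh).value);
    }
}
```

With N = 3, reading and writing at quorum (2 + 2 > 3) guarantees overlap, while R = 1 and W = 1 does not (1 + 1 is not greater than 3), which is the case where read repair has to catch up stale replicas in the background.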
Cassandra Installation
- Install the prerequisite: basically the latest Java SE release
- Extract the Cassandra zip files to your chosen path
- Run bin/cassandra.bat -f
- The Cassandra node is up and running

World database in Cassandra
- World: Keyspace
  - Countries: SuperColumn Family
    - CountryDetails: SuperColumn
    - Border: SuperColumns
    - Coordinates: SuperColumn
    - GDP: SuperColumn
    - Language: SuperColumns
  - Cities: Column Family

Using Hector API: definitions
Creating a Cassandra cluster:
    Cluster cluster = HFactory.getOrCreateCluster("WorldCluster", "localhost:9160");

Adding a keyspace:
    columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);

Adding a Column Family:
    BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
    columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
    columnFamilyDefinition.setName(CITY_CF); // ColumnFamily name
    columnFamilyDefinition.addColumnDefinition(columnDefinition);

Adding a SuperColumn Family:
    BasicColumnFamilyDefinition superCfDefinition = new BasicColumnFamilyDefinition();
    superCfDefinition.setKeyspaceName(WORLD_KEYSPACE);
    superCfDefinition.setName(COUNTRY_SUPER);
    superCfDefinition.setColumnType(ColumnType.SUPER);

Adding all definitions to the cluster:
    ColumnFamilyDefinition cfDefStandard = new ThriftCfDef(columnFamilyDefinition);
    ColumnFamilyDefinition cfDefSuper = new ThriftCfDef(superCfDefinition);
    KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
        WORLD_KEYSPACE, "org.apache.cassandra.locator.SimpleStrategy", 1,
        Arrays.asList(cfDefStandard, cfDefSuper));
    cluster.addKeyspace(keyspaceDefinition);

Using Hector API: inserting
Creating a Column Template:
    ColumnFamilyTemplate<String, String> template =
        new ThriftColumnFamilyTemplate<String, String>(
            keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer);

Adding a row into a Column Family:
    ColumnFamilyUpdater<String, String> updater = template.createUpdater("a key");
    updater.setString("key", "value");
    try {
        template.update(updater);
    } catch (HectorException e) {
        // do something ...
    }

Creating a SuperColumn Template:
    SuperCfTemplate<String, String, String> template =
        new ThriftSuperCfTemplate<String, String, String>(
            keyspaceOperator, columnFamilyName,
            stringSerializer, stringSerializer, stringSerializer);

Adding a row into a SuperColumn Family:
    SuperCfUpdater<String, String, String> updater = template.createUpdater("a key");
    HSuperColumn<String, String, ByteBuffer> superColumn = updater.addSuperColumn("sc name");
    superColumn.setString("column name", value);
    superColumn.update();
    try {
        template.update(updater);
    } catch (HectorException e) {
        // do something ...
    }

Using Hector API: reading
Reading all rows and their columns from a Column Family (using CQL):
    CqlQuery<String, String, String> cqlQuery = new CqlQuery<String, String, String>(
        factory.getKeyspaceOperator(), stringSerializer, stringSerializer, stringSerializer);
    cqlQuery.setQuery("select * from City");
    QueryResult<CqlRows<String, String, String>> result = cqlQuery.execute();

Reading all columns from a row in a SuperColumn Family:
    SuperCfTemplate<String, String, String> superColumn =
        HectorFactory.getFactory().getSuperColumnFamilyTemplate("SuperColumnFamily");
    SuperCfResult<String, String, String> superRes = superColumn.querySuperColumns("key");
    Collection<String> columnNames = superRes.getSuperColumns();

Reading a SuperColumn from a row in a SuperColumn Family:
    SuperColumnQuery<String, String, String, String> query =
        HFactory.createSuperColumnQuery(keyspaceOperator,
            stringSerializer, stringSerializer, stringSerializer, stringSerializer);
    query.setColumnFamily("SuperColumnFamily");
    query.setKey("key");
    query.setSuperName("SuperColumnName");
    QueryResult<HSuperColumn<String, String, String>> result = query.execute();
    for (HColumn<String, String> col : result.get().getColumns()) {
        String name = col.getName();
        String value = col.getValue();
    }

Every query has options to fetch only part of the rows, by setting a start value and an end value (the rows are sorted on insert), and only part of the columns, by setting the column names explicitly.

Administration tools
- Cassandra: node activator
- Nodetool: bootstrapping and monitoring
- Cassandra-cli: application console
- Sstable2json: export
- Json2sstable: import
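
The query options mentioned in the reading slides (a start and end value over data that was sorted when written, or an explicit list of column names) can be pictured as lookups on a sorted map. The sketch below uses plain Java collections rather than Hector or Cassandra; `SliceSketch`, `slice` and `byNames` are hypothetical names, the sample row reuses the ORANJESTAD city example from the data-model slide, and treating both range bounds as inclusive is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class SliceSketch {

    /** Range slice: columns whose names lie between start and end, both inclusive. */
    static NavigableMap<String, String> slice(NavigableMap<String, String> row,
                                              String start, String end) {
        return row.subMap(start, true, end, true);
    }

    /** Named slice: only the requested columns, skipping names the row does not have. */
    static Map<String, String> byNames(Map<String, String> row, List<String> names) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : names) {
            if (row.containsKey(name)) {
                out.put(name, row.get(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The TreeMap orders columns by name at write time, so no sort happens at query time.
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("population", "33000");
        row.put("capital", "true");
        row.put("name", "ORANJESTAD");
        row.put("id", "1");

        System.out.println(slice(row, "id", "name").keySet());
        System.out.println(byNames(row, Arrays.asList("capital", "population")));
    }
}
```

Because the order is fixed by the key type when the data is written, a range slice is just a bounded walk over already-sorted keys, which is also why a different sort order cannot be requested at query time.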