Sistemas de Datos (Data Systems) - Relational Algebra
XML
Sistemas de Datos (Data Systems)

Class outline
1. What XML is
2. What it is used for
3. XML in databases
4. Implementation in current DBMSs

What is XML
• eXtensible Markup Language.
• It is very similar to HTML, but its main purpose is to describe and transmit data, not to display it as HTML does.
• XML tags (markup) are not predefined; the user defines them. Tags are the metadata of the document.
• It is an open, royalty-free, platform-independent standard supported across the software industry.

What is XML: example of an XML file

    <menu_almuerzo>                     <!-- opening tag of the root element -->
      <comida>                          <!-- parent element -->
        <nombre>Waffles</nombre>        <!-- "Waffles" is the element's value -->
        <precio>$2.00</precio>
        <descripcion>Waffles baratos de McDonalds</descripcion>
        <calorias>650</calorias>
        <ingrediente>                   <!-- child element of <comida> -->
          <descripcion>Harina</descripcion>
        </ingrediente>
      </comida>
      <comida>
        <nombre>Hamburguesa</nombre>
        <precio>$5.00</precio>
        <descripcion>La hamburguesa mas comun de McDonalds</descripcion>
        <calorias>1500</calorias>
      </comida>
    </menu_almuerzo>                    <!-- closing tag of the root element -->

Note that <precio> is an element, not an attribute; this example contains no attributes. An attribute is a name="value" pair written inside a start tag.

What XML is used for: data vs. documents
Data-centric XML:
• Communication between applications: XML is the standard for exchanging data between heterogeneous systems (e.g., Company #1 <-> Company #2).
• Data exchange with parties outside the organization.
• Web Services.
Document-centric (information-based) XML:
• Storing and retrieving documents (semi-structured information).
• Content and metadata management.

What XML is used for: data-centric XML
• XML as a means of transporting data.
• Regular structure, sets of attribute-value pairs, and little or no additional content.
• The order in which the content appears is not relevant.
• Databases can be the source and/or the destination of this kind of document.
• Examples: sales orders, stock data, flight itineraries and schedules.
• Query results are not ranked; all that matters is that they satisfy the query conditions.
• Example query on data-centric XML: find the employees whose salary this month is the same as it was 12 months ago.

Comparison with relational data
Disadvantages:
• Inefficient: tags, which carry schema information, are repeated.
Benefits:
• Unlike relational tuples, XML data is self-documenting through its tags.
• Non-rigid format: new tags can be added.
• Allows nested structures.
• Wide acceptance, not only by database systems but also by browsers, programming languages, and applications.

Motivation for nesting
• Nesting of data is useful in data transfer.
  Example: elements representing items nested within an itemlist element.
• Nesting is not supported, or is discouraged, in relational databases.
  With multiple orders, the customer name and address would be stored redundantly; normalization replaces the nested structure in each order with a foreign key into a table storing customer name and address information.
  Nesting is supported in object-relational databases.
• But nesting is appropriate when transferring data: an external application does not have direct access to data referenced by a foreign key.

What XML is used for: data-centric XML (example)

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <ficha>
      <nombre>Angel</nombre>
      <apellido>Barbero</apellido>
      <direccion>Portela 36 1° A</direccion>
    </ficha>

Example XML file: CD catalog

What XML is used for: document-centric XML
• XML as a means of structuring, storing, and retrieving documents/information.
• De facto standard for storing documents, thanks to its ability to store and use their structure (paragraphs, sections, footnotes, etc.) and metadata (author, year of publication, etc.).
• XML designed to be read by people.
• Irregular structure, coarse granularity (the smallest unit of information contains mixed content or is the whole document), and a lot of mixed content.
• The order in which the content appears is relevant.
• Usually authored by hand, in XML or in another format that is later converted to XML.
• Examples: books, laws, email, and any other human-authored document.

What XML is used for: document-centric XML (example)
Example XML file: RetrieveProductSearchResultContent.XML

Using XML in databases
Storage: option #1
• Map the XML onto columns of ordinary data types (character, numeric, date, etc.) in one or more tables.
• Only for data-centric XML.
• Requires more processing.
• Can be done in several ways (SQL/XML, etc.).
Example in T-SQL:

    -- Illustrative input: each <Rows> element carries one row as attributes.
    DECLARE @xml XML;
    SET @xml = N'<root><Rows column1="a" column2="b" column3="1"/></root>';

    INSERT INTO some_table (column1, column2, column3)
    SELECT
        Rows.n.value('(@column1)[1]', 'varchar(20)'),    -- attribute -> varchar(20)
        Rows.n.value('(@column2)[1]', 'nvarchar(100)'),  -- attribute -> nvarchar(100)
        Rows.n.value('(@column3)[1]', 'int')             -- attribute -> int
    FROM @xml.nodes('//Rows') AS Rows(n);                -- one result row per <Rows> node

Storage: option #2
• Store the XML in Binary Large Object (BLOB) or Character Large Object (CLOB) columns.
• Simple solution.
• Works with every database engine.
• Hard to query the content of the data in a simple way.
• Full-text search queries can be used, but the tags are no longer exploited (data cannot be told apart from metadata).

Storage: option #3
• Store the XML in a column type specialized for storing and/or indexing XML.
• Not every database has a native way to store XML.
• The techniques used to store and/or index XML can vary from one engine to another.
• It creates a dependency on the database engine in use.
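As a rough illustration of the trade-off between storage options #1 and #2, the Python sketch below uses SQLite (standing in for a real DBMS, not one of the engines discussed here) to shred an abridged copy of the menu example into typed columns and, separately, to keep the whole document as a single text value. The table names `comida` and `menu_doc` are hypothetical. It shows that typed columns answer ordinary predicates, while a text search over the stored document cannot tell tag names from data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Abridged copy of the menu example (only the elements used below).
MENU_XML = """<menu_almuerzo>
  <comida>
    <nombre>Waffles</nombre><precio>$2.00</precio><calorias>650</calorias>
    <descripcion>Waffles baratos de McDonalds</descripcion>
  </comida>
  <comida>
    <nombre>Hamburguesa</nombre><precio>$5.00</precio><calorias>1500</calorias>
    <descripcion>La hamburguesa mas comun de McDonalds</descripcion>
  </comida>
</menu_almuerzo>"""


def shred(conn: sqlite3.Connection, xml_text: str) -> None:
    """Option #1: map each <comida> element onto one row of typed columns."""
    conn.execute("CREATE TABLE comida (nombre TEXT, precio REAL, calorias INTEGER)")
    for comida in ET.fromstring(xml_text).iter("comida"):
        conn.execute(
            "INSERT INTO comida VALUES (?, ?, ?)",
            (
                comida.findtext("nombre"),
                float(comida.findtext("precio").lstrip("$")),  # "$2.00" -> 2.0
                int(comida.findtext("calorias")),
            ),
        )


def store_whole(conn: sqlite3.Connection, xml_text: str) -> None:
    """Option #2: keep the entire document as one large text value (CLOB-like)."""
    conn.execute("CREATE TABLE menu_doc (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO menu_doc (body) VALUES (?)", (xml_text,))


conn = sqlite3.connect(":memory:")
shred(conn, MENU_XML)
store_whole(conn, MENU_XML)

# Option #1: the shredded data answers ordinary relational predicates.
light = conn.execute("SELECT nombre FROM comida WHERE calorias < 1000").fetchall()

# Option #2: only text matching is available, and it cannot tell markup from
# data: this pattern matches the <descripcion> tag names, not any value.
hits = conn.execute(
    "SELECT COUNT(*) FROM menu_doc WHERE body LIKE '%descripcion%'"
).fetchone()[0]

print(light)  # [('Waffles',)]
print(hits)   # 1
```

The shredding step has to know the document's structure in advance, which is why option #1 is limited to data-centric XML; option #2 accepts any document but gives up structured querying.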
Using XML in databases
Querying: option #1
• Use JDBC or ODBC together with SAX or DOM (and perhaps XSLT, a simple language designed for translation from XML to XML and XML to HTML) to turn the results of SQL queries into XML.
• For example, the program could query the customers and then issue additional queries to fetch the projects associated with each of those customers.
• This procedure can be inefficient because of the number of queries required.
Diagram: Database -> SQL queries -> Application -> XML

Querying: option #2
• Use the XML extensions provided by the database engine in use.
• These extensions may be more or less simple and maintainable depending on the engine chosen, but all of them make the task simpler.
• It creates a dependency on the database engine in use.

Querying: option #3
• Use SQL/XML (ANSI SQL:2003). A small set of functions has been added to the SQL standard to publish XML.
• Requires little learning for the SQL programmer.
• SQL/XML is supported by Oracle and IBM, but not by Microsoft. (SQL/XML is different from SQLXML, a proprietary Microsoft technology, and the similar names have caused a great deal of confusion in the industry.)
• SQL/XML can be used with traditional database APIs such as JDBC.
• It includes the definition of a native XML data type, implicit and explicit ways of generating XML from relational data, and an implicit way of mapping relational data to XML.

Querying: option #4
• Use XQuery. XQuery is a native XML query language.
• Because it is a new language, it has a steeper learning curve for SQL programmers, but it feels more natural to XML programmers.
• Unlike SQL/XML, XQuery is optimized for processing XML, and it is particularly good for applications that must process XML together with relational data.
• The major database engines support XQuery.

Querying and transforming XML data
• Translation of information from one XML schema to another.
• Querying on XML data.
The two are closely related and are handled by the same tools.
Standard XML querying/translation languages:
• XPath: a simple language consisting of path expressions.
• XSLT: a simple language designed for translation from XML to XML and XML to HTML.
• XQuery: an XML query language with a rich set of features.

Validation in XML
Validating a document works like a contract: the producer verifies that the document has been built correctly, and the consumer verifies that it has the expected format.
Ways to validate a document:
• Use a DTD (Document Type Definition).
• Use an XSD (XML Schema Definition).

DTD
• DTD: Document Type Definition.
• A set of markup declarations.
• Defines a document type for the SGML family of markup languages.
• Widely used.

Document Type Definition (DTD)
• The type of an XML document can be specified using a DTD.
• A DTD constrains the structure of XML data:
  - what elements can occur;
  - what attributes an element can/must have;
  - what subelements can/must occur inside each element, and how many times.
• A DTD does not constrain data types: all values are represented as strings in XML.
• DTD syntax:

    <!ELEMENT element (subelements-specification) >
    <!ATTLIST element (attributes) >

Element specification in DTD
• Subelements can be specified as names of elements, or as #PCDATA (parsed character data), i.e., character strings.
• A content specification can also be EMPTY (no subelements) or ANY (anything can be a subelement).
• Example:

    <!ELEMENT department (dept_name, building, budget)>
    <!ELEMENT dept_name (#PCDATA)>
    <!ELEMENT budget (#PCDATA)>

• A subelement specification may contain regular expressions:

    <!ELEMENT university ((department | course | instructor | teaches)+)>

  Notation:
  - "|": alternatives
  - "+": 1 or more occurrences
  - "*": 0 or more occurrences

University DTD

    <!DOCTYPE university [
      <!ELEMENT university ((department|course|instructor|teaches)+)>
      <!ELEMENT department (dept_name, building, budget)>
      <!ELEMENT course (course_id, title, dept_name, credits)>
      <!ELEMENT instructor (IID, name, dept_name, salary)>
      <!ELEMENT teaches (IID, course_id)>
      <!ELEMENT dept_name (#PCDATA)>
      <!ELEMENT building (#PCDATA)>
      <!ELEMENT budget (#PCDATA)>
      <!ELEMENT course_id (#PCDATA)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT credits (#PCDATA)>
      <!ELEMENT IID (#PCDATA)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT salary (#PCDATA)>
    ]>

Attribute specification in DTD
For each attribute, the specification gives:
• its name;
• its type: CDATA, ID (identifier), IDREF (ID reference), or IDREFS (multiple IDREFs), more on these later;
• whether it is mandatory (#REQUIRED), has a default value (value), or neither (#IMPLIED).
Examples:

    <!ATTLIST course course_id CDATA #REQUIRED>

or

    <!ATTLIST course
        course_id   ID     #REQUIRED
        dept_name   IDREF  #REQUIRED
        instructors IDREFS #IMPLIED >

IDs and IDREFs
• An element can have at most one attribute of type ID.
• The ID attribute value of each element in an XML document must be distinct; thus the ID attribute value is an object identifier.
• An attribute of type IDREF must contain the ID value of an element in the same document.
• An attribute of type IDREFS contains a set of (0 or more) ID values; each of them must be the ID value of an element in the same document.

University DTD with attributes
University DTD with ID and IDREF attribute types.
    <!DOCTYPE university-3 [
      <!ELEMENT university-3 ((department|course|instructor)+)>
      <!ELEMENT department (building, budget)>
      <!ATTLIST department
          dept_name ID #REQUIRED >
      <!ELEMENT course (title, credits)>
      <!ATTLIST course
          course_id   ID     #REQUIRED
          dept_name   IDREF  #REQUIRED
          instructors IDREFS #IMPLIED >
      <!ELEMENT instructor (name, salary)>
      <!ATTLIST instructor
          IID       ID    #REQUIRED
          dept_name IDREF #REQUIRED >
      · · · declarations for title, credits, building, budget, name and salary · · ·
    ]>

XML data with ID and IDREF attributes

    <university-3>
      <department dept_name="Comp. Sci.">
        <building> Taylor </building>
        <budget> 100000 </budget>
      </department>
      <department dept_name="Biology">
        <building> Watson </building>
        <budget> 90000 </budget>
      </department>
      <course course_id="CS-101" dept_name="Comp. Sci." instructors="10101 83821">
        <title> Intro. to Computer Science </title>
        <credits> 4 </credits>
      </course>
      ….
      <instructor IID="10101" dept_name="Comp. Sci.">
        <name> Srinivasan </name>
        <salary> 65000 </salary>
      </instructor>
      ….
    </university-3>

Strictly, ID, IDREF and IDREFS values must be XML names, so a validating parser would reject values that contain spaces or start with a digit, such as "Comp. Sci." or "10101"; the example does not follow this rule.

Limitations of DTDs
• No typing of text elements and attributes: all values are strings, with no integers, reals, etc.
• Difficult to specify unordered sets of subelements.
  - Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved).
  - (A | B)* allows the specification of an unordered set, but it cannot ensure that each of A and B occurs only once.
• IDs and IDREFs are untyped.
  - The instructors attribute of a course may contain a reference to another course, which is meaningless.
  - The instructors attribute should ideally be constrained to refer to instructor elements.

XML Schema
Newer, and in increasing use.

XML Schema
XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs.
It supports:
• Typing of values, e.g., integer, string, etc., including constraints on min/max values.
• User-defined, complex types.
• Many more features, including uniqueness and foreign key constraints, and inheritance.
XML Schema is itself specified in XML syntax, unlike DTDs: a more standard representation, but verbose.
XML Schema is integrated with namespaces.
BUT: XML Schema is significantly more complicated than DTDs.

XML Schema version of the university DTD

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="university" type="universityType"/>
      <xs:element name="department">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="dept_name" type="xs:string"/>
            <xs:element name="building" type="xs:string"/>
            <xs:element name="budget" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      ….
      <xs:element name="instructor">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="IID" type="xs:string"/>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="dept_name" type="xs:string"/>
            <xs:element name="salary" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      …

XML Schema version of the university DTD (cont.)

    ….
      <xs:complexType name="universityType">
        <xs:sequence>
          <xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="course" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="instructor" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="teaches" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>

• The choice of "xs:" was ours; any other namespace prefix could be chosen.
• Element "university" has type "universityType", which is defined separately: xs:complexType is used later to create the named complex type "universityType".

More features of XML Schema
• Attributes are specified with the xs:attribute tag:

    <xs:attribute name="dept_name"/>

  Adding the attribute use="required" means that a value must be specified.
• Key constraint: "department names form a key for department elements under the root university element":

    <xs:key name="deptKey">
      <xs:selector xpath="department"/>
      <xs:field xpath="dept_name"/>
    </xs:key>

• Foreign key constraint from course to department:

    <xs:keyref name="courseDeptFKey" refer="deptKey">
      <xs:selector xpath="course"/>
      <xs:field xpath="dept_name"/>
    </xs:keyref>

  Both constraints are declared inside the xs:element declaration of university, and their selector paths are relative to that element: the restricted XPath subset allowed in xs:selector does not accept absolute paths such as /university/department.

Vendor solutions
Besides middleware offerings, the most popular databases are XML-enabled: they have native support for converting relational data to XML and vice versa. In fact, every major relational database vendor has proprietary extensions for using XML with its product, but each one takes a completely different approach, and there is little interoperability between them. The "Big Three" vendors (IBM, Oracle, and Microsoft) offer full XML support, including storage of whole XML documents, and support XQuery in some form.
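The conversion from relational data to XML and back that these products offer natively can be sketched, under simplifying assumptions, with Python's standard library: `sqlite3` stands in for the database and `xml.etree.ElementTree` builds and reads the XML. The `publish` and `load` helpers, the `rows` root element, the `instructor_copy` table and the second instructor row are all hypothetical; the real products use their own proprietary extensions instead.

```python
import sqlite3
import xml.etree.ElementTree as ET


def publish(conn: sqlite3.Connection, table: str, as_attributes: bool = True) -> str:
    """Relational -> XML: one element per row (named after the table) under <rows>."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is fixed, not user input
    columns = [d[0] for d in cur.description]
    root = ET.Element("rows")
    for row in cur:
        elem = ET.SubElement(root, table)
        for col, value in zip(columns, row):
            if as_attributes:
                elem.set(col, str(value))                    # attribute-centric
            else:
                ET.SubElement(elem, col).text = str(value)   # element-centric
    return ET.tostring(root, encoding="unicode")


def load(conn: sqlite3.Connection, table: str, tag: str, xml_text: str) -> None:
    """XML -> relational: insert one row per <tag> element, reading its attributes."""
    for elem in ET.fromstring(xml_text).findall(tag):
        cols = list(elem.attrib)
        marks = ", ".join("?" for _ in cols)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})",
            [elem.get(c) for c in cols],
        )


conn = sqlite3.connect(":memory:")
for name in ("instructor", "instructor_copy"):
    conn.execute(f"CREATE TABLE {name} (IID TEXT, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("10101", "Srinivasan", 65000),   # row from the university example
     ("99999", "Example", 50000)],     # hypothetical placeholder row
)

as_attrs = publish(conn, "instructor")
as_elems = publish(conn, "instructor", as_attributes=False)
print(as_elems)

# Round trip: load the attribute-centric XML into a second table.
load(conn, "instructor_copy", "instructor", as_attrs)
same = (conn.execute("SELECT * FROM instructor ORDER BY IID").fetchall()
        == conn.execute("SELECT * FROM instructor_copy ORDER BY IID").fetchall())
print(same)  # True
```

The round trip works here because SQLite's INTEGER column affinity turns the string "65000" read from the XML back into a number; all XML values travel as strings, so a real mapping needs type information on the relational side. Table names are interpolated with f-strings only because they are fixed in the sketch; never build SQL that way from untrusted input.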
IBM DB2
IBM provides a truly unified XML/relational database, supporting the XML data model from the client through the database, "down to the disk and back again", through a first-class XML data type. By deeply implementing XML in a database engine that was previously purely relational, IBM says it offers superior flexibility and performance relative to other offerings.

Figure: IBM DB2 XML support

DB2 manages both conventional relational data and XML data. As depicted in the Storage component of the figure, relational and XML data are stored in different formats that match their respective models: relational data as traditional row-column structures, and XML as hierarchical node structures. Both types of storage are accessed via the DB2 engine, which processes plain SQL, SQL/XML, and XQuery in an integrated fashion. SQL and XQuery are handled in a single modelling framework, avoiding the need to translate queries between them, via so-called bilingual queries that let developers use the language that matches not just the application's needs but also their own skills. Applications can continue to use SQL to manipulate relational data or the XML store. SQL/XML extensions enable publishing relational data in XML format based on data retrieved by embedding XPath or XQuery in SQL statements. XML applications typically use XQuery to access the XML store, yet XQuery queries can optionally contain SQL to combine and correlate XML with relational data.

Oracle XML DB
Oracle has been steadily evolving its support for XML since 1998, moving toward flexible, high-performance, scalable XML storage and processing. With new releases every few years, it has progressed from loosely coupled XML APIs to XML storage and repository support, later adding XQuery and then binary XML storage and indexing.

Figure: Oracle XML DB features

XML DB implements the major W3C standards (e.g., XML, Namespaces, XPath, XML Schema, XSLT).
Oracle claims the first major implementation of XQuery, as well as support for SQL/XML. This hybrid database provides SQL-centric access to XML content and XML-centric access to relational content. Multiple XML storage options allow tuning for optimal application performance, and an XML DB repository is a useful addition for document-centric needs.

Microsoft SQL Server

Figure: Microsoft's SQL Server architecture

This product features XML storage, indexing, and query processing. The XML data type provides a simple mechanism for storing XML data by inserting it into an untyped XML column. The XML data type preserves document order and is useful for applications such as document management. Alternatively, XML Schemas may be used to define typed XML; this helps the database engine optimize storage and query processing, in addition to providing data validation. SQL Server can also handle recursive XML Schemas as well as server-side XQuery.

Figure: Microsoft SQL Server architecture

Microsoft still marches to its own drummer in some respects. Its SQLXML mapping technology is used to layer an XML-centric programming model over relational data stored in tables at the server. (Note that SQLXML is completely different from SQL/XML; the similarity in names can cause quite a bit of confusion.) The mapping is based on defining an XML schema as an XML view, which provides a bi-directional mapping between an XML Schema and relational tables. This approach can be used for bulk loading XML data into tables and for querying the tables. Document order is not preserved, however, so the mapping technology is useful for XML data processing as opposed to XML document processing. Microsoft still advocates sticking with a relational model for structured data with a known schema.
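The difference between an untyped XML column (any well-formed document is accepted) and a typed one (the document must also conform to a schema) can be approximated in Python. This is a hypothetical sketch: the `check_xml` and `insert` helpers and the dictionary used as a "schema" are stand-ins invented here and are far weaker than XML Schema, and plain SQLite TEXT columns play the role of XML columns.

```python
import sqlite3
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical stand-in for an XML Schema: required child elements of the root
# and the Python type each one's text must convert to.
INSTRUCTOR_SCHEMA = {"IID": str, "name": str, "salary": Decimal}


class XmlColumnError(ValueError):
    """Raised when a value is rejected by the (simulated) XML column."""


def check_xml(text: str, schema: Optional[dict] = None) -> str:
    """Untyped column: well-formedness only. Typed column: also the 'schema'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        raise XmlColumnError(f"not well-formed: {exc}") from exc
    for child, typ in (schema or {}).items():
        value = root.findtext(child)
        if value is None:
            raise XmlColumnError(f"missing <{child}>")
        try:
            typ(value.strip())
        except (ValueError, InvalidOperation) as exc:
            raise XmlColumnError(f"<{child}> is not a valid {typ.__name__}") from exc
    return text


def insert(conn: sqlite3.Connection, table: str, text: str, schema: Optional[dict] = None) -> None:
    conn.execute(f"INSERT INTO {table} (body) VALUES (?)", (check_xml(text, schema),))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE untyped_doc (body TEXT)")  # plays an untyped XML column
conn.execute("CREATE TABLE typed_doc (body TEXT)")    # plays a schema-typed XML column

good = ("<instructor><IID>10101</IID><name>Srinivasan</name>"
        "<salary>65000</salary></instructor>")
bad = good.replace("65000", "high")  # well-formed, but salary is not a decimal

insert(conn, "untyped_doc", bad)                    # accepted: well-formed is enough
insert(conn, "typed_doc", good, INSTRUCTOR_SCHEMA)  # accepted: matches the schema
try:
    insert(conn, "typed_doc", bad, INSTRUCTOR_SCHEMA)
    rejected = False
except XmlColumnError as err:
    rejected = True
    print(err)  # <salary> is not a valid Decimal
```

Typing also gives the engine type information it can use when storing and querying the values, which the untyped column cannot offer; the sketch only models the validation side.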
Microsoft SQL Server
Microsoft SQL Server (version 2005 at the time this material was written) is a popular and powerful database server. XML support, including XQuery support and the addition of an XML column type, is one of the primary areas of improvement in this version.

Retrieving XML
SQL Server's T-SQL dialect includes the FOR XML clause for SELECT queries. This clause, which must be the last clause in the SELECT statement, causes the data returned by the query to be formatted as XML. This feature was first added in SQL Server 2000 and was improved in SQL Server 2005. The actual format of the XML is configurable using one of the optional keywords listed below.

FOR XML formatting options:
• RAW: Each row in the query is returned as an XML element, and individual columns are returned as attributes of that element. There is no root node by default, although one can be added. By default, the element name is row; this can be changed by passing the name as a parameter to RAW, e.g. FOR XML RAW('myrowname').
• AUTO: Each row is returned as an XML element named after the table providing the data, and individual columns are returned as attributes of that element. There is no root node by default. If columns from related tables are included, the resulting XML is nested.
• EXPLICIT: The structure of the resulting XML must be defined explicitly. This provides the most flexibility in creating XML, but also requires the most work from the developer.
• PATH: The structure of the resulting XML can be defined. This mode, added in SQL Server 2005, is much easier to use than EXPLICIT. By default, it wraps each row in a row element, as RAW does, but outputs columns as subelements rather than attributes.

Storing XML
SQL Server 2005 adds support for the XML column type. You can create a table containing one of these columns just as you can for any other data type (see Listing 11-7).
After the table is created, you can populate and query it just as you do any other table:

    INSERT INTO dbo.Articles (Title, Body)
    VALUES ('Welcome', '<div class="wrapper">Welcome to the system</div>');

    SELECT Body FROM dbo.Articles;

Simply dumping XML into an XML column, although useful, has few benefits over using a text column. To improve the process, you can associate an XML Schema with the column. Adding data to the table then triggers validation, ensuring that the column contains data of the appropriate type. To do this in SQL Server, you create a schema collection in the database with the CREATE XML SCHEMA COLLECTION command (see Listing 11-8). Besides adding an entry for the schema, creating a schema collection adds a number of new system tables and views that track the schemas and support validation.

Native XML databases: Xindice
Apache Xindice is a database designed from the ground up to store XML data, what is more commonly referred to as a native XML database. The name is pronounced zeen-deechay in your best faux Italian accent. Don't worry if you get it wrong though, we won't mind. We just care that you spell it correctly.

You might be wondering what a native XML database is good for. Well, it pretty much has one purpose: storing XML data. If you don't have any XML data, don't want any XML data, or think XML is the most over-hyped technology of the new millennium, then Xindice is not for you. We're not out to change the way data in general is stored, only to provide a good solution for storing XML data. If you survey your projects and see XML popping out of every corner, then Xindice might be a real help for storing that XML.

The benefit of a native solution is that you don't have to worry about mapping your XML to some other data structure. You just insert the data as XML and retrieve it as XML.
You also gain a lot of flexibility through the semi-structured nature of XML and the schema-independent model used by Xindice. This is especially valuable when you have very complex XML structures that would be difficult or impossible to map to a more structured database.

At present, Xindice uses XPath as its query language and XML:DB XUpdate as its update language. We provide an implementation of the XML:DB API for Java development, and Xindice can be accessed from other languages through its built-in XML-RPC API. As standards in the XML database area mature, Xindice will include support for those that are most important.

Xindice is the continuation of the project formerly called the dbXML Core. The dbXML source code was donated to the Apache Software Foundation in December 2001.
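Xindice's own XML:DB API is not shown in this material, so the following is only a hypothetical Python sketch of the same idea: documents are inserted and kept as XML (here, parsed trees held in a dict inside a made-up `XmlCollection` class) and queried with path expressions. It uses the limited XPath subset of `xml.etree.ElementTree` rather than full XPath, and an abridged copy of the university document from the ID/IDREF example, with underscores restored in attribute names.

```python
import xml.etree.ElementTree as ET

# Abridged university document from the ID/IDREF example (whitespace trimmed).
UNIVERSITY = """<university-3>
  <department dept_name="Comp. Sci."><building>Taylor</building><budget>100000</budget></department>
  <department dept_name="Biology"><building>Watson</building><budget>90000</budget></department>
  <course course_id="CS-101" dept_name="Comp. Sci." instructors="10101 83821">
    <title>Intro. to Computer Science</title><credits>4</credits>
  </course>
  <instructor IID="10101" dept_name="Comp. Sci."><name>Srinivasan</name><salary>65000</salary></instructor>
</university-3>"""


class XmlCollection:
    """Hypothetical in-memory store: documents go in as XML and are queried by path."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, xml_text):
        self._docs[key] = ET.fromstring(xml_text)  # raises ET.ParseError if malformed

    def get(self, key):
        return self._docs[key]

    def query(self, path):
        """Evaluate an ElementTree path (a limited XPath subset) over every document."""
        return [(key, elem) for key, root in self._docs.items() for elem in root.findall(path)]


store = XmlCollection()
store.insert("univ", UNIVERSITY)

# Attribute predicate: courses of the Comp. Sci. department.
cs_courses = [e.get("course_id") for _, e in store.query(".//course[@dept_name='Comp. Sci.']")]
# Child-text predicate: instructors whose <name> child is exactly 'Srinivasan'.
srini = [e.get("IID") for _, e in store.query("instructor[name='Srinivasan']")]

# Following the IDREFS attribute by hand: split on whitespace and look up each ID.
doc = store.get("univ")
course = doc.find("course[@course_id='CS-101']")
resolved = {iid: doc.find(f"instructor[@IID='{iid}']") is not None
            for iid in course.get("instructors").split()}

print(cs_courses)  # ['CS-101']
print(srini)       # ['10101']
print(resolved)    # {'10101': True, '83821': False}
```

Instructor 83821 is simply absent from this abridged copy, so its reference dangles here; ElementTree does not validate IDREFs, whereas a validating parser using the DTD would reject such a document.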