ADVANCED DATABASE CONCEPTS
1. Distributed Databases
• Concepts: A distributed database is a collection of multiple interconnected databases spread across different physical locations. It is managed by a distributed database management system (DDBMS), which makes the data accessible from any site in the system.
• Advantages:
  o Data Distribution: Data is spread across several sites, which can improve performance and reliability.
  o Improved Performance: Local data access reduces the load on individual servers and lowers latency.
  o Scalability: Capacity can be increased by adding more nodes or databases.
  o Reliability and Availability: Replication and redundancy increase fault tolerance, so the system stays operational even if some sites fail.
  o Flexibility: The system can be tailored to specific organizational needs, including geographic distribution and local autonomy.
• Distributed Database Design:
  o Fragmentation: Dividing a database into smaller pieces (fragments) that can be distributed across different locations.
    ▪ Horizontal Fragmentation: Dividing a table by rows, so that each fragment holds a subset of the rows.
    ▪ Vertical Fragmentation: Dividing a table by columns, so that each fragment holds a subset of the columns, usually together with the primary key so the rows can be reassembled.
  o Replication: Copying data fragments and storing them in multiple locations to improve reliability and availability.
  o Allocation: Deciding where to place fragments and replicas across the distributed system, based on factors such as network latency, access patterns, and resource availability.

2. NoSQL Databases
• Introduction to NoSQL: NoSQL (Not Only SQL) databases are designed to handle unstructured or semi-structured data and to scale horizontally across many servers. They are particularly well suited to large-scale data storage and real-time web applications.
• Types of NoSQL Databases:
  o Document Stores: Store data as documents, usually in JSON or BSON format. Each document can have its own structure, which allows flexibility.
    ▪ Examples: MongoDB, Couchbase.
  o Key-Value Stores: Store data as key-value pairs, where each key is unique and the value can be of any data type.
    ▪ Examples: Redis, DynamoDB.
  o Column-Family Stores: Group related columns into column families instead of using a fixed row layout, which allows efficient storage and querying of sparse data.
    ▪ Examples: Apache Cassandra, HBase.
  o Graph Databases: Designed to store and query data as graphs made up of nodes, edges, and properties. They are ideal for applications involving complex relationships.
    ▪ Examples: Neo4j, Amazon Neptune.

3. Data Warehousing and Data Mining
• Concepts:
  o Data Warehousing: A data warehouse is a centralized repository for storing large volumes of structured data from multiple sources. It supports business intelligence activities such as querying, reporting, and data analysis.
  o Data Mining: The process of discovering patterns, correlations, and anomalies in large datasets in order to predict outcomes or extract useful information.
• Architecture:
  o Data Sources: Raw data is collected from operational databases, flat files, and external sources.
  o ETL Process (Extract, Transform, Load): Data is extracted from source systems, transformed into a suitable format, and loaded into the data warehouse.
  o Data Warehouse: Organized into fact and dimension tables, typically following a star or snowflake schema.
  o OLAP (Online Analytical Processing): Lets users analyze data through multidimensional views and complex queries.
• OLAP (Online Analytical Processing):
  o MOLAP (Multidimensional OLAP): Data is pre-aggregated in a multidimensional cube, which allows fast query performance.
  o ROLAP (Relational OLAP): Uses standard relational databases to store data and supports dynamic querying.
  o HOLAP (Hybrid OLAP): Combines the benefits of MOLAP and ROLAP by using both pre-aggregated cubes and relational databases.
• Data Mining Techniques:
  o Classification: Assigning data to predefined categories or classes.
    ▪ Examples: Decision Trees, Support Vector Machines.
  o Clustering: Grouping data into clusters based on similarity, without predefined categories.
    ▪ Examples: K-Means, Hierarchical Clustering.
  o Association Rule Learning: Discovering interesting relationships or associations between variables in large datasets.
    ▪ Examples: Apriori Algorithm, FP-Growth.
  o Anomaly Detection: Identifying unusual patterns that do not conform to expected behavior.
    ▪ Examples: Isolation Forest, DBSCAN (a clustering algorithm whose noise points can be treated as anomalies).
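
As a rough illustration of association rule learning from the list above, the following Python sketch computes the two standard measures that algorithms such as Apriori rely on: support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). The transaction data and the helper names `support`, `confidence`, and `frequent_pairs` are hypothetical and chosen only for this example. It is a brute-force sketch over item pairs, not an implementation of Apriori or FP-Growth.

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


def frequent_pairs(transactions, min_support):
    """Brute-force search for item pairs whose support meets `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


if __name__ == "__main__":
    print(support({"bread", "milk"}, transactions))
    print(confidence({"bread"}, {"milk"}, transactions))
    print(frequent_pairs(transactions, min_support=0.5))
```

With this sample data, "bread" appears in three of four transactions and "bread" with "milk" in two of four, so the rule bread → milk has support 0.5 and confidence of about 0.67. Real implementations avoid the brute-force pair enumeration by pruning candidates whose subsets are already infrequent.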