Technical Questions With Answers - Data Management
The document provides definitions and explanations of key concepts in SQL, databases, data warehousing, and big data. It defines DCL, DML, and DDL in SQL and their functions. It explains concepts like JOIN, UNION, aggregation functions, and normalization. It also defines databases, DBMS, RDBMS and their differences. For data warehousing, it defines data warehouse characteristics, differences between OLTP and OLAP, ETL process, and star and snowflake schemas. Finally, it provides a brief definition of big data.
SQL:
▪ What are DCL, DML, and DDL?
-All of them are sublanguages of SQL.
-DCL (Data Control Language) handles the administrative task of controlling access to the database, such as user permissions. Commands: GRANT and REVOKE (SQL Server also provides DENY).
-DML (Data Manipulation Language) adds, edits, or deletes data in the database. Commands: INSERT, UPDATE, and DELETE.
-DDL (Data Definition Language) defines how data is structured in the database. Commands: CREATE TABLE, ALTER TABLE, and DROP TABLE.
▪ What is the difference between GROUP BY and HAVING?
-The GROUP BY clause groups the rows of a table by one or more columns and produces a single row for each group.
-The HAVING clause filters the groups produced by GROUP BY, usually on an aggregate value, whereas WHERE filters individual rows before they are grouped.
▪ What is ORDER BY? Can we order by more than one column?
-The ORDER BY clause sorts the result in ascending (ASC, the default) or descending (DESC) order.
-Yes, we can list several columns separated by commas; each later column breaks ties in the earlier ones.
▪ What is the difference between UNION and JOIN?
-JOIN: combines rows from two or more tables side by side, based on a related column between them.
-UNION: stacks the result sets of two or more SELECT statements, which must return the same number of columns with compatible types. UNION removes duplicate rows; UNION ALL keeps them.
▪ What are the different types of join?
-INNER JOIN: returns records that have matching values in both tables.
-LEFT JOIN: returns all records from the left table, and the matched records from the right table.
-RIGHT JOIN: returns all records from the right table, and the matched records from the left table.
-FULL OUTER JOIN: returns all records from both tables, matched where possible; columns from the side without a match are NULL.
▪ What are the aggregate functions?
-An aggregate function performs a computation over a set of values and returns a single value. Examples: COUNT, SUM, AVG, MIN, and MAX.
▪ What is the sequence of clauses in a SQL statement?
-Written order: SELECT → FROM → JOIN → WHERE → GROUP BY → HAVING → ORDER BY.
-Logical evaluation order: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.
▪ What is a view, and why do we use it?
-It's a virtual table defined by a stored query over one or more tables.
-We use it to simplify complex queries and to restrict which rows and columns users can see.
-CREATE VIEW view_name AS SELECT column1, column2 FROM table_name WHERE [condition];
▪ What is the SQL transaction?
-It's a group of one or more SQL statements that is executed as a single unit of work: either all of the statements are committed or all of them are rolled back.
▪ What is the difference between DELETE and TRUNCATE?
-DELETE can remove rows conditionally (with a WHERE clause); TRUNCATE removes all rows unconditionally.
-DELETE can be rolled back; whether TRUNCATE can be rolled back depends on the DBMS (in Oracle and MySQL it commits implicitly and cannot be rolled back).
▪ How can we insert a row into a table?
-By using the INSERT INTO statement.
INSERT INTO table_name (column1, column2) VALUES (value1, value2);
-To add a new column rather than a row, use ALTER TABLE table_name ADD column_name datatype;
▪ How can we insert multiple rows with only one INSERT statement?
-By listing the values for each row in parentheses, separated by commas, after the VALUES clause of the statement.
INSERT INTO table_name (column1, column2) VALUES (row1_value1, row1_value2), (row2_value1, row2_value2), (row3_value1, row3_value2);
Database:
▪ What is a database, a DBMS, and an RDBMS?
-Database: an organized collection of data that can be easily accessed and managed.
-DBMS (database management system): a software system for creating and managing databases.
-RDBMS (relational database management system): a DBMS based on the relational model. It stores data in tables of rows and columns and links related data through keys.
▪ What are the kinds of attributes?
-Simple attributes, composite attributes, single-valued attributes, multi-valued attributes, derived attributes, and key attributes.
▪ What is the ERD?
-An ERD (entity relationship diagram) is a graphical representation of the entities in an information technology (IT) system (people, objects, places, concepts, or events), their attributes, and the relationships among them.
▪ What are the types of constraints?
-Domain constraints, key constraints, entity integrity constraints, referential integrity constraints, and tuple uniqueness constraints.
▪ What is the difference between primary key and foreign key?
-A primary key uniquely identifies each row in a table; its values must be unique and not NULL.
-A foreign key is a column or group of columns that references the primary key of another table, which provides the link between the data in the two tables.
▪ What is ON DELETE SET NULL and ON DELETE CASCADE?
-ON DELETE CASCADE: deletes the rows in the child table that correspond to the row deleted from the parent table.
-ON DELETE SET NULL: sets the foreign key columns of the corresponding child rows to NULL when the parent row is deleted.
▪ What is normalization and why do we do it?
-Normalization is the process of organizing the data in a database into well-structured tables.
-We do it to reduce data redundancy and inconsistency.
▪ What are the types of normalization?
-First normal form (1NF), second normal form (2NF), and third normal form (3NF).
▪ What are the update anomalies?
-An update anomaly occurs when the same fact is stored redundantly in several rows, so that changing it in one place but not the others leaves the database inconsistent.
-Example: if a customer's address is repeated in every order row, updating only one of those rows leaves conflicting addresses for the same customer.
▪ What is the difference between SQL and PL/SQL?
-SQL (Structured Query Language) is a declarative language for querying and manipulating data in a database.
-PL/SQL is Oracle's procedural extension of SQL; it adds variables, conditions, loops, and exception handling around SQL statements.
▪ What are the types of loops in PL/SQL?
-Basic loop (LOOP with EXIT).
-WHILE loop.
-FOR loop.
-Cursor FOR loop.
▪ What are cursors and what are the cursor types?
-A cursor is a pointer to a temporary work area in memory that the database server allocates when a SQL statement is processed. It lets a program handle the result rows one at a time.
-Types: implicit cursors (created automatically by the database) and explicit cursors (declared by the programmer).
▪ What is the procedure?
-It's a named subroutine stored in the database that contains one or more statements performing a specific task.
▪ What is the difference between procedure and function?
-A function computes a result from its inputs and must return a value, so it can be used inside a SQL expression.
-A procedure performs a sequence of tasks based on its inputs and does not have to return a value; it can pass results back through OUT parameters.
▪ What are triggers and what are the trigger types?
-A trigger is a stored block of code that the system executes automatically when a specified modification (insert, update, or delete) happens in the database.
-Types: AFTER INSERT, AFTER UPDATE, AFTER DELETE, BEFORE INSERT, BEFORE UPDATE, BEFORE DELETE.
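Several of the SQL and database answers above (LEFT JOIN, GROUP BY with HAVING, an AFTER UPDATE trigger, and ON DELETE SET NULL / ON DELETE CASCADE) can be tried end to end. The following is a minimal sketch, assuming SQLite driven from Python's standard sqlite3 module; the tables, columns, and sample rows are hypothetical, and SQLite's trigger syntax differs from Oracle PL/SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: one parent table, two child tables, one audit table.
conn.executescript("""
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL NOT NULL,
    dept_id INTEGER REFERENCES departments(dept_id) ON DELETE SET NULL
);
CREATE TABLE badges (
    badge_id INTEGER PRIMARY KEY,
    emp_id   INTEGER NOT NULL REFERENCES employees(emp_id) ON DELETE CASCADE
);
CREATE TABLE salary_audit (emp_id INTEGER, old_salary REAL, new_salary REAL);

-- AFTER UPDATE trigger: runs automatically whenever a salary changes.
CREATE TRIGGER log_salary_change AFTER UPDATE OF salary ON employees
BEGIN
    INSERT INTO salary_audit VALUES (OLD.emp_id, OLD.salary, NEW.salary);
END;

INSERT INTO departments (dept_id, name) VALUES (1, 'Sales'), (2, 'Finance'), (3, 'Legal');
INSERT INTO employees (emp_id, name, salary, dept_id) VALUES
    (1, 'Ann', 5000, 1), (2, 'Bob', 4000, 1), (3, 'Cid', 7000, 2);
INSERT INTO badges (badge_id, emp_id) VALUES (10, 1), (11, 2), (12, 3);
""")

# LEFT JOIN keeps every department, even 'Legal', which has no employees.
headcount = conn.execute("""
    SELECT d.name, COUNT(e.emp_id) AS headcount
    FROM departments AS d
    LEFT JOIN employees AS e ON e.dept_id = d.dept_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

# HAVING filters whole groups: only departments with at least two employees.
big_departments = conn.execute("""
    SELECT d.name, AVG(e.salary) AS avg_salary
    FROM employees AS e
    INNER JOIN departments AS d ON d.dept_id = e.dept_id
    GROUP BY d.name
    HAVING COUNT(*) >= 2
""").fetchall()

# The trigger writes one audit row for this update.
conn.execute("UPDATE employees SET salary = 5500 WHERE emp_id = 1")
audit_rows = conn.execute("SELECT * FROM salary_audit").fetchall()

# ON DELETE SET NULL: the two Sales employees stay, but lose their department.
conn.execute("DELETE FROM departments WHERE dept_id = 1")
orphaned = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE dept_id IS NULL"
).fetchone()[0]

# ON DELETE CASCADE: deleting employee 3 also deletes badge 12.
conn.execute("DELETE FROM employees WHERE emp_id = 3")
remaining_badges = conn.execute("SELECT COUNT(*) FROM badges").fetchone()[0]

print(headcount, big_departments, audit_rows, orphaned, remaining_badges)
```

Swapping LEFT JOIN for INNER JOIN in the first query drops the 'Legal' row, which is a quick way to see the difference between the two join types.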
Data Warehouse:
▪ What is the data warehouse?
-It's the process of transforming data into information and making it available to users in a timely enough manner to make a difference.
-It's the core of the BI system, built for data analysis and reporting.
▪ What are the characteristics of data warehouse?
-Integrated, time-variant, subject-oriented, and non-volatile (persistent).
▪ What is the difference between database and data warehouse?
-volume: DB < DWH
-historical: DB → short-term, DWH → long-term
-rows: DB < DWH
-orientation: DB → product, DWH → subject or multiple products
-business units: DB → product team, DWH → multiple organizational units
-normalization: DB → normalized, DWH → not required
-data model: DB → relational, DWH → star schema or multi-dimensional
-intelligence: DB → reporting, DWH → advanced reporting and machine learning
-use cases: DB → online transactions & reporting, DWH → centralized storage (360° view)
▪ What is the difference between data warehouse and big data?
-A data warehouse is a collection of historical data from the different operations of an enterprise; it's an architecture used to organize the data.
-Big data is data in enormous volumes to which dedicated technologies are applied; the term also covers the technology used to store and manage such large amounts of data.
▪ What is the difference between OLTP and OLAP?
-OLAP (Online Analytical Processing): consists of historical data from various databases; this data is used in planning, problem-solving, and decision-making.
-OLTP (Online Transaction Processing): consists of current operational data only; this data is used to perform day-to-day fundamental operations.
▪ What is Data Warehousing?
-A technique for collecting and managing data from various sources to provide meaningful business insights. It's a blend of technologies and components that aids the strategic use of data.
▪ What are the processes that can be done on data warehouse?
-report information related to different sources in one report.
-transfer data from the source systems into one single database.
-support decisions that are global and strategic.
▪ What is Data Modeling? What are the types of Data Modeling?
-It's the process of creating a simplified diagram of a software system and the data elements it contains, using text and symbols to represent the data and how it flows.
-Types: conceptual data model, logical data model, and physical data model.
▪ What is data mart?
-It's a subset of the DWH, designed for a particular line of business such as sales or finance. In an independent data mart, data is collected directly from the sources.
▪ What is Data Cube?
-It's a data abstraction for evaluating aggregated data from a variety of viewpoints.
▪ What is the ETL?
-It stands for extract, transform, load.
-Extract: read the data from its original source.
-Transform: deduplicate the data, combine it, and ensure its quality.
-Load: write the data into the target database.
-The load step reads from the staging layer and sends the data to the data warehouse.
▪ What is the difference between fact and dimension table?
-Fact tables contain numerical measures together with foreign keys to the dimensions, while dimension tables provide the descriptive context and background information.
▪ What is the difference between snowflake and star schema?
-Star schema: a multi-dimensional data model in which a central fact table is joined directly to denormalized dimension tables, so the data is easy to understand and analyze. Star schemas can be applied to data warehouses, databases, data marts, and other tools. The design is optimized for querying large data sets.
-Snowflake schema: a multi-dimensional data model that extends the star schema by breaking dimension tables down into normalized sub-dimensions. Snowflake schemas are commonly used for business intelligence and reporting in OLAP data warehouses, data marts, and relational databases.
▪ What are the other names of the data warehouse system?
-Decision support system.
-Business intelligence solution.
-Executive information system.
-Management information system.
-Analytic application.
-Data warehouse.
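The ETL steps and the fact/dimension split described above can be sketched on a toy scale. This is an illustration only, assuming SQLite through Python's sqlite3 module; the raw rows and the names fact_sales, dim_product, and dim_date are hypothetical.

```python
import sqlite3

# Extract: hypothetical raw rows as they might arrive from a source system:
# (order_id, sale_date, product, category, amount). One row is a duplicate.
raw_rows = [
    (1, "2024-01-05", "Laptop", "Hardware", 1200.0),
    (2, "2024-01-05", "Mouse", "Hardware", 25.0),
    (2, "2024-01-05", "Mouse", "Hardware", 25.0),  # duplicate to remove
    (3, "2024-02-10", "Antivirus", "Software", 60.0),
]

dwh = sqlite3.connect(":memory:")
dwh.executescript("""
-- Star schema: one fact table pointing at two denormalized dimensions.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product     TEXT UNIQUE NOT NULL,
    category    TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT UNIQUE NOT NULL,
    month     TEXT NOT NULL
);
CREATE TABLE fact_sales (
    order_id    INTEGER PRIMARY KEY,
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    amount      REAL NOT NULL
);
""")

# Transform: drop exact duplicates while keeping the original order.
clean_rows = list(dict.fromkeys(raw_rows))

# Load: fill the dimensions first, then the fact row that references them.
for order_id, sale_date, product, category, amount in clean_rows:
    dwh.execute(
        "INSERT OR IGNORE INTO dim_product (product, category) VALUES (?, ?)",
        (product, category),
    )
    dwh.execute(
        "INSERT OR IGNORE INTO dim_date (full_date, month) VALUES (?, ?)",
        (sale_date, sale_date[:7]),
    )
    product_key = dwh.execute(
        "SELECT product_key FROM dim_product WHERE product = ?", (product,)
    ).fetchone()[0]
    date_key = dwh.execute(
        "SELECT date_key FROM dim_date WHERE full_date = ?", (sale_date,)
    ).fetchone()[0]
    dwh.execute(
        "INSERT INTO fact_sales (order_id, product_key, date_key, amount) "
        "VALUES (?, ?, ?, ?)",
        (order_id, product_key, date_key, amount),
    )
dwh.commit()

# Analysis: aggregate the numeric fact by attributes taken from the dimensions.
sales_by_category_month = dwh.execute("""
    SELECT p.category, d.month, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
fact_count = dwh.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

print(sales_by_category_month)
```

In a snowflake variant of this sketch, category would move out of dim_product into its own table, and dim_product would hold a key pointing to it.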
Big Data:
▪ Why big data?
-Companies use big data in their systems to improve operations, provide better customer service, create personalized marketing campaigns, and take other actions that can ultimately increase revenue and profits.
▪ What is the big data? (V's of Big Data)
-It's data sets so large and complex that they are almost impossible to manage and process with traditional business intelligence tools.
-V's of big data: Volume (size), Velocity (speed), Variety (diversity), Veracity (reliability), and Value (worth).
▪ What are the big data types?
-Structured, unstructured, and semi-structured.
▪ What is Data Lake?
-It's a centralized repository that allows you to store all your structured and unstructured data at any scale.
▪ What is the difference between ETL & ELT?
-ETL (Extract, Transform, and Load): transforms the data on a separate processing server before transferring it to the data warehouse.
-ELT (Extract, Load, and Transform): loads the raw data first and performs the transformations directly within the data warehouse itself.
Business Intelligence:
▪ What is the Business Intelligence?
-It's software that ingests business data and presents it in user-friendly views such as reports, dashboards, charts, and graphs.
▪ What are the steps in BI?
-Information gathering.
-Analysis.
-Reporting.
-Monitoring and prediction.
▪ What are the tools we use in BI (for ETL, Analysis, and Visualization)?
-Power BI, Chartio, Looker, Google Data Studio, Tableau, and Domo.
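The ETL versus ELT distinction from the Big Data answers above comes down to where the transformation runs. The sketch below is illustrative only: it assumes two in-memory SQLite databases standing in for a data warehouse, and the raw rows, table names, and clean helper are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: untidy names, amounts as text, one duplicate row.
raw = [("  alice ", "10.5"), ("BOB", "3"), ("  alice ", "10.5")]


def clean(name, amount):
    """Transformation in application code: tidy the name, parse the amount."""
    return name.strip().title(), float(amount)


# ETL: transform outside the database, then load only the clean rows.
etl = sqlite3.connect(":memory:")
etl.execute("CREATE TABLE payments (customer TEXT, amount REAL)")
transformed = sorted({clean(name, amount) for name, amount in raw})
etl.executemany("INSERT INTO payments VALUES (?, ?)", transformed)
etl_rows = etl.execute(
    "SELECT customer, amount FROM payments ORDER BY customer"
).fetchall()

# ELT: load the raw strings as-is, then transform with SQL inside the database.
elt = sqlite3.connect(":memory:")
elt.execute("CREATE TABLE staging_payments (customer TEXT, amount TEXT)")
elt.executemany("INSERT INTO staging_payments VALUES (?, ?)", raw)
elt.execute("""
    CREATE TABLE payments AS
    SELECT DISTINCT
        UPPER(SUBSTR(TRIM(customer), 1, 1)) || LOWER(SUBSTR(TRIM(customer), 2))
            AS customer,
        CAST(amount AS REAL) AS amount
    FROM staging_payments
""")
elt_rows = elt.execute(
    "SELECT customer, amount FROM payments ORDER BY customer"
).fetchall()

print(etl_rows)
print(elt_rows)
```

Both routes end with the same clean table; the ELT route also keeps the untouched raw rows in staging_payments, so they can be transformed again later in a different way.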