ETL Testing
ETL testing is a data-centric testing process that validates that data has been
transformed and loaded into the target as expected.
ETL testing differs from application testing because it requires a data-centric testing
approach. Some of the challenges in ETL testing are:
ETL testing involves comparing large volumes of data, typically millions of records.
The data to be tested resides in heterogeneous data sources (e.g. databases, flat
files).
Data is often transformed, which may require complex SQL queries to compare the
data.
ETL testing depends heavily on the availability of test data covering different test
scenarios.
Incorrect, incomplete or duplicate data.
The DW system contains historical data, so the data volume is very large and extremely
complex to test.
ETL testers are normally not given access to job schedules in the ETL
tool. They rarely have access to BI reporting tools to see the final layout of reports
and the data inside them.
Test cases are hard to generate and build because the data volume is high and the data is complex.
ETL testers normally do not know the end-user report requirements or the
business flow of the information.
ETL testing involves various complex SQL concepts for data validation in the target
system.
Sometimes the testers are not provided with the source-to-target mapping
information.
An unstable testing environment delays the development and testing of a process.
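The comparison challenge described above can be illustrated with a minimal sketch. It assumes both source and target are reachable from one connection and have identical column layouts; the tables src_orders and tgt_orders, their columns, and the helper reconcile are hypothetical names, and an in-memory SQLite database stands in for the real systems.

```python
import sqlite3


def reconcile(conn, source, target):
    """Return (source_count, target_count, rows_missing_in_target).

    Table names are interpolated directly, so they must come from
    trusted test configuration, never from user input.
    """
    src_count = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt_count = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    # EXCEPT lists source rows that have no identical row in the target.
    missing = conn.execute(
        f"SELECT * FROM {source} EXCEPT SELECT * FROM {target}"
    ).fetchall()
    return src_count, tgt_count, missing


# Hypothetical sample data: order 3 was not loaded into the target.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 20.0);
""")

result = reconcile(conn, "src_orders", "tgt_orders")
print(result)
```

A count comparison is cheap and catches dropped rows, while the EXCEPT query pinpoints which rows differ. On millions of records the row-level comparison is usually restricted to a key range or a sample.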
Example
A retail shop maintains a data warehouse into which all sales data is
loaded at month level. As the business grows day by day, more and more
data is loaded.
The shop has been running for the past 10 years, so the data warehouse
database has grown tremendously.
The shop management says they do not want to view reports
at month level for 10-year-old data. Hence they plan to remove the
month-level data older than 10 years.
At the same time, they want to keep that data at year level instead of
month level.
So the requirement is to roll up all monthly data to year level for
data older than 10 years and then delete the month-level data.
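The roll-up requirement above could be tested along the following lines. This is a sketch under assumptions: the tables monthly_sales and yearly_sales, their columns, the cutoff year, and the helpers roll_up and grand_total are all hypothetical, and SQLite stands in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (sale_year INTEGER, sale_month INTEGER, amount INTEGER);
    CREATE TABLE yearly_sales (sale_year INTEGER, amount INTEGER);
    INSERT INTO monthly_sales VALUES
        (2010, 1, 100), (2010, 2, 150),
        (2011, 1, 200),
        (2024, 1, 300), (2024, 2, 50);
""")


def grand_total(conn):
    """Total sales across both grains; the roll-up must not change it."""
    monthly = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM monthly_sales").fetchone()[0]
    yearly = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM yearly_sales").fetchone()[0]
    return monthly + yearly


def roll_up(conn, cutoff_year):
    """Aggregate months before cutoff_year into years, then delete them."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "INSERT INTO yearly_sales "
            "SELECT sale_year, SUM(amount) FROM monthly_sales "
            "WHERE sale_year < ? GROUP BY sale_year",
            (cutoff_year,),
        )
        conn.execute(
            "DELETE FROM monthly_sales WHERE sale_year < ?", (cutoff_year,))


total_before = grand_total(conn)
roll_up(conn, cutoff_year=2015)
total_after = grand_total(conn)

yearly_rows = conn.execute(
    "SELECT sale_year, amount FROM yearly_sales ORDER BY sale_year").fetchall()
old_monthly_left = conn.execute(
    "SELECT COUNT(*) FROM monthly_sales WHERE sale_year < 2015").fetchone()[0]
```

The three checks a tester would typically make here are that the grand total is unchanged, that each old year appears exactly once at year level, and that no month-level rows remain before the cutoff.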
ETL Testing Life Cycle/Process
Like other testing processes, ETL testing goes through different phases.
The phases of the ETL testing process are as follows.
1. Requirement analysis
2. Test planning
3. Test design
4. Test execution
5. Defect retesting
Requirement analysis
The major inputs for the testing team are the data model and the mapping document.
From the very start of the analysis, we need to make sure that the source tables or files are
correct.
Test planning
An ETL test plan differs little from a functional test plan, except for a few items. Here we
need to mention the data flow in both the in-scope and
out-of-scope sections.
Test design
Test cases are prepared using the mapping document. At this
stage itself, we need to find requirement-related defects by
analyzing the source data and the mapping documents, checking attributes such as data type.
Sign off
Once the exit criteria for test execution are met, a sign-off mail is sent to the stakeholders
so that the code can be pushed to the next level.
ETL Test Scenarios
Constraints check
Ensure that all required constraints are available.
Index check
Ensure that the index is created on the required columns.
NULL check
Insert a row with NULL in a NOT NULL column and verify that the row is rejected.
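The index check and the NULL check can be automated roughly as follows. This is a sketch, not a definitive test: the customers table, the index idx_customers_email, and the helper null_is_rejected are hypothetical, and the PRAGMA used to inspect the index is specific to SQLite (other databases expose index metadata through their own catalog views).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email TEXT NOT NULL
    );
    CREATE INDEX idx_customers_email ON customers (email);
""")

# Index check: PRAGMA index_info returns one row per indexed column
# as (seqno, cid, name); collect the column names.
indexed_columns = [
    row[2] for row in conn.execute("PRAGMA index_info(idx_customers_email)")
]


def null_is_rejected(conn):
    """NULL check: try to insert NULL into the NOT NULL column."""
    try:
        conn.execute(
            "INSERT INTO customers (customer_id, email) VALUES (1, NULL)")
    except sqlite3.IntegrityError:
        return True  # the database refused the row, as required
    return False


rejected = null_is_rejected(conn)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Checking the row count afterwards confirms that the rejected row did not reach the table.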
Small:
Do not write lengthy test cases; break them into small ones, which makes
impact analysis and regression test suite identification and execution easier.
Up to date:
Job names, log file names, parameter file names, and paths
may change. The ETL test cases need to be updated to reflect all such modifications.
ETL Bugs
Table structure issue
Performance issue
Filter
It is an active transformation. A column can be selected as a filter with a condition. A
row is returned if it satisfies the filter condition; otherwise it is rejected.
Ex: SELECT * FROM employees WHERE department_id = 10;
Expression
It is a passive transformation. An expression such as concatenation or
replacement of NULL values can be specified. The expression is applied to a specific column and the result is
returned.
Ex: SELECT first_name, salary + 1000 AS totsal FROM employees;
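One way to validate an expression transformation is to recompute the expression from the source and compare it with the target. The sketch below assumes a hypothetical mapping in which full_name is first_name and last_name concatenated with a space and a NULL salary is replaced by 0; the table tgt_employees and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER, first_name TEXT, last_name TEXT, salary INTEGER);
    CREATE TABLE tgt_employees (
        employee_id INTEGER, full_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Ann', 'Lee', 5000), (2, 'Bob', 'Ray', NULL);
    -- Seeded defect: the NULL salary of employee 2 was not replaced with 0.
    INSERT INTO tgt_employees VALUES (1, 'Ann Lee', 5000), (2, 'Bob Ray', NULL);
""")

# Recompute both expressions from the source and report target rows that
# disagree. IS NOT is SQLite's NULL-safe inequality, so a NULL in the
# target is reported instead of being silently skipped.
mismatches = conn.execute("""
    SELECT s.employee_id
    FROM employees AS s
    JOIN tgt_employees AS t ON t.employee_id = s.employee_id
    WHERE t.full_name IS NOT (s.first_name || ' ' || s.last_name)
       OR t.salary IS NOT COALESCE(s.salary, 0)
    ORDER BY s.employee_id
""").fetchall()
print(mismatches)
```

Because an expression is a passive transformation, the row counts of source and target should also match; only the column values need this recomputation.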
Aggregator
It is an active transformation. An aggregate function such as Max, Min, Avg or Count
can be applied to a measure.
Ex: SELECT department_id, MIN(salary), MAX(salary) FROM employees GROUP BY
department_id;
Joiner
It is an active transformation. It joins two or more sources on a join condition. Rows
are returned if they satisfy the join condition; otherwise they are rejected.
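A joiner can be checked from both sides: the rows that should come through and the rows that should be rejected. The following sketch assumes an inner join between hypothetical employees and departments tables on department_id; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER, first_name TEXT, department_id INTEGER);
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    INSERT INTO employees VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', 99);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT');
""")

# Rows that satisfy the join condition and should reach the target.
joined = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees AS e
    JOIN departments AS d ON d.department_id = e.department_id
    ORDER BY e.employee_id
""").fetchall()

# Rows the joiner rejects: employees whose department has no match.
rejected = conn.execute("""
    SELECT e.employee_id
    FROM employees AS e
    LEFT JOIN departments AS d ON d.department_id = e.department_id
    WHERE d.department_id IS NULL
""").fetchall()
```

Listing the rejected rows explicitly helps decide whether they are expected rejects or a sign of missing reference data.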
Sorter
It is an active transformation. A sorting column can be selected
along with the sort order, either ascending or descending.
The rows are sorted based on that column and order.
Rank
A rank number is generated based on the given grouping column and order.
Ex: SELECT first_name, salary, RANK() OVER (ORDER BY salary) AS salary_rank FROM employees;
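When testing a rank transformation, ties deserve particular attention. The sketch below uses invented sample rows and ranks by salary in descending order, which is an assumption made for illustration. It relies on SQL window functions, which SQLite supports from version 3.25 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 5000), ('Bob', 3000), ('Cid', 5000), ('Dee', 4000);
""")

# RANK() gives tied salaries the same rank and leaves a gap afterwards.
ranked = conn.execute("""
    SELECT first_name, salary,
           RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY salary_rank, first_name
""").fetchall()

for row in ranked:
    print(row)
```

Ann and Cid share rank 1, so the next rank is 3 rather than 2. If the mapping document expects consecutive ranks, DENSE_RANK() would be the function to compare against instead.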
Data type and length for all columns of source and target
ETL jobs
Description at Depth