ETL Testing Data Warehouse Testing Tutorial (A Complete Guide)

ETL Testing / Data Warehouse Process and Challenges:

Today, let me take a moment to explain to my testing fraternity one of the most in-demand and emerging skills for my tester friends: ETL (Extract, Transform, and Load) testing.

This tutorial will give you a complete idea of ETL testing and what we do to test the ETL process.

Complete list of tutorials in this series:

  • Tutorial #1: ETL Testing Data Warehouse Testing Introduction Guide
  • Tutorial #2: ETL Testing Using Informatica PowerCenter Tool
  • Tutorial #3: ETL vs. DB Testing
  • Tutorial #4: Business Intelligence (BI) Testing: How to Test Business Data
  • Tutorial #5: Top 10 ETL Testing Tools

It has been observed that Independent Verification and Validation is gaining huge market potential and many companies are now seeing this as a prospective business gain.

Customers are offered a range of products and service offerings, spread across many areas based on technology, process, and solutions. ETL, or data warehousing, is one of the offerings that is growing rapidly and successfully.

ETL process

Through the ETL process, data is fetched from the source systems, transformed as per business rules, and finally loaded into the target system (the data warehouse). A data warehouse is an enterprise-wide store that contains integrated data to aid the business decision-making process. It is a part of business intelligence.

Why do Organizations Need Data Warehouse?

Organizations with organized IT practices are looking forward to creating the next level of technology transformation. They are now trying to become more operationally effective by making their data easy to interoperate.

Data is the most important part of any organization, whether it is everyday data or historical data. Data is the backbone of any report, and reports are the baseline on which all vital management decisions are made.

Most companies are taking a step forward in constructing their data warehouse to store and monitor real-time data as well as historical data. Crafting an efficient data warehouse is not an easy job. Many organizations have distributed departments with different applications running on distributed technology.

An ETL tool is employed to integrate the different data sources from different departments seamlessly.

The ETL tool works as an integrator: it extracts data from different sources, transforms it into the preferred format based on the business transformation rules, and loads it into a cohesive database known as the data warehouse.
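To make the three steps concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular ETL tool works: the two "source systems" are hard-coded lists, the business rules (integer ids, upper-case country codes, default of 0 tickets) are assumptions made up for this example, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source extracts from two departments (illustrative data only).
SALES_ROWS = [
    {"cust_id": "101", "amount": "250.00", "country": "us"},
    {"cust_id": "102", "amount": "99.50", "country": "IN"},
]
SUPPORT_ROWS = [
    {"customer": 101, "tickets": 3},
    {"customer": 102, "tickets": 0},
]


def extract():
    """Extract: read raw rows from each source system."""
    return SALES_ROWS, SUPPORT_ROWS


def transform(sales, support):
    """Transform: apply the assumed business rules and merge on customer id."""
    tickets_by_customer = {row["customer"]: row["tickets"] for row in support}
    merged = []
    for row in sales:
        cust_id = int(row["cust_id"])             # rule: ids are integers
        merged.append((
            cust_id,
            round(float(row["amount"]), 2),       # rule: amounts are numeric
            row["country"].upper(),               # rule: upper-case country codes
            tickets_by_customer.get(cust_id, 0),  # rule: default to 0 tickets
        ))
    return merged


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_customer ("
        "cust_id INTEGER PRIMARY KEY, amount REAL, country TEXT, tickets INTEGER)"
    )
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(*extract()))
warehouse_rows = conn.execute(
    "SELECT cust_id, amount, country, tickets FROM dw_customer ORDER BY cust_id"
).fetchall()
print(warehouse_rows)
```

Each of the three functions is a natural place for a tester to check inputs and outputs, which is what the testing techniques in this tutorial build on.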

A well-planned, well-defined and effective testing scope helps ensure a smooth transition of the project to production. A business gains real confidence once the ETL processes are verified and validated by an independent group of experts to make sure that the data warehouse is concrete and robust.

ETL or Data warehouse testing is categorized into four different engagements irrespective of the technology or ETL tools used:

  • New Data Warehouse Testing: A new DW is built and verified from scratch. Data input is taken from customer requirements and different data sources, and the new data warehouse is built and verified with the help of ETL tools.
  • Migration Testing: In this type of project, customers have an existing DW and ETL performing the job, but they are looking to adopt new tools in order to improve efficiency.
  • Change Request: In this type of project, new data is added from different sources to an existing DW. Customers may also need to change their existing business rules or integrate new ones.
  • Report Testing: Reports are the end result of any data warehouse and the basic purpose for which the DW is built. A report must be tested by validating its layout, data and calculations.

ETL Testing Techniques

1) Data Transformation Testing: Verify if data is transformed correctly according to various business requirements and rules.

2) Source to Target Count Testing: Make sure that the count of records loaded in the target matches the expected count.

3) Source to Target Data Testing: Make sure that all projected data is loaded into the data warehouse without any data loss or truncation.

4) Data Quality Testing: Make sure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.

5) Performance Testing: Make sure that data is loaded in the data warehouse within the prescribed and expected time frames to confirm improved performance and scalability.

6) Production Validation Testing: Validate the data in the production system & compare it against the source data.

7) Data Integration Testing: Make sure that the data from various sources has been loaded properly to the target system and all the threshold values are checked.

8) Application Migration Testing: Ensure that the ETL application works correctly after moving to a new box or platform.

9) Data & Constraint Check: The data type, length, index, constraints, etc. are tested in this case.

10) Duplicate Data Check: Test if there is any duplicate data present in the target system. Duplicate data can lead to incorrect analytical reports.
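Two of the techniques above, the source-to-target count test and the duplicate data check, are easy to sketch with plain SQL. The example below is a hedged illustration only: the table names `src_orders` and `dw_orders`, the key column `order_id` and the sample rows are all hypothetical, and an in-memory SQLite database stands in for both source and target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE dw_orders  (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    -- The target deliberately contains a duplicate of order 2.
    INSERT INTO dw_orders  VALUES (1, 10.0), (2, 20.0), (2, 20.0), (3, 30.0);
""")


def row_count(conn, table):
    """Return the number of rows in a table (table name is trusted here)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def duplicate_keys(conn, table, key):
    """Return key values that occur more than once in the table."""
    query = (
        f"SELECT {key} FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return [row[0] for row in conn.execute(query)]


source_count = row_count(conn, "src_orders")
target_count = row_count(conn, "dw_orders")
duplicates = duplicate_keys(conn, "dw_orders", "order_id")

print("counts match:", source_count == target_count)
print("duplicate order_ids:", duplicates)
```

Here the count test fails because of the duplicate row, and the duplicate check points at the offending key. In a real project the expected count may differ from the raw source count when the business rules filter or aggregate rows, so the expected value should come from the mapping document.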

Apart from the above ETL testing methods, other testing methods like system integration testing, user acceptance testing, incremental testing, regression testing, retesting and navigation testing are also carried out to make sure that everything is smooth and reliable.

ETL/Data Warehouse Testing Process

Like any other testing that falls under Independent Verification and Validation, ETL testing goes through the same phases:

  • Requirement understanding
  • Validating
  • Test estimation, based on the number of tables, the complexity of rules, data volume and performance of a job.
  • Test planning, based on the inputs from test estimation and business requirements. Here we need to identify what is in scope and what is out of scope. We also look for dependencies, risks and mitigation plans during this phase.
  • Designing test cases and test scenarios from all the available inputs. We also need to design mapping documents and SQL scripts.
  • Once all the test cases are ready and approved, the testing team performs pre-execution checks and prepares test data.
  • Lastly, execution is performed until the exit criteria are met. The execution phase includes running ETL jobs, monitoring job runs, SQL script execution, defect logging, defect retesting and regression testing.
  • Upon successful completion, a summary report is prepared and the closure process is done. In this phase, sign-off is given to promote the job or code to the next phase.

The first two phases, i.e. requirement understanding and validation, can be regarded as pre-steps of the ETL test process.

So, the main process can be represented as below:

[Figure: ETL main process]

It is necessary to define a test strategy that is mutually accepted by stakeholders before starting actual testing. A well-defined test strategy helps ensure that the correct approach is followed to meet the testing objectives.

ETL/Data Warehouse testing might require the testing team to write SQL statements extensively, or to tailor the SQL provided by the development team. In any case, the testing team must be aware of the results that they are trying to get using those SQL statements.
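As an example of the kind of SQL a tester might write, here is a small sketch of a "source minus target" comparison. Everything in it is an assumption for illustration: the tables `src_customer` and `dw_customer`, the sample rows, and the transformation rule that the target `full_name` is the first and last name joined by a space. The query applies the expected rule to the source and uses `EXCEPT` to list rows that are missing or different in the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE dw_customer  (id INTEGER, full_name TEXT);
    INSERT INTO src_customer VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    -- Row 2 was loaded incorrectly (last name missing) to show a failure.
    INSERT INTO dw_customer  VALUES (1, 'Ada Lovelace'), (2, 'Alan');
""")

# Expected target rows (assumed rule applied to the source) minus actual rows.
MISMATCH_QUERY = """
    SELECT id, first_name || ' ' || last_name AS full_name FROM src_customer
    EXCEPT
    SELECT id, full_name FROM dw_customer
"""

mismatches = conn.execute(MISMATCH_QUERY).fetchall()
print(mismatches)
```

An empty result means every expected row exists in the target. It is worth running the comparison in the opposite direction as well (target minus expected), since that catches extra rows that should not have been loaded.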

Difference Between Database and Data Warehouse Testing

There is a popular misunderstanding that database testing and data warehouse testing are similar, while the fact is that the two take different directions in testing.

  • Database testing is done on a smaller scale of data, normally with OLTP (online transaction processing) databases, while data warehouse testing is done with large volumes of data involving OLAP (online analytical processing) databases.
  • In database testing, data is normally injected consistently from uniform sources, while in data warehouse testing most of the data comes from different kinds of data sources that are often inconsistent with one another.
  • We generally perform CRUD (create, read, update and delete) operations during database testing, while in data warehouse testing we mostly use read-only (SELECT) operations.
  • Normalized databases are used in DB testing, while denormalized databases are used in data warehouse testing.
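The normalized versus denormalized point can be sketched in a few lines. This is an illustrative example only, with hypothetical table names and data: two normalized OLTP-style tables are flattened into one warehouse-style table, which is then queried with a read-only aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized OLTP-style tables: each fact is stored once.
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Asha', 'Pune'), (2, 'Ben', 'Leeds');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);

    -- Denormalized warehouse-style table: customer attributes are repeated
    -- on every order row so reports can read it without joins.
    CREATE TABLE dw_order_flat AS
        SELECT o.order_id, c.name, c.city, o.amount
        FROM orders o JOIN customer c ON c.cust_id = o.cust_id;
""")

# Read-only analytical query against the flat table.
totals_by_city = conn.execute(
    "SELECT city, SUM(amount) FROM dw_order_flat GROUP BY city ORDER BY city"
).fetchall()
print(totals_by_city)
```

Because the flat table repeats customer attributes, a tester also has to check that the repeated values agree with the source, which is not a concern in the normalized design.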

There are a number of universal verifications that have to be carried out for any kind of data warehouse testing.

Given below is the list of checks that are treated as essential for validation in this testing:

  • Verify that data transformation from source to destination works as expected.
  • Verify that the expected data is added to the target system.
  • Verify that all DB fields and field data are loaded without any truncation.
  • Verify that the data checksum and record count match between source and target.
  • Verify that proper error logs are generated with all the details for rejected data.
  • Verify NULL value fields.
  • Verify that duplicate data is not loaded.
  • Verify data integrity.
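A few of the verifications in this list (NULL fields, truncation and checksums) can be sketched as small helper functions. This is a minimal illustration under stated assumptions: the tables `src_product` and `dw_product`, their columns and the sample rows are hypothetical, and the checksum is simply a SHA-256 hash over the rows in key order, which is one possible approach rather than a standard.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_product (sku TEXT, name TEXT);
    CREATE TABLE dw_product  (sku TEXT, name TEXT);
    INSERT INTO src_product VALUES ('A1', 'Mechanical keyboard'), ('B2', 'Mouse');
    -- Target has a truncated name and a NULL where a value was expected.
    INSERT INTO dw_product  VALUES ('A1', 'Mechanical'), ('B2', NULL);
""")


def null_count(conn, table, column):
    """Count rows where the given column is NULL (names are trusted here)."""
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]


def truncated_keys(conn):
    """Return SKUs whose target name is shorter than the source name."""
    query = """
        SELECT s.sku FROM src_product s
        JOIN dw_product t ON t.sku = s.sku
        WHERE LENGTH(t.name) < LENGTH(s.name)
        ORDER BY s.sku
    """
    return [row[0] for row in conn.execute(query)]


def table_checksum(conn, table):
    """Hash all rows in a stable order so two tables can be compared."""
    digest = hashlib.sha256()
    for row in conn.execute(f"SELECT sku, name FROM {table} ORDER BY sku"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


print("NULL names in target:", null_count(conn, "dw_product", "name"))
print("truncated SKUs:", truncated_keys(conn))
print(
    "checksums match:",
    table_checksum(conn, "src_product") == table_checksum(conn, "dw_product"),
)
```

Note that the truncation check skips the NULL row, because comparing a NULL length is never true in SQL. That is why the NULL check is kept as a separate verification.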

ETL Testing Challenges

This testing is quite different from conventional testing. Many challenges are faced while performing data warehouse testing.

Here are a few challenges that I experienced on my project:

  • Incompatible and duplicate data.
  • Loss of data during the ETL process.
  • Unavailability of a comprehensive test bed.
  • Testers have no privileges to execute ETL jobs on their own.
  • The volume and complexity of the data are huge.
  • Faults in business processes and procedures.
  • Trouble acquiring and building test data.
  • Unstable testing environment.
  • Missing business flow information.
Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that the business information is accurate, consistent and reliable. It also minimizes the risk of data loss in production.

I hope these tips help you ensure that your ETL process is accurate and that the data warehouse built by it is a competitive advantage for your business.

This is a guest post by Vishal Chhaperia, who works in an MNC in a test management role. He has extensive experience in managing multi-technology QA projects, processes and teams.

Have you worked on ETL testing? Please share your ETL/DW testing tips and challenges below.
