Approach and Challenges for ETL / Data Warehouse Testing:
In this piece, I will briefly explain to my fellow testers what is fast becoming a sought-after skill in the testing community: ETL testing, also known as Extract, Transform, and Load testing.
This tutorial will give you a complete understanding of ETL testing, its procedures, and how to verify the ETL process.
The tutorials in this series are as follows:
- Tutorial #1: Introductory Guide to ETL Testing and Data Warehouse Testing
- Tutorial #2: ETL Testing Using Informatica PowerCenter Tool
- Tutorial #3: Comparison of ETL and DB Testing
- Tutorial #4: Business Intelligence (BI) Testing: How to Validate Business Data
- Tutorial #5: Top 10 ETL Testing Tools
There’s a growing recognition that third-party verification and validation has large market potential, and many companies see it as an opportunity for business growth.
Clients can choose from a diverse range of services spanning technology, process, and solutions. ETL and data warehouse testing is one such offering, and it is growing quickly.
In the ETL process, data is extracted from source systems, transformed according to business rules, and then loaded into the target system (the data warehouse). A data warehouse is an enterprise-wide repository that stores consolidated data to support business decision-making. It forms part of business intelligence.
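To make the three stages concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The source records, the target table `sales_fact`, and the two business rules (upper-casing country codes and converting amounts from cents to dollars) are hypothetical and chosen only for illustration; a real project would use an ETL tool and real source systems.

```python
import sqlite3

# Hypothetical source records, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "country": "us", "amount_cents": 1250},
    {"order_id": 2, "country": "de", "amount_cents": 990},
    {"order_id": 3, "country": "In", "amount_cents": 40000},
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: apply the illustrative business rules to each record."""
    return [
        (
            row["order_id"],
            row["country"].upper(),       # rule 1: normalize country codes
            row["amount_cents"] / 100.0,  # rule 2: store amounts in dollars
        )
        for row in rows
    ]


def load(conn, rows):
    """Load: insert the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    load(target, transform(extract()))
    for row in target.execute("SELECT * FROM sales_fact ORDER BY order_id"):
        print(row)
```

Keeping each stage in its own function mirrors how ETL testing looks at the process: every stage is a place where data can be lost, truncated, or transformed incorrectly.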
Why is a Data Warehouse Necessary for Organizations?
Organizations with structured IT practices are now moving to the next stage of technology evolution: making their data interconnected and readily usable.
Data, whether daily or historical, is the crux of any organization. It forms the foundation of all reports, and these reports underpin key management decisions.
Many organizations are building their own data warehouses to store and track both real-time and historical data. Building an efficient data warehouse is no easy task, because many organizations have several departments running different applications on distributed technology.
ETL tools are used to facilitate smooth integration between different data sources from different departments.
The ETL tool acts as an integrator: it extracts data from the different sources, transforms it into the required format based on business transformation rules, and loads it into a unified database known as a data warehouse.
A well-planned, well-defined, and effective testing scope ensures a smooth transition of the project to production. The ETL processes need to be validated and verified by an independent group of experts to ensure that the data warehouse is reliable and robust, so that the business can confidently base its decisions on it.
Irrespective of technology or ETL tools used, ETL or data warehouse testing falls into four distinct categories:
- New Data Warehouse Testing: Here, a new data warehouse is built from scratch and tested. The inputs come from customer requirements and various data sources, and the new data warehouse is built and tested with the aid of ETL tools.
- Migration Testing: In this type of project, clients already have a data warehouse and an ETL process doing the job, but they are moving to new tools to improve efficiency.
- Change Request: In such a project, new data from different sources is added to an existing data warehouse. Clients may also need to modify their existing business rules or introduce new ones.
- Report Testing: The report is the eventual outcome of any data warehouse and the primary reason for its creation. Verification of the report involves validating the layout, data in the report, and calculations.
ETL Procedure
Techniques for ETL Testing
1) Data Transformation Testing: Verify whether data is accurately transformed according to various business requirements and rules.
2) Source to Target Count Testing: Ensure that the count of records loaded in the target matches the anticipated count.
3) Source to Target Data Testing: Ensure that all expected data is loaded into the data warehouse without any truncation or data loss.
4) Data Quality Testing: Verify that the ETL application appropriately rejects faulty data, substitutes default values for it where required, and reports it.
5) Performance Testing: Ensure that data is loaded into the data warehouse within the expected time frames, confirming acceptable performance and scalability.
6) Production Validation Testing: Validate the data in the production system and compare it with the source data.
7) Data Integration Testing: Make sure the data from the various sources has been loaded correctly into the target system, and check all threshold values.
8) Application Migration Testing: Confirm that the ETL application works correctly when it is moved to a new box or platform.
9) Data & Constraint Check: Test the data type, length, indexes, constraints, etc.
10) Duplicate Data Check: Test if any duplicate data exists in the target system. The presence of duplicate data can lead to inaccurate analytical reports.
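Several of these techniques come down to SQL queries that compare source and target. The sketch below shows three of them (source to target count testing, source to target data testing, and the duplicate data check) in Python with the standard-library `sqlite3` module. Both tables live in one in-memory database purely for convenience, and the table and column names (`src_customer`, `dw_customer`, `customer_id`, `name`) are hypothetical. In practice the source and the warehouse are usually separate systems, so each query would run on its own side and the results would be compared.

```python
import sqlite3

# Note: table and column names are interpolated into the SQL directly, so only
# pass trusted, hard-coded identifiers to these helpers.


def count_check(conn, source, target):
    """Source to target count test: return (source_count, target_count)."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return src, tgt


def missing_in_target(conn, source, target, columns):
    """Source to target data test: rows in the source with no exact match in
    the target. A truncated or altered value shows up here as well."""
    cols = ", ".join(columns)
    query = f"SELECT {cols} FROM {source} EXCEPT SELECT {cols} FROM {target}"
    return conn.execute(query).fetchall()


def duplicate_keys(conn, table, key):
    """Duplicate data check: key values that occur more than once."""
    query = (
        f"SELECT {key}, COUNT(*) FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_customer (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE dw_customer (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO src_customer VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bartholomew"), (3, "Chen")])
    # Simulated load defects: row 2 truncated, row 3 loaded twice.
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bartholo"), (3, "Chen"), (3, "Chen")])

    print("counts:", count_check(conn, "src_customer", "dw_customer"))
    print("missing:", missing_in_target(conn, "src_customer", "dw_customer",
                                        ["customer_id", "name"]))
    print("duplicates:", duplicate_keys(conn, "dw_customer", "customer_id"))
```

`EXCEPT` removes duplicate rows from its result, so it cannot tell that a row was loaded twice; that is why the count check and the duplicate check are still needed alongside it.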
In addition to the above ETL testing methods, other testing methods such as system integration testing, user acceptance testing, incremental testing, regression testing, retesting, and navigation testing are also carried out to ensure the process is seamless and reliable.
Procedure for Testing ETL/Data Warehouse
Like any other testing that comes under Independent Verification and Validation, ETL testing follows the same phases:
- Understanding Requirements
- Validation
- Test Estimation, based on the number of tables, the complexity of the rules, the data volume, and the performance of the associated jobs.
- Test Planning, based on the inputs from test estimation and the business requirements. During this phase it’s important to identify what is in scope and what is out of scope, including dependencies, risks, and planned mitigations.
- Designing test cases and test scenarios from all available inputs. Mapping documents and SQL scripts also need to be created.
- Once all test cases have been prepared and approved, the testing team performs pre-execution checks and test data preparation.
- Lastly, execution continues until the exit criteria have been met. This includes running ETL jobs, monitoring job runs, executing SQL scripts, logging defects, retesting defects, and regression testing.
- Upon successful completion, a summary report is prepared and the test closure process is carried out. In this stage, the job or code is promoted to the next phase once sign-off has been given.
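The execution phase often means running a suite of SQL check scripts after each ETL job and logging whatever fails. Below is a minimal sketch of such a runner in Python with `sqlite3`. It assumes a common convention (not one prescribed by this article) that every check query returns the offending rows, so an empty result means the check passed. The check names and the `dw_orders` table are hypothetical.

```python
import sqlite3

# Each check returns the offending rows; an empty result means it passed.
CHECKS = {
    "no_null_order_ids": "SELECT * FROM dw_orders WHERE order_id IS NULL",
    "no_negative_amounts": "SELECT * FROM dw_orders WHERE amount < 0",
    "no_duplicate_orders": (
        "SELECT order_id, COUNT(*) FROM dw_orders "
        "GROUP BY order_id HAVING COUNT(*) > 1"
    ),
}


def run_checks(conn, checks):
    """Run every check and return {check_name: offending_rows} for failures."""
    failures = {}
    for name, sql in checks.items():
        rows = conn.execute(sql).fetchall()
        if rows:
            failures[name] = rows
    return failures


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dw_orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?)",
                     [(1, 10.0), (2, -5.0), (None, 3.0)])
    failures = run_checks(conn, CHECKS)
    for name, rows in failures.items():
        # In a real project, each failure would be logged as a defect.
        print(f"FAILED {name}: {rows}")
    print(f"{len(CHECKS) - len(failures)}/{len(CHECKS)} checks passed")
```

Because the checks are plain data, the same suite can be rerun unchanged after each fix, which covers retesting and regression testing.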
The first two phases, i.e., understanding requirements and validation, are considered pre-steps of the ETL testing process.
Visualize the main process as follows:
It is important to establish a testing strategy that has been mutually agreed upon by the stakeholders before starting the actual testing process. A clearly defined testing strategy ensures that a proper approach has been adopted to fulfill the testing goals.
ETL/Data Warehouse testing may require the testing team to write SQL queries extensively, or to tailor the SQL provided by the development team. Either way, the testing team should know what results it expects those SQL queries to return.
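One way to know in advance what a query should return is to derive the expected values independently from the mapping document and then compare them with what the ETL job actually loaded. The sketch below, in Python with `sqlite3`, does this for a single hypothetical transformation rule (full name = first name, a space, and last name, upper-cased); the rule and the `src_employee` and `dw_employee` tables are assumptions for illustration.

```python
import sqlite3


def expected_full_name(first, last):
    """Independent re-implementation of the hypothetical mapping rule:
    full_name = UPPER(first_name || ' ' || last_name)."""
    return f"{first} {last}".upper()


def transformation_mismatches(conn):
    """Return (emp_id, expected, actual) for every source row whose
    transformed value in the target differs from the expected value."""
    query = (
        "SELECT s.emp_id, s.first_name, s.last_name, t.full_name "
        "FROM src_employee AS s "
        "LEFT JOIN dw_employee AS t ON s.emp_id = t.emp_id "
        "ORDER BY s.emp_id"
    )
    mismatches = []
    for emp_id, first, last, actual in conn.execute(query):
        expected = expected_full_name(first, last)
        # A row missing from the target comes back with actual = None,
        # so it is reported as a mismatch too.
        if actual != expected:
            mismatches.append((emp_id, expected, actual))
    return mismatches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE src_employee (emp_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE dw_employee (emp_id INTEGER, full_name TEXT);
    INSERT INTO src_employee VALUES
        (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing'), (3, 'Grace', 'Hopper');
    INSERT INTO dw_employee VALUES (1, 'ADA LOVELACE'), (2, 'Alan Turing');
    """)
    for emp_id, expected, actual in transformation_mismatches(conn):
        print(f"emp_id={emp_id}: expected {expected!r}, got {actual!r}")
```

Re-implementing the rule outside the ETL tool matters: if the tester simply reused the development team's SQL, a wrong rule would produce the same wrong answer on both sides and the test would pass.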
Differences Between Database and Data Warehouse Testing
There’s a common misconception that data warehouse testing and database testing are similar, when in reality the two take quite different directions.
- Database testing is carried out using data on a smaller scale, typically with OLTP (Online Transaction Processing) databases, while data warehouse testing entails handling large volumes of data with OLAP (Online Analytical Processing) databases.
- In database testing, data usually comes consistently from uniform sources, while data warehouse testing predominantly deals with data from different types of sources that are inherently inconsistent.
- Typically, the basic CRUD (Create, Read, Update, and Delete) operations are performed during database testing, while data warehouse testing primarily executes read-only operations.
- Normalized databases are used in database testing, while denormalized databases are used in data warehouse testing.
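To illustrate the last two points, the sketch below builds a small normalized (OLTP-style) schema and a denormalized reporting table derived from it, using Python's `sqlite3`. The schema and data are hypothetical. The customer's city is stored once in the normalized tables but repeated on every order row of the denormalized table, so the warehouse-style query that follows is a read-only aggregate with no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized OLTP-style schema: each fact stored once, linked by keys.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(customer_id),
                     amount REAL);
INSERT INTO customer VALUES (1, 'Alice', 'Pune'), (2, 'Bob', 'Oslo');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Denormalized warehouse-style table: customer attributes are copied onto
# every order row so that analytical queries need no joins.
conn.executescript("""
CREATE TABLE sales_flat AS
SELECT o.order_id, c.name AS customer_name, c.city, o.amount
FROM orders AS o JOIN customer AS c ON o.customer_id = c.customer_id;
""")

# A typical read-only analytical query against the denormalized table.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales_flat GROUP BY city ORDER BY city"
).fetchall()
print(totals)
```

The repetition is deliberate in a warehouse, but it is also why a tester must check that the copied attributes stay consistent with their source.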
There are several mandatory tests that need to be performed for any kind of data warehouse testing.
Below is a list of items that are crucial for validation during this testing:
- Verify that data transformation from source to target is done as expected.
- Verify that the expected data is loaded into the target system.
- Confirm that all database fields and field data are loaded without any truncation.
- Verify record counts and data checksums between source and target.
- Ensure that proper error logs are generated for rejected data, along with relevant details.
- Verify NULL value fields.
- Make sure no duplicate data is loaded.
- Check data integrity.
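A few of the items above translate directly into queries. The sketch below, again in Python with `sqlite3`, covers NULL checks on mandatory fields, a simple checksum (record count plus the sum of a numeric column, compared between source and target), and a referential-integrity check for fact rows whose foreign key has no matching dimension row. The table and column names are hypothetical, and which columns are mandatory or worth checksumming would come from the mapping document.

```python
import sqlite3

# Table and column names are interpolated directly; use trusted identifiers only.


def null_violations(conn, table, mandatory_columns):
    """Return {column: null_count} for mandatory columns that contain NULLs."""
    violations = {}
    for col in mandatory_columns:
        sql = f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        n = conn.execute(sql).fetchone()[0]
        if n:
            violations[col] = n
    return violations


def checksum(conn, table, column):
    """Simple checksum: (record count, sum of a numeric column).
    SQLite's TOTAL() returns 0.0 for an empty table, unlike SUM()."""
    sql = f"SELECT COUNT(*), TOTAL({column}) FROM {table}"
    return conn.execute(sql).fetchone()


def orphan_keys(conn, fact, fk, dim, pk):
    """Integrity check: foreign-key values in the fact table that have no
    matching row in the dimension table."""
    sql = (
        f"SELECT f.{fk} FROM {fact} AS f LEFT JOIN {dim} AS d "
        f"ON f.{fk} = d.{pk} WHERE d.{pk} IS NULL AND f.{fk} IS NOT NULL"
    )
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE src_sales (sale_id INTEGER, product_id INTEGER, amount REAL);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER, product_id INTEGER, amount REAL);
    INSERT INTO src_sales VALUES (1, 100, 20.0), (2, 101, 5.5), (3, 999, 4.5);
    INSERT INTO dim_product VALUES (100, 'Pen'), (101, 'Ink');
    INSERT INTO fact_sales VALUES (1, 100, 20.0), (2, 101, NULL), (3, 999, 4.5);
    """)
    print("nulls:", null_violations(conn, "fact_sales", ["sale_id", "amount"]))
    print("source checksum:", checksum(conn, "src_sales", "amount"))
    print("target checksum:", checksum(conn, "fact_sales", "amount"))
    print("orphans:", orphan_keys(conn, "fact_sales", "product_id",
                                  "dim_product", "product_id"))
```

Here the record counts match while the sums differ, which is exactly the kind of defect a count-only check would miss. Comparing floating-point sums for exact equality can be fragile, so a tolerance or integer amounts are safer on real data.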
=> Learn more about the difference between ETL/Data warehouse testing & Database Testing.
Challenges in ETL Testing
Data warehouse testing is quite unlike traditional testing, and numerous challenges come up while conducting it.
Below are some challenges that I’ve faced in my projects:
- Incompatible and duplicate data
- Data loss during the ETL process
- Lack of comprehensive testbeds
- Testing teams often don’t have the authorization to run ETL jobs independently
- The sheer volume and complexity of the data
- Inadequate business processes and procedures
- Difficulties in acquiring and building test data
- Unstable testing environments
- Lack of information on business flows
Data is crucial for making key business decisions. ETL testing plays a pivotal role in validating and ensuring that the business information is accurate, consistent, and reliable, and it also decreases the risk of data loss in production.
I hope these tips will help ensure your ETL process is accurate, and the data warehouse you create from it becomes a competitive edge for your business.
This is a guest post by Vishal Chhaperia, who works in an MNC in a test management role. He has extensive experience in managing multi-technology QA projects, processes, and teams.
Have you had experience with ETL testing? Feel free to share your ETL/Data warehouse testing tips and challenges below.