ETL Verification / Data Warehouse Testing: Process and Challenges
Today, let me take a little time to explain to my testing community one of the most in-demand and emerging skills for my tester friends, i.e. ETL verification (Extract, Transform, and Load).
This tutorial will provide you with a thorough understanding of ETL verification and the actions we take to verify the ETL process.
Full list of tutorials in this series:
- Tutorial #1: ETL Verification Data Warehouse Testing Introductory Guide
- Tutorial #2: ETL Verification Utilizing Informatica PowerCenter Tool
- Tutorial #3: ETL vs. DB Verification
- Tutorial #4: Business Intelligence (BI) Verification: How to Verify Business Data
- Tutorial #5: Top 10 ETL Verification Tools
It has been observed that Independent Verification and Validation (IV&V) is gaining significant market potential, and many companies now see it as a potential business gain.
Customers are offered a wide assortment of service offerings, spread across many areas based on technology, process, and solutions. ETL or data warehouse testing is one of the offerings that is growing rapidly and successfully.
Through the ETL process, data is extracted from the source systems, transformed according to business rules, and finally loaded into the target system (the data warehouse). A data warehouse is an enterprise-wide store of integrated data that aids the business decision-making process. It is a part of business intelligence.
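To make the three stages concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The table names (`src_orders`, `dw_orders`) and the two business rules (trim and upper-case customer names, convert integer cents to a decimal amount) are hypothetical, and source and target share one in-memory database purely for brevity; a real pipeline would read from separate source systems and would usually be built with an ETL tool.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the hypothetical source table."""
    return conn.execute(
        "SELECT id, name, amount_cents FROM src_orders ORDER BY id"
    ).fetchall()


def transform(rows):
    """Transform: apply illustrative (assumed) business rules.

    - customer names are trimmed and upper-cased
    - amounts are converted from integer cents to a decimal amount
    """
    return [(oid, name.strip().upper(), cents / 100) for oid, name, cents in rows]


def load(conn, rows):
    """Load: insert the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO dw_orders (id, name, amount) VALUES (?, ?, ?)", rows)
    conn.commit()


def build_demo_db():
    """Create an in-memory database with a small source table and an empty target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (id INTEGER PRIMARY KEY, name TEXT, amount_cents INTEGER)")
    conn.execute("CREATE TABLE dw_orders (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)",
        [(1, "  alice ", 1250), (2, "bob", 999)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    load(conn, transform(extract(conn)))
    print(conn.execute("SELECT id, name, amount FROM dw_orders ORDER BY id").fetchall())
```

Keeping each stage in its own function means a tester can call `transform` on hand-built rows and check the business rules in isolation, independently of the extract and load steps.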
Why Do Organizations Need a Data Warehouse?
Organizations with structured IT practices are looking to reach the next level of technology transformation. They are now trying to make their operations far more efficient with easily interconnected data.
That said, data is the most important asset of any organization, whether it is everyday data or historical data. Data is the backbone of every report, and reports are the baseline on which all vital management decisions are made.
Many companies are taking a step forward by building a data warehouse to store and monitor real-time as well as historical data. Crafting an efficient data warehouse is not a simple job. Many organizations have distributed departments with different applications running on different technologies.
An ETL tool is used to integrate the different data sources of different departments seamlessly.
The ETL tool works as an integrator: it extracts data from different sources, transforms it into the desired format based on the business transformation rules, and loads it into a cohesive database known as a data warehouse.
A well-planned, well-defined, and effective testing scope ensures a smooth transition of the project to production. A business gains real momentum once its ETL processes are verified and validated by an independent group of experts, ensuring that the data warehouse is solid and robust.
ETL or data warehouse verification can be classified into four different types of engagement, irrespective of the technology or ETL tools used:
- New Data Warehouse Verification: A new DW is built and verified from scratch. Input is taken from customer requirements and different data sources, and the new data warehouse is built and verified with the help of ETL tools.
- Migration Verification: In this type of project, customers already have a DW and an ETL process performing the job, but they want to adopt new tools to improve efficiency.
- Change Request: In this type of project, new data from different sources is added to an existing DW. Customers may also need to change their existing business rules or integrate new ones.
- Report Verification: Reports are the end result of any data warehouse and the basic purpose for which a DW is built. A report must be verified by validating its layout, the data in it, and its calculations.
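One way to verify a report's calculations is to recompute its figures independently from the warehouse and compare them with what the report shows. The Python sketch below assumes a hypothetical `dw_sales` table and a report that displays total sales per region; the `tolerance` parameter is an assumption meant to absorb rounding in displayed values.

```python
import sqlite3


def recompute_region_totals(conn):
    """Independently recompute the per-region totals the report should show."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM dw_sales GROUP BY region"
    ).fetchall()
    return dict(rows)


def compare_report(report_totals, expected_totals, tolerance=0.005):
    """Return (region, reported, expected) for every region that does not match.

    A region missing from either side is reported with None on that side.
    """
    mismatches = []
    for region in sorted(set(report_totals) | set(expected_totals)):
        reported = report_totals.get(region)
        expected = expected_totals.get(region)
        if reported is None or expected is None or abs(reported - expected) > tolerance:
            mismatches.append((region, reported, expected))
    return mismatches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dw_sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO dw_sales VALUES (?, ?)",
        [("EAST", 100.0), ("EAST", 50.0), ("WEST", 75.0)],
    )
    # Figures as read off the report under test (hypothetical values).
    report = {"EAST": 150.0, "WEST": 80.0}
    print(compare_report(report, recompute_region_totals(conn)))
```

Layout checks still need a visual or tool-based review; this only covers the "data and calculation" part of report verification.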
ETL Approach
ETL Verification Techniques
1) Data Transformation Verification: Verify that data is transformed correctly according to the various business requirements and rules.
2) Source to Target Count Verification: Make sure that the count of records loaded into the target matches the expected count.
3) Source to Target Data Verification: Make sure that all expected data is loaded into the data warehouse without any data loss or truncation.
4) Data Quality Verification: Make sure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.
5) Performance Verification: Make sure that data is loaded into the data warehouse within the prescribed and expected time frames, confirming the expected performance and scalability.
6) Production Validation: Validate the data in the production system and compare it against the source data.
7) Data Integration Verification: Make sure that the data from various sources has been loaded properly into the target system and that all threshold values are checked.
8) Application Migration Verification: Ensure that the ETL application works correctly after being moved to a new server or platform.
9) Data Type and Constraint Check: Data types, lengths, indexes, constraints, etc. are tested in this case.
10) Duplicate Data Check: Test if there is any duplicate data present in the target system. Duplicate data can lead to incorrect analytical reports.
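Several of these checks reduce to simple SQL. The Python sketch below runs count, completeness, duplicate, and truncation checks with the standard-library `sqlite3` module; the table names, the `id` key, and the `name` column are hypothetical. The demo data is chosen so that the row counts match even though one row is missing, one is duplicated, and one is truncated, which shows why a count check alone is not enough.

```python
import sqlite3

# NOTE: table and column names are interpolated into SQL, so pass only trusted
# identifiers from your own test configuration, never user input.


def count_check(conn, source, target):
    """Source-to-target count verification: return (source_count, target_count)."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return src, tgt


def missing_keys(conn, source, target, key):
    """Keys present in the source but absent from the target (data loss)."""
    sql = f"SELECT {key} FROM {source} EXCEPT SELECT {key} FROM {target} ORDER BY 1"
    return [row[0] for row in conn.execute(sql)]


def duplicate_keys(conn, target, key):
    """Keys that occur more than once in the target, with their counts."""
    sql = (f"SELECT {key}, COUNT(*) FROM {target} "
           f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}")
    return conn.execute(sql).fetchall()


def changed_values(conn, source, target, key, column):
    """Rows whose value differs between source and target, e.g. truncation.

    Only meaningful for columns loaded without transformation. IS NOT is
    SQLite's NULL-safe inequality, so NULL vs. non-NULL is also reported.
    """
    sql = (f"SELECT DISTINCT s.{key}, s.{column}, t.{column} "
           f"FROM {source} s JOIN {target} t ON s.{key} = t.{key} "
           f"WHERE s.{column} IS NOT t.{column} ORDER BY s.{key}")
    return conn.execute(sql).fetchall()


def build_reconciliation_demo():
    """Source has 4 rows; target has 4 rows, but one is missing, one duplicated, one truncated."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE dw_customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO src_customer VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob"), (3, "Charlotte"), (4, "Dan")])
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob"), (2, "Bob"), (3, "Charl")])
    return conn


if __name__ == "__main__":
    conn = build_reconciliation_demo()
    print(count_check(conn, "src_customer", "dw_customer"))
    print(missing_keys(conn, "src_customer", "dw_customer", "id"))
    print(duplicate_keys(conn, "dw_customer", "id"))
    print(changed_values(conn, "src_customer", "dw_customer", "id", "name"))
```

Only SELECT statements touch the target table, which matches the read-only nature of data warehouse verification.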
Apart from the above ETL verification methods, other types of testing, such as system integration testing, user acceptance testing, incremental testing, regression testing, retesting, and navigation testing, are also carried out to ensure that everything is smooth and reliable.
ETL/Data Warehouse Verification Process
Like any other testing that falls under Independent Verification and Validation, ETL verification goes through the same phases:
- Requirement Understanding
- Validation
- Test Estimation is based on the number of tables, the complexity of the rules, the data volume, and the performance of the jobs.
- Test Planning is based on the inputs from test estimation and the business requirements. Here we identify what is in scope and what is out of scope. We also look out for dependencies, risks, and mitigation plans during this phase.
- Test cases and test scenarios are designed from all the available inputs. We also need to design mapping documents and SQL scripts.
- Once all the test cases are ready and approved, the testing team performs pre-execution checks and prepares the test data.
- Next, execution is performed until the exit criteria are met. The execution phase includes running ETL jobs, monitoring job runs, executing SQL scripts, logging defects, retesting defects, and regression testing.
- Upon successful completion, a summary report is prepared and the closure process is carried out. In this phase, sign-off is given to promote the job or code to the next phase.
The first two phases, i.e. requirement understanding and validation, can be regarded as pre-steps of the ETL verification process.
So, the main process runs from test estimation through planning, design, execution, and closure.
Before starting actual verification, it is necessary to define a test strategy that is mutually accepted by the stakeholders. A well-defined test strategy ensures that the correct approach is followed to meet the testing objectives.
ETL/data warehouse verification may require the testing team to write SQL statements extensively, or to customize the SQL provided by the development team. In either case, the testing team must be aware of the results they are trying to obtain with those SQL statements.
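One lightweight way to make the expected result explicit is to write it down next to each query before the query is run, so that a check either passes or fails instead of producing output someone has to interpret afterwards. The `SqlCheck` structure and `run_checks` helper below are a hypothetical Python sketch, not part of any ETL tool, and the upper-case status rule is an assumed business rule for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SqlCheck:
    name: str
    sql: str
    expected: list  # result the tester expects, written down before running


def run_checks(conn, checks):
    """Run each check and return {name: (passed, actual_rows)}."""
    results = {}
    for check in checks:
        actual = conn.execute(check.sql).fetchall()
        results[check.name] = (actual == check.expected, actual)
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dw_orders (id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO dw_orders VALUES (?, ?)",
                     [(1, "OPEN"), (2, "CLOSED"), (3, "closed")])
    checks = [
        SqlCheck("row count", "SELECT COUNT(*) FROM dw_orders", [(3,)]),
        # Assumed rule: status must be stored in upper case after transformation.
        SqlCheck("status is upper case",
                 "SELECT id FROM dw_orders WHERE status <> UPPER(status) ORDER BY id",
                 []),
    ]
    for name, (passed, actual) in run_checks(conn, checks).items():
        print(f"{'PASS' if passed else 'FAIL'}: {name} -> {actual}")
```

Most checks are written so that the expected result is an empty set ("no rows violate the rule"), which makes a failing query immediately show the offending rows.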
Difference Between Database and Data Warehouse Verification
There is a popular misconception that database verification and data warehouse verification are similar, while in fact the two take different directions in testing.
- Database verification is normally done on a smaller scale of data, typically with OLTP (Online Transaction Processing) databases, while data warehouse verification is done on large volumes of data involving OLAP (Online Analytical Processing) databases.
- In database verification, data is normally injected consistently from uniform sources, while in data warehouse verification most of the data comes from different kinds of data sources that are not consistent with one another.
- We generally perform CRUD (Create, Read, Update, and Delete) operations during database verification, while in data warehouse verification we use read-only (SELECT) operations.
- Normalized databases are used in DB verification, while denormalized databases are used in data warehouse verification.
There are a number of universal checks that have to be carried out for any kind of data warehouse verification.
Given below is the list of items that are treated as essential for validation in this verification:
- Verify that data transformation from source to destination works as expected.
- Verify that the expected data is loaded into the target system.
- Verify that all DB fields and field data are loaded without any truncation.
- Verify data checksums and that record counts match.
- Verify that proper error logs, with all the details, are generated for rejected data.
- Verify NULL value fields
- Verify that duplicate data is not loaded.
- Verify data integrity
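Several items on this list can be automated together. The sketch below, again in Python with `sqlite3`, assumes hypothetical tables `src_orders`, `dw_orders`, `dw_customer`, and an `etl_error_log` table in which the ETL job is assumed to record rejected source rows. It reconciles a count-and-sum checksum (source = loaded + rejected), lists NULLs in columns assumed to be mandatory, finds orders whose customer does not exist (a referential-integrity check), and finds source rows that were neither loaded nor logged as rejected.

```python
import sqlite3


def checksum(conn, select_sql):
    """Return (row_count, amount_total) over the rows produced by select_sql."""
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM ({select_sql})"
    ).fetchone()


def checksums_reconcile(conn, tolerance=0.005):
    """Source checksum must equal loaded checksum plus rejected checksum."""
    src = checksum(conn, "SELECT amount FROM src_orders")
    loaded = checksum(conn, "SELECT amount FROM dw_orders")
    rejected = checksum(
        conn,
        "SELECT amount FROM src_orders WHERE id IN (SELECT source_id FROM etl_error_log)",
    )
    return (src[0] == loaded[0] + rejected[0]
            and abs(src[1] - (loaded[1] + rejected[1])) <= tolerance)


def null_violations(conn):
    """Loaded rows with NULL in columns assumed to be mandatory."""
    sql = "SELECT id FROM dw_orders WHERE customer_id IS NULL OR amount IS NULL ORDER BY id"
    return [r[0] for r in conn.execute(sql)]


def orphan_orders(conn):
    """Loaded orders that reference a customer missing from dw_customer."""
    sql = ("SELECT o.id FROM dw_orders o LEFT JOIN dw_customer c "
           "ON o.customer_id = c.id WHERE c.id IS NULL ORDER BY o.id")
    return [r[0] for r in conn.execute(sql)]


def unlogged_rejects(conn):
    """Source rows that were neither loaded nor logged as rejected."""
    sql = ("SELECT s.id FROM src_orders s "
           "WHERE NOT EXISTS (SELECT 1 FROM dw_orders o WHERE o.id = s.id) "
           "AND NOT EXISTS (SELECT 1 FROM etl_error_log e WHERE e.source_id = s.id) "
           "ORDER BY s.id")
    return [r[0] for r in conn.execute(sql)]


def build_integrity_demo():
    """Order 3 is rejected (unknown customer 12) and logged in etl_error_log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_orders (id INTEGER, customer_id INTEGER, amount REAL);
        CREATE TABLE dw_customer (id INTEGER);
        CREATE TABLE dw_orders (id INTEGER, customer_id INTEGER, amount REAL);
        CREATE TABLE etl_error_log (source_id INTEGER, reason TEXT);
        INSERT INTO src_orders VALUES (1, 10, 20.0), (2, 11, 5.0), (3, 12, 7.5);
        INSERT INTO dw_customer VALUES (10), (11);
        INSERT INTO dw_orders VALUES (1, 10, 20.0), (2, 11, 5.0);
        INSERT INTO etl_error_log VALUES (3, 'unknown customer 12');
    """)
    return conn


if __name__ == "__main__":
    conn = build_integrity_demo()
    print(checksums_reconcile(conn), null_violations(conn),
          orphan_orders(conn), unlogged_rejects(conn))
```

Comparing a sum as well as a count catches cases where the right number of rows arrived but their values changed; the tolerance is an assumption for floating-point amounts.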
=> Know the difference between ETL/Data warehouse verification & Database Verification.
ETL Verification Challenges
ETL verification is quite different from conventional testing, and many challenges are faced while performing data warehouse verification.
Here are a few challenges that I experienced on my project:
- Incompatible and duplicate data
- Loss of data during ETL process.
- Unavailability of a comprehensive test bed.
- Verification teams have no privileges to execute ETL jobs on their own.
- The volume and complexity of the data are huge.
- Faulty business processes and procedures.
- Trouble acquiring and building test data
- Unstable verification environment
- Missing business flow information
Data is important for businesses to make critical business decisions. ETL verification plays a significant role in validating and ensuring that the business information is accurate, consistent, and reliable. It also minimizes the risk of data loss in production.
I hope these tips help you ensure that your ETL process is accurate and that the data warehouse it builds is a competitive advantage for your business.
This is a guest post by Vishal Chhaperia, who works in an MNC in a test management role. He has extensive experience in managing multi-technology QA projects, processes, and teams.
Have you worked on ETL verification? Please share your ETL/DW verification tips and challenges below.