Showing posts with label online etl testing. Show all posts
Showing posts with label online etl testing. Show all posts

Thursday, 30 January 2025

What are common ETL testing tools?

Here are some of the most commonly used ETL testing tools :

1. QuerySurge
Designed specifically for ETL testing, QuerySurge automates the testing of data warehouses and big data. It validates data across various stages of the ETL process to ensure accuracy and integrity

2. Talend
Talend provides a comprehensive suite for data integration and ETL testing. It offers built-in data quality checks and a graphical user interface for designing and managing ETL workflows

3. Informatica Data Validation
Informatica's Data Validation tool helps automate data quality assurance and control processes. It supports data comparison, data profiling, and automated test execution

4. RightData
RightData is a self-service ETL and data integration testing tool. It helps automate data quality assurance and control processes, making it easier to validate data transformations and loads

5. Datagaps ETL Validator
Datagaps ETL Validator provides a comprehensive solution for ETL testing with features like data comparison, data profiling, and automated test execution. It ensures data accuracy and consistency throughout the ETL process

6. iceDQ
iceDQ is a DataOps platform for data testing and monitoring. It offers rules-based auditing, real-time reporting, and extensive integration options, making it a robust choice for ETL testing

7. DataTrust
DataTrust by RightData is a no-code data observability platform that detects anomalies, generates business rules, and validates data. It is suitable for both one-time migrations and ongoing data operations 

How DWH ETL testing is different from the Application testing?

Data Warehouse (DWH) ETL testing and application testing serve different purposes and involve distinct processes. Here are the key differences between them:

DWH ETL Testing

  1. Focus: Ensures the accuracy, completeness, and reliability of data as it moves through the ETL (Extract, Transform, Load) process into the data warehouse.
  2. Data Validation: Involves validating data extraction from source systems, data transformation rules, and data loading into the target data warehouse.
  3. Data Quality: Emphasizes data quality checks, including data integrity, consistency, and accuracy.
  4. Performance: Tests the performance of ETL processes to ensure they can handle large volumes of data within acceptable time frames.
  5. Historical Data: Often deals with large volumes of historical data, requiring validation of data aggregation and summarization.
  6. Tools: Uses specialized ETL testing tools like QuerySurge, Talend, and Informatica Data Validation.

Application Testing

  1. Focus: Ensures that software applications function correctly according to specified requirements and user expectations.
  2. Functionality: Involves testing the functionality of application features, user interfaces, and workflows
  3. Usability: Emphasizes user experience, ensuring the application is intuitive and easy to use.
  4. Performance: Tests the application's performance under various conditions, including load, stress, and scalability testing.
  5. Security: Includes security testing to identify vulnerabilities and ensure data protection.
  6. Tools: Uses general application testing tools like Selenium, JUnit, and LoadRunner.

Summary

  • DWH ETL Testing: Focuses on data validation, quality, and performance within the ETL process, ensuring accurate and reliable data in the data warehouse.
  • Application Testing: Focuses on the functionality, usability, performance, and security of software applications, ensuring they meet user requirements and expectations.

Monday, 31 October 2022

What is mean by SCD and Types of SCD?

A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.

SCD Type 0 : 

This refers to attributes that never change and will not be updated in the data warehouse. 
for eg. date of birth or an employee's start date with a company

SCD Type 1 : 

History is not maintained and the table always shows the recent data. This effectively means that such dimension table is always updated with recent data whenever there is a change, and because of this update, we lose the previous values

SCD Type 2 : 

A type 2 dimension table tracks the historical changes by creating separate rows in the table with different surrogate keys. 

SCD Type 3 : 

A type 3 dimension stored the history in a separate column instead of separate rows. So unlike a type 2 dimension which is vertically growing, a type 3 dimension is horizontally growing. 

SCD Type 4 : 

Provides a solution to handle the rapid changes in the dimension tables. The concept lies in creating a junk dimension or a small dimension table with all the possible values of the rapid growing attributes of the dimension.

SCD Type 6 : 

It is a hybrid of type 1, 2 and 3 (1+2+3) which acts very similar to type 2, but only you add one extra column to denote which record is the current record.



What is Database?

A database is an organized collection of data, so that it can be easily accessed and managed. The main purpose of the database is to operate a large amount of information by storing, retrieving, and managing data. 

 There are many databases available like MySQL, Sybase, Oracle, MongoDB, PostgreSQL, SQL Server, AWS (Amazon Web Services), Snowflake Computing, etc. 

 SQL is a standard language for storing, manipulating, and retrieving data in databases.

How to perform regression testing in ETL?

Regression testing in ETL (Extract, Transform, Load) ensures that changes or updates to the ETL process do not negatively impact existing fu...