Monday, 31 October 2022
What is meant by SCD, and what are the types of SCD?
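In brief: a Slowly Changing Dimension (SCD) is a dimension whose attribute values change slowly over time. The commonly cited types are Type 0 (retain the original value), Type 1 (overwrite the old value), Type 2 (insert a new row and mark the old one inactive, preserving history), and Type 3 (keep the previous value in an extra column). A minimal Type 2 sketch, using hypothetical table and column names (`dim_customer`, `is_current`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical customer dimension with SCD Type 2 tracking columns.
cur.execute("""
    CREATE TABLE dim_customer (
        sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id INTEGER,                   -- natural/business key
        city TEXT,
        is_current INTEGER                     -- 1 = active version
    )
""")
cur.execute("INSERT INTO dim_customer (customer_id, city, is_current) "
            "VALUES (101, 'Pune', 1)")

def scd2_update(customer_id, new_city):
    """Type 2: expire the current row, then insert a new version."""
    cur.execute("UPDATE dim_customer SET is_current = 0 "
                "WHERE customer_id = ? AND is_current = 1", (customer_id,))
    cur.execute("INSERT INTO dim_customer (customer_id, city, is_current) "
                "VALUES (?, ?, 1)", (customer_id, new_city))

scd2_update(101, "Mumbai")

# History is preserved: two rows for customer 101, only one current.
rows = cur.execute(
    "SELECT city, is_current FROM dim_customer "
    "WHERE customer_id = 101 ORDER BY sk").fetchall()
print(rows)  # [('Pune', 0), ('Mumbai', 1)]
```

A Type 1 change would instead be a plain `UPDATE` of the existing row, losing the old value.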
What is a junk dimension?
A Junk Dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables.
The nature of these attributes is usually text or various flags,
e.g. non-generic comments or simple yes/no or true/false indicators.
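A junk dimension can be built by cross-joining the possible values of the small flags into one table, so the fact table carries a single key instead of several low-cardinality columns. A minimal sketch, with illustrative flag names:

```python
import itertools

# Three small, unrelated flags that would otherwise clutter the fact
# table. The names and values here are illustrative assumptions.
payment_type = ["cash", "card"]
gift_wrap = ["Y", "N"]
expedited = ["Y", "N"]

# The junk dimension enumerates every combination once, each with a key.
junk_dim = [
    {"junk_key": i, "payment_type": p, "gift_wrap": g, "expedited": e}
    for i, (p, g, e) in enumerate(
        itertools.product(payment_type, gift_wrap, expedited), start=1)
]
print(len(junk_dim))  # 2 * 2 * 2 = 8 rows cover every combination
```

Each fact row then stores only the matching `junk_key`.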
ETL Test Scenarios
Structure check :
The column name, data type, and data length of the target table are verified against the requirement.
Constraint check :
We ensure that all required constraints are available.
Index check :
We ensure that the indexes are created with the required columns.
Source data validation :
Record the source table count and ensure that no junk or bad data exists.
Data count check :
Comparing the target data count against the source data count, along with the business logic.
Data comparison check :
Ensure that the source data moved correctly to the target table by comparing the data.
Duplicate check :
Ensure that duplicate records are not present in the source and target tables.
Primary key and foreign key check :
Verify that the primary key and foreign key relationships are defined as per the requirement and that referential integrity holds between the tables.
Null check :
Inject data with NULL for a NOT NULL column and verify that the record is rejected.
Data precision check :
Create test data in the source table with different precisions and ensure the loaded data has precision as per requirement.
Date format check :
Ensure that all date columns are loaded in the defined date format.
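Several of the scenarios above (data count check, duplicate check, null check) reduce to simple queries that can be automated. A minimal sketch against SQLite, using hypothetical `src` and `tgt` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER NOT NULL, name TEXT);
    INSERT INTO src VALUES (1, 'a'), (2, 'b');
    INSERT INTO tgt VALUES (1, 'a'), (2, 'b');
""")

# Data count check: the target row count must match the source.
src_count = cur.execute("SELECT COUNT(*) FROM src").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
assert src_count == tgt_count

# Duplicate check: no id should appear more than once in the target.
dups = cur.execute(
    "SELECT id FROM tgt GROUP BY id HAVING COUNT(*) > 1").fetchall()
assert dups == []

# Null check: a NULL injected into a NOT NULL column must be rejected.
try:
    cur.execute("INSERT INTO tgt VALUES (NULL, 'c')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

In practice the same queries would run against the real source and target databases rather than an in-memory fixture.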
ETL Testing Lifecycle/Phases
Requirement analysis :
Test planning :
Test case design :
Test execution :
Defect retesting :
Summary report :
Types of load in ETL?
Full load (bulk load)
The load performed the very first time data is moved into the target. It can be referred to as a bulk load or fresh load.
Incremental load
Only the modified data is loaded into the target, following the initial full load. The changes are captured by comparing the created or modified date against the last run date of the job.
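The change-capture step described above can be sketched as a filter on the modified date against the job's last run date (the dates, column names, and rows here are illustrative):

```python
from datetime import date

# Source rows carrying a modified_date, as described above.
source = [
    {"id": 1, "value": "A", "modified_date": date(2022, 10, 1)},
    {"id": 2, "value": "B", "modified_date": date(2022, 10, 20)},
    {"id": 3, "value": "C", "modified_date": date(2022, 10, 30)},
]

last_run_date = date(2022, 10, 15)  # when the job last completed

# Incremental load: pick up only rows changed since the last run.
delta = [row for row in source if row["modified_date"] > last_run_date]
print([row["id"] for row in delta])  # [2, 3]
```

Only the `delta` rows are then inserted or updated in the target, and `last_run_date` is advanced on success.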
What is meant by Dimensions?
A Dimension is descriptive data that describes a key.
The Dimensions are organized in a table called Dimension tables.
Dimension tables are De-normalized.
There is a one-to-many relationship between each dimension table and the fact table.
Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.
e.g. Customer, Territory, Employee, Product, etc.
What is meant by Dimensional Data Modeling?
The dimensional data model is most often used in data warehousing systems.
In a data warehouse, a schema defines the way the system is organized, with all the database entities (fact and dimension tables) and their logical associations.
In designing data models for data warehouses / data marts, the most commonly used schema types are as follows :
Star Schema :
A fact table in the center surrounded by multiple dimension tables.
Snowflake Schema :
A fact table in the center surrounded by multiple hierarchies of dimension tables.
Galaxy Schema / Fact Constellation Schema :
Multiple fact tables share the same dimension tables. The arrangement of fact tables and dimension tables looks like a collection of stars in the Galaxy schema model.
Star Cluster Schema :
Combination of Star and Snowflake schema.
Whether one uses a star or a snowflake largely depends on business needs.
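A minimal star schema sketch: one central fact table joined to a dimension table through its key, queried the way a reporting tool would (table and column names are illustrative, using SQLite for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Star schema: a central fact table keyed to a dimension table.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales  VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Typical star-schema query: join the fact to its dimension, aggregate.
rows = cur.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(rows)  # [('Gadget', 7.5), ('Widget', 15.0)]
```

In a snowflake schema, `dim_product` would itself be normalized into further tables (e.g. a separate category table), adding joins to the same query.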
What are the types of Data Model?
Conceptual Data Model:
This Data Model defines WHAT the system contains. This model is typically created by Business stakeholders and Data Architects. The purpose is to organize, scope and define business concepts and rules.
Logical Data Model:
Defines HOW the system should be implemented regardless of the DBMS. This model is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of rules and data structures.
Physical Data Model:
This Data Model describes HOW the system will be implemented using a specific DBMS. This model is typically created by DBAs and developers. The purpose is the actual implementation of the database.
Why Data Model?
Ensures that all data objects required by the database are accurately represented.
A data model helps design the database at the conceptual, physical and logical levels.
It helps to define the relational tables, primary and foreign keys and stored procedures.
It provides a clear picture of the base data and can be used by database developers to create a physical database.
What is Data Model?
The Data Model is defined as an abstract model that organizes data description, data semantics, and consistency constraints of data.
The data model emphasizes what data is needed and how it should be organized, rather than what operations will be performed on the data.
What is ETL testing?
ETL testing is done to ensure that the data that has been loaded from a source to the destination after business transformation is accurate. It also involves the verification of data at various middle stages that are being used between source and destination.
Testing can also help identify and prevent issues with data quality during the ETL process, such as duplicate data or data loss.
What is a Data Warehouse?
A Data Warehouse (DW) is a relational database that is designed for query and analysis rather than transaction processing.
It includes historical data derived from transaction data from single and multiple sources.
A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on providing support for decision-makers for data modeling and analysis.
The key features of a Data Warehouse are :
- Subject-oriented
- Integrated
- Time variant
- Non-volatile
What is a Database?
A database is an organized collection of data, so that it can be easily accessed and managed. The main purpose of the database is to operate a large amount of information by storing, retrieving, and managing data.
There are many databases available like MySQL, Sybase, Oracle, MongoDB, PostgreSQL, SQL Server, AWS (Amazon Web Services), Snowflake Computing, etc.
SQL is a standard language for storing, manipulating, and retrieving data in databases.
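As a quick illustration of SQL storing, manipulating, and retrieving data (the table and values are hypothetical), using Python's built-in SQLite driver:

```python
import sqlite3

# SQL at work: create, insert, update, and select on a small table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO employee (name) VALUES (?)", ("Asha",))
cur.execute("UPDATE employee SET name = ? WHERE id = 1", ("Asha K",))
name = cur.execute("SELECT name FROM employee WHERE id = 1").fetchone()[0]
print(name)  # Asha K
```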
How to perform regression testing in ETL?
Regression testing in ETL (Extract, Transform, Load) ensures that changes or updates to the ETL process do not negatively impact existing functionality.
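One minimal way to sketch this: rerun the job after a change and compare its output row-for-row against a saved baseline from the previous run (the transformation here is a placeholder for the real ETL logic):

```python
# Regression check sketch: diff the candidate output against a baseline.
def run_etl(rows, transform):
    """Stand-in for an ETL job: transform rows, return sorted output."""
    return sorted(transform(r) for r in rows)

source = [3, 1, 2]
baseline = run_etl(source, lambda x: x * 2)   # output of the old job
candidate = run_etl(source, lambda x: x * 2)  # output after the change

# Any difference between baseline and candidate output is a regression.
diff = [(b, c) for b, c in zip(baseline, candidate) if b != c]
print(diff == [] and baseline == [2, 4, 6])  # True: no regression
```

In practice the baseline would come from a stored snapshot of the target tables, and the diff would also cover row counts and keys present in one side only.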
