Data Warehousing Testing Approach

Data Migration Testing Approach





1. Objective

2. Introduction

3. Data Migration

4. Different types of Data Migration

4.1 Data Migration types

4.2 Road Map technologies

4.3 Data Volumes

5. Dependencies

5.1 Environmental Dependencies

5.2 Support Dependencies

5.3 Data Dependencies

6. SDLC Phases followed

6.1 Business Requirements Gathering

6.1.1 Risks in Data warehouse

6.2 Analysis & Design

6.2.1 Gap Analysis

6.2.2 Data mapping

6.2.3 Translation Tables

6.2.4 Specifications

6.2.5 Iterations

6.2.6 Data Cleanse

6.2.7 Cleansing Strategy

6.3 Testing POC

6.3.1 Test Strategizing

6.3.2 Test Planning (Test approach)

6.3.3 Test Environment Setup

6.3.4 Test Execution

6.3.5 Result and Defect Analysis

6.3.6 Manual Testing Techniques

6.3.7 Different types of testing

6.3.8 Reconciliation

6.3.9 File Integrities

6.3.10 Translation Table Validation

6.3.11 Design of Automated Test Scripts

6.4 Implementation

6.5 Maintenance

7. Summary

8. Conclusion





1. Objective

The main objective of this paper is to identify the bottlenecks in data warehouse and data migration testing and to describe a process for resolving them. Following this approach helps deliver a defect-free product of high quality within the given timeframe.

2. Introduction

In recent years, the data that organizations hold in their databases (Teradata, Oracle, DB2, etc.) has grown dramatically, reaching the petabyte level. Possible issues include data redundancy, constraint violations, loss of data integrity and data inconsistency.

3. Data Migration

Data migration is a process that involves migrating data from an existing database to a new database (i.e. moving or transforming the data from a legacy source database system to a target database system). This includes one-to-one and one-to-many mappings and the movement of static and transactional data. Migration also covers the physical extraction and transmission of data between legacy and target hardware platforms.



Data migration activity typically includes everything that is required, with respect to data, to ensure that the new database is up and running without any issues, that it contains all of the legacy data (data present in the existing database), and that the data has been migrated to the correct table(s) and column(s).



4. Different types of Data Migration

4.1 Data Migration types

• Enhancement of the tables and columns within the database.

• Creation or updating of tables and columns in the new database.

• Migration of data from one database to another

o Direct migration from the source database to the target database. In this type of migration, the data elements present in the source database are transferred directly into the target database, e.g. legacy data to the new database, file system to database system, technology change, version upgrade, etc.

o Data consolidation merges databases from multiple sources within the same organization (e.g. Oracle, flat files, SQL Server and DB2 consolidated into Oracle) to build a single target database system. It helps achieve enterprise-level consolidation, which reduces hardware and software costs, increases security and, last but not least, drastically reduces maintenance costs.

o Migration after performing certain transformations (sorting, calculations, stored procedures, filtering, etc.) on the source data before moving it to the target database. For example, a field 'Priority' in the source database may allow only one of the three values High, Medium and Low, while the same field in the target database allows one of the three values 1, 2 and 3. The value High in the source database is therefore transformed to the value 1 before being migrated to the target database (a minimal SQL sketch of this check follows this list).
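As a hedged illustration of how such a transformation rule could be verified, the query below flags rows whose High/Medium/Low value was not mapped to 1/2/3. All schema, table and column names (src_schema.orders, tgt_schema.orders, priority, priority_code, order_id) are assumed purely for this example, and it assumes both databases are reachable from one session (e.g. via a database link).

-- Sketch only: all names below are placeholders, not taken from any real system.
SELECT s.order_id,
       s.priority      AS source_priority,
       t.priority_code AS target_priority_code
FROM   src_schema.orders s
JOIN   tgt_schema.orders t ON t.order_id = s.order_id
WHERE  t.priority_code <> CASE s.priority
                            WHEN 'High'   THEN 1
                            WHEN 'Medium' THEN 2
                            WHEN 'Low'    THEN 3
                          END;

Any rows returned point to records where the transformation rule was not applied as specified.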

4.2 Road Map technologies



An organization may decide to upgrade or change its hardware or software (e.g. database, application, etc.) for many reasons, including improved performance, new features and working efficiency.

Upgrade includes

• Operating system changes, such as installing monitoring software and applications

• Performance: increasing memory size, hard disk capacity or processor speed

• Network connectivity changes

• Database upgrade: requires existing data to be moved to a new (target) database or the architecture to be changed

4.3 Data Volumes



The size of the database increases tremendously as the company grows and expands. The collected data needs to be tested and verified by the database owner to reduce data redundancy and data inconsistency, and to improve data accuracy and efficiency.

When testing huge volumes of data, we prefer to test sampled data from the source and target. Excel can be used to compare source and target values (expected vs. actual) and to count and compare the number of records between source and target; a count-comparison sketch is given below.
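A minimal count-comparison query might look as follows. The schema and table names are placeholders, and it assumes both databases are visible from the same session (e.g. through a database link); the resulting counts can be pasted into the Excel sheet alongside the sampled values.

-- Sketch only: src_schema/tgt_schema and the customer table are assumed names.
SELECT 'CUSTOMER'                                 AS table_name,
       (SELECT COUNT(*) FROM src_schema.customer) AS source_count,
       (SELECT COUNT(*) FROM tgt_schema.customer) AS target_count
FROM   dual;   -- Oracle-style dummy table; other databases may not need the FROM clause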

5. Dependencies



Dependencies should be addressed first to reduce the risk of the project. The common dependencies are as follows:

• Environmental Dependencies

• Support dependencies

• Data Dependencies



5.1 Environmental Dependencies



When a project is executed in an onshore-offshore model, connectivity needs particular attention. For security reasons, for example, the offshore team works on remote systems, and sometimes even the on-site team has limited access permissions.

Development and Testing Infrastructure - hardware, software, databases, applications and desktop configuration

5.2 Support Dependencies

Support activity involves integrating the hardware operations into the IT department operation, and will include procedures for startup and shutdown of data warehousing hardware, maintenance support, and identification of hardware problems such as the failure of a disk drive, a power supply, or other hardware component.

• Training (legacy & target applications) - particularly for an in-house team

• Business Analysts -provide expert knowledge on both legacy and target systems for Developers, testers and Business Users.

• Operations - Hardware / Software / Database Analysts - facilitate system housekeeping when necessary

• User Acceptance Testers - business users

• Business Support for data cleansing

5.3 Data Dependencies

Data dependencies include atomic data dependencies and the relationships captured in E-R models (without fact tables).



• Translation Tables - translates legacy parameters to target parameters

• Static Data / Parameters / Seed Data (target parameters)

• Business Rules - migration selection criteria (e.g. number of months history)

• Entity Relationship Diagrams / Transfer Dataset / Data Dictionary / Schemas (legacy & target)

• Sign Off / User Acceptance criteria - within agreed tolerance limits



Figure: Enterprise level Architecture

Note: Most of the time, everyone concentrates on completing their own work and never tries to find out who owns the database or where it is located.

6. SDLC Phases followed

6.1 Business Requirements Gathering



In this phase, the business requirements collected from the client by the business analyst should be reviewed by a subject matter expert to make sure each requirement is clear, achievable and understandable. Issues left over from the requirements phase should not be allowed to impact the delivery of the product in the final phases of the SDLC.

6.1.1 Risks in Data warehouse

Risk is inherent in any project but the risks involved in a data warehouse seem to be greater than in others.

1. The data mapping and ER documents contain the transformation rules and logic, and the mapping of attributes between the source and target databases must be frozen. This should be complete before the QA team starts its script development.

2. The QA team should be provided with the latest requirements (mapping documents, etc.); otherwise major rework is required in the approach, the SQL queries and the automation scripts.

3. There is a possibility of the QA team not understanding the transformation rules or business rules correctly, e.g. functional transformations.

4. The QA team must understand the relationships between attributes correctly; otherwise the SQL queries and automation programs will not give the expected results and may eventually lead to improper defect reporting.

6.2 Analysis & Design

This Phase involves understanding the requirements of data migration which include estimating the size of the database and its complexity and identifying the different types of migration that are required.



The complexity of a database in the present context refers to

1. The size of the source database (also called the existing database). For testing purposes, however, the size may not equal the physical size of the database, because not all of the data present in the source database is necessarily migrated to the target (new) database; the migration may itself perform some cleanup on the source data and therefore not migrate corrupt or irrelevant data to the target database.

2. Estimating the number of fields (columns) and tables that are included in the scope of the data migration.

3. Identifying the fields (table names and field names, i.e. column names).

4. Identifying the field that has the most complicated migration logic. The requirements for data migration generally come in the form of spreadsheet(s) that contain a field-level mapping between the source database and the target database.



Data mapping specifies the destination in the target database of each field from the source database. The mapping may be 'one-to-one', 'many-to-one' or 'one-to-many'. In 'one-to-one' mapping, all the data from, say, sourcedb.table1.field1 is migrated to targetdb.table1.field1. In 'many-to-one' mapping, data from more than one field in the source database is migrated to a single field in the target database (a hedged sketch of such a check is given below).
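The sketch below illustrates one way a many-to-one rule could be checked, assuming purely for illustration that first_name and last_name in the source are concatenated into full_name in the target, and that both schemas are reachable from one session.

-- Sketch only: all schema, table and column names are assumed for illustration.
SELECT s.cust_id
FROM   src_schema.customer s
JOIN   tgt_schema.customer t ON t.cust_id = s.cust_id
WHERE  t.full_name <> s.first_name || ' ' || s.last_name;  -- rows where the concatenation rule was not applied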

6.2.1 Gap Analysis

Liaison with the business is vital in this phase, as mission-critical data cannot be allowed to be left behind; it is usual to consult the relevant business process leader or Subject Matter Expert (SME). Often this process ends up as a compromise between pulling the necessary data out of the legacy system to meet the new system's functionality and pushing certain data into the new system from legacy so that certain ad hoc or custom in-house processes can continue.

6.2.2 Data mapping

This is the process of mapping data from the legacy to target database schemas taking into account any reformatting needed. This would normally include the derivation of translation tables used to transform parametric data. It may be the case at this point that the seed data, or static data, for the new system needs generating and here again tight integration and consultation with the business is a must.

6.2.3 Translation Tables

Mapping Legacy Parameters to Target Parameters

6.2.4 Specifications

These designs are produced to enable the developer to create the Extract, Transform and Load (ETL) modules. The output from the gap analysis and the data mapping is used to drive the design process. Any constraints imposed by platforms, operating systems, programming languages, timescales, etc. should be referenced at this stage, as should any dependencies that each module will have on other such modules in the migration as a whole; failure to do this may result in the specifications being flawed.



There are generally two forms of migration specification: functional (e.g. the migration strategy) and detailed design (e.g. the data mapping document).

Built into the migration process at the specification level are steps to reconcile the migrated data at predetermined points during the migration. These checks verify that no data has been lost or gained during each step of an iteration and enable any anomalies to be spotted early and their cause ascertained with minimal loss of time.



Usually written independently from the migration, the specifications for the reconciliation programs used to validate the end-to-end migration process are designed once the target data has been mapped and is more or less static. These routines count like-for-like entities on the legacy system and target system and ensure that the correct volumes of data from legacy have migrated successfully to the target and thus build business confidence.

6.2.5 Iterations

Iterations are executions of the migration process, which may or may not include new cuts of legacy data.

Each iteration refines the migration code, i.e. increases the data volume migrated and decreases exceptions, through:

• Continual identification of data cleanse issues

• Confirmation of parameter settings and parameter translations

• Identification of any migration merge issues

• Reconciliation

From our experience the majority of the data will conform to the migration rules and as such take a minimal effort to migrate ("80/20 rule"). The remaining data, however, is often highly complex with many anomalies and deviations and so will take up the majority of the development time.

6.2.6 Data Cleanse

This activity is required to ensure that legacy system data conforms to the rules of data migration. The activities include manual or automatic updates to legacy data. This is an ongoing activity, as while the legacy systems are active there is the potential to reintroduce data cleanse issues.

6.2.7 Cleansing Strategy

Cleansing can be carried out at several points relative to the migration:

• Legacy - Pre Migration

• During migration (not advised as this makes reconciliation very difficult)

• Target - Post Migration (either manual or via data fix)

• Ad Hoc Reporting - Ongoing



6.3 Testing POC

Testing POC refers to a 'proof of concept' for the testing phase. It includes developing at least one test approach and test script for each type of migration in the testing scope. For example, for fields that are to undergo automated testing, scripts are written that bring the data of the field from the source database and the data of the corresponding field from the target database together and compare the data fetched from the two databases (a minimal sketch follows).
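A minimal POC sketch for such a field-level comparison is given below. The schema, table and column names are placeholders, and it assumes the source and target are visible from one session (e.g. via a database link); MINUS is Oracle syntax, other databases use EXCEPT.

-- Sketch only: names are placeholders. Any rows returned indicate a mismatch.
SELECT cust_id, email FROM src_schema.customer
MINUS
SELECT cust_id, email FROM tgt_schema.customer;

SELECT cust_id, email FROM tgt_schema.customer
MINUS
SELECT cust_id, email FROM src_schema.customer;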

If this POC is found to be working satisfactorily and is also approved by the customer then this becomes the model approach based on which test approaches and test scripts are developed for all the fields to be validated.

6.3.1 Test Strategizing

The test strategy defines the process of testing, the scope of testing, the estimated effort and the timelines of the testing process. The strategy document is created based on the data migration requirements and is submitted for customer sign-off.

6.3.2 Test Planning (Test approach)

The test planning phase deals with the creation of a test approach for each field. This includes developing approach for both automated testing and manual testing at field level.

The activities during the planning of data migration testing include:

• Understanding the Requirements.

• Understanding the databases involved.

• Estimating the size of the data that is under the scope of data migration.

• Identifying the 'Entity Relationship' in the existing and the new databases.

6.3.3 Test Environment Setup

Test environments have to be set up before the test execution phase. Typically this activity is not performed by the QA team; the responsibility of the QA team in this phase is to provide the specifications required for the testing process. The specifications include details such as the access rights to be provided on the source and target databases.

6.3.4 Test Execution

The test execution step involves executing the programs for automated test scripts or executing the test scripts in the case of manual tests. Test approaches are sent to the reviewers before the start of test execution. Typically these reviewers are personnel from the client side, and their job is to review the test approach and communicate their comments to the QA team. These review comments (if any) are then incorporated into the test approach, and the programs for the automated scripts are updated accordingly. During test execution, all the test scripts are executed and their status is updated.

6.3.5 Result and Defect Analysis

All test scripts that fail during the test execution phase result in defects; thus each defect reported during test execution is associated with one or more test scripts. This activity ensures that all defects are properly tracked at all times during test execution. It also identifies the test scripts that are candidates for regression testing in subsequent cycles of test execution, if a cycle does not involve full regression. Defect analysis results in a test execution summary report, whose objective is to indicate the trend in the defect reporting exercise.

6.3.6 Manual Testing Techniques

In manual testing of a data warehouse, we start by counting the number of records in the source and the target and comparing the results.

In cases where the data is set up for the first time from spreadsheets, a manual comparison would be a better option.

The limitation is the number of records that can be compared manually; as the number of records to compare increases, so do the chances of human error. For data that is being set up from spreadsheets, the validation can be done manually. The manual approach is preferred here because the data is already available in spreadsheets: the data from the database can be imported into another spreadsheet and the comparison can be done using Excel functions such as EXACT. The drawbacks of this method are the limit on the number of records that can be imported into a spreadsheet at a time, and the need to sort the records from the database in the same order as in the source spreadsheets before the comparison can be done.



Create the test cases and SQL queries based on the phases below.

Testing can be divided into three phases:

Extraction Testing Phase

This testing checks the following:



• Staging data validation

• Extract the data from the Sources (different file formats of data and different data sources)

a. Validation of different formats of the data sources

b. Extract the required data from the source to target

Note: 'Sources' means different types of formats (text file, Excel file, mainframe format file, Oracle, DB2, etc.)

• The extraction is able to retrieve the required fields.

• The extraction logic for each source system is working.

• Extraction scripts are granted security access to the source systems.

• Updating of the extract audit log and time stamping is happening.

• Source-to-extraction-destination movement is working in terms of completeness and accuracy (a staging validation sketch follows this list).

• Extraction is getting completed within the expected window.
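The following hedged sketch illustrates a staging-area validation after extraction; the staging table stg_customer and its columns are assumed purely for the example.

-- Sketch only: stg_schema.stg_customer and its columns are placeholder names.
SELECT 'missing business key' AS issue, COUNT(*) AS row_count
FROM   stg_schema.stg_customer
WHERE  cust_id IS NULL
UNION ALL
SELECT 'raw date present but typed date not loaded', COUNT(*)
FROM   stg_schema.stg_customer
WHERE  birth_date_raw IS NOT NULL
  AND  birth_date IS NULL;   -- raw text arrived but the typed column did not load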



Transformation Testing Phase

• Transformation scripts are transforming the data as per the expected logic.

• The one-time transformation for historical snapshots is working.

• Detailed and aggregated data sets are created and are matching.

• The transformation audit log and time stamping are happening.

• There is no pilferage of data during the transformation process.

• Transformation is getting completed within the given window.

• Validate the business rules.

• Clean the invalid, rejected and error records.

• Check for data integrity

o Entity integrity

o Referential integrity

o Domain integrity

• Check for data consistency

o Validity, accuracy, usability and integrity of related data between applications

• Check for data redundancy

o Duplicate records

o Repeated data records

• Check for calculations

SQL queries to be used (illustrative examples follow this list):

• Select statements for specific columns

• Logical transaction codes

• Business rules and conditions

• Deriving the calculations

• Filtering, grouping and sorting

• Joining more than one table

• Extracting the data in a standard format
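Two illustrative queries of the kinds listed above are sketched below; the table and column names (tgt_schema.orders, tgt_schema.customer, order_amount, etc.) are assumed for the example only.

-- Selecting specific columns with filtering, grouping and sorting (names are placeholders):
SELECT region, COUNT(*) AS order_count, SUM(order_amount) AS total_amount
FROM   tgt_schema.orders
WHERE  order_status = 'OPEN'
GROUP  BY region
ORDER  BY region;

-- Joining more than one table and deriving a calculation:
SELECT c.cust_id, c.cust_name, SUM(o.order_amount) AS lifetime_value
FROM   tgt_schema.customer c
JOIN   tgt_schema.orders   o ON o.cust_id = c.cust_id
GROUP  BY c.cust_id, c.cust_name;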



Load Testing Phase

• There is no pilferage during the Loading process.

• Any transformations during the load process are working.

• Movement of data sets from staging to the load destination is working.

• One-time historical snapshots are working.

• Both incremental and total refresh are working.

• Loading is happening within the expected window.

• Conditional Verifications

Validation of timely Batch jobs

• One time batch run

• Weekly batch run

• Monthly batch run

• On-demand batch runs, executed whenever needed based on the work

Validation of data (a data-dictionary sketch follows this list)

• Data type

• Size of the data

• Data values (Test Data)
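A data-dictionary-based sketch of data type and size validation is shown below. It assumes Oracle-style catalog views (all_tab_columns) and placeholder schema and table names, so it would need adjusting for other databases.

-- Sketch only: Oracle dictionary views assumed; SRC_SCHEMA, TGT_SCHEMA and CUSTOMER are placeholders.
SELECT s.column_name,
       s.data_type   AS source_type,   t.data_type   AS target_type,
       s.data_length AS source_length, t.data_length AS target_length
FROM   all_tab_columns s
JOIN   all_tab_columns t
       ON  t.column_name = s.column_name
       AND t.owner = 'TGT_SCHEMA' AND t.table_name = 'CUSTOMER'
WHERE  s.owner = 'SRC_SCHEMA' AND s.table_name = 'CUSTOMER'
  AND (s.data_type <> t.data_type OR s.data_length > t.data_length);  -- type changed or target column too small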



6.3.7 Different types of testing:

End User Browsing and OLAP Testing

• The Business views and dashboard are displaying the data as expected.

• The scheduled reports are accurate and complete.

• The scheduled reports and other batch operations, such as view refreshes, are happening within the expected window.

• 'Analysis Functions' and 'Data Analysis' are working.

• There is no pilferage of data between the source systems and the views.

Ad-hoc Query Testing

• Ad-hoc query creation works as per the expected functionality.

• Ad-hoc query response time is as expected.

Down Stream Flow Testing

• Data is extracted from the data warehouse and updated in the down-stream systems/data marts.

• There is no pilferage.

One Time Population testing

• The one time ETL for the production data is working

• The production reports and the data warehouse reports are matching

• The time taken for one-time processing is manageable within the conversion weekend.

End-to-End Integrated Testing

• End to end data flow from the source system to the down stream system is complete and accurate.

Stress and volume Testing - Testing involves placing maximum volume or failure points on the system to check its robustness and capacity. The level of stress testing depends upon the configuration of the test environment and the level of capacity planning done.

• Server shutdown during batch process.

• Extraction, transformation and loading with two to three times the maximum anticipated data volume (for which the capacity is planned).

• Having 2 to 3 times more users placing large numbers of ad-hoc queries.

• Running large number of scheduled reports.

Parallel Testing - Parallel testing is done by running the Data Warehouse on production data as it would run in real life and comparing its outputs with the existing set of reports to ensure that they are in sync or that any mismatches are explained.

Regression Testing - testing known problem areas

Spot Testing - a random spot check on migrated data

Scenario Testing - Legacy-to-target verification that data has been migrated correctly for certain customers chosen by the business whose circumstances fall into particular categories (e.g. inclusion and exclusion business rule categories, data volumes, etc.).

User Acceptance Testing - Customer based User Acceptance Test of the migrated data which will form part of the Customer Signoff



6.3.8 Reconciliation

Reconciliation covers horizontal reconciliation (record count on legacy = count on interim = count on target) and vertical reconciliation (categorization counts, e.g. address counts by region summing to the total number of addresses, checked across systems). A checklist is maintained at all stages (legacy, interim and target). A hedged sketch of both checks follows.
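Both checks can be expressed as simple counting queries. The sketch below uses placeholder schema and table names and assumes the legacy, interim and target copies are all reachable from one session.

-- Horizontal reconciliation: total record counts must match across the stages.
SELECT (SELECT COUNT(*) FROM legacy_schema.address)  AS legacy_count,
       (SELECT COUNT(*) FROM interim_schema.address) AS interim_count,
       (SELECT COUNT(*) FROM tgt_schema.address)     AS target_count
FROM   dual;   -- Oracle-style dummy table

-- Vertical reconciliation: category counts (here by region) must sum to the total address count.
SELECT region, COUNT(*) AS address_count
FROM   tgt_schema.address
GROUP  BY region;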

6.3.9 File Integrities

Scripts that identify and report the following for each table (sketched below):

• Referential Integrity - check values against target master and parameter files.

• Data Constraints

• Duplicate Data
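Hedged sketches of a referential integrity check and a duplicate check are shown below; all schema, table and column names are assumed for illustration.

-- Referential integrity: orders whose customer is missing from the target master table.
SELECT o.order_id, o.cust_id
FROM   tgt_schema.orders o
LEFT JOIN tgt_schema.customer c ON c.cust_id = o.cust_id
WHERE  c.cust_id IS NULL;

-- Duplicate data: business keys loaded more than once.
SELECT cust_id, COUNT(*) AS occurrences
FROM   tgt_schema.customer
GROUP  BY cust_id
HAVING COUNT(*) > 1;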

6.3.10 Translation Table Validation

Run after a new cut of data or a new version of the translation tables; there are two stages (a sketch follows):

• Verifies that all legacy data accounted for in "From" translation

• Verifies that all "To" translations exist in target parameter data
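The two stages could be scripted roughly as below; the translation table xlat_priority, its from_value/to_value columns and the parameter table priority_param are assumed purely for illustration.

-- Stage 1: legacy values not accounted for in the "From" column of the translation table.
SELECT DISTINCT s.priority
FROM   src_schema.orders s
LEFT JOIN tgt_schema.xlat_priority x ON x.from_value = s.priority
WHERE  x.from_value IS NULL;

-- Stage 2: "To" values that do not exist in the target parameter data.
SELECT x.to_value
FROM   tgt_schema.xlat_priority x
LEFT JOIN tgt_schema.priority_param p ON p.priority_code = x.to_value
WHERE  p.priority_code IS NULL;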

6.3.11 Design of Automated Test Scripts

This step is carried out in parallel with the test planning. As the approach for a field that is to undergo automated testing is developed, it is reviewed internally. The QA team then writes a program for the field that brings data from the two different databases onto a single platform (in this case the program itself) and performs a comparison at the data-unit level between the data fetched from the two databases. One program can perform automated testing for more than one field if the fields are from the same table and undergo a similar type of migration.

Test Automation

When testing large databases, manual testing of the migration may prove inefficient. Data can be sampled and a manual verification performed on the sample, but this can lead to insufficient testing, as a large part of the database may be left untested. Automated tests should therefore be performed to ensure an error-free migration. Automation involves using scripts that do the job of comparing the data in the source and the target databases. The candidates for automation are those fields where the number of records involved is large, and those where a functional transformation takes place as the data is migrated from the source to the target database.

Automation can be done by writing test scripts (i.e. programs) that compare the data present in the two databases. The script may be written in any programming language the tester is comfortable with that can access both the source and target databases. The automated script merely has to retrieve the data present in the two databases and compare the values/records of each column/table of the source and target databases with each other. The scripts will contain SQL queries that retrieve the data; the complexity of these queries depends on the database and on the type of migration being performed. Data retrieved from the source and target databases should be sorted in the same order. After the script retrieves the data, it should perform a one-to-one comparison between the data elements retrieved from the source and target databases, and the details of the comparison should be written to a file so that the tester can verify whether any errors were encountered. A hedged sketch of such a comparison appears below.
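As a rough sketch of such a comparison, the query below reports rows that exist on only one side or whose values differ. All names are placeholders, and it assumes a single session can see both schemas (e.g. via a database link); in practice the output would be spooled to a file for the tester to review.

-- Sketch only: schema, table and column names are assumed for illustration.
SELECT COALESCE(s.cust_id, t.cust_id) AS cust_id,
       s.email AS source_email,
       t.email AS target_email,
       CASE
         WHEN s.cust_id IS NULL THEN 'missing in source'
         WHEN t.cust_id IS NULL THEN 'missing in target'
         ELSE 'value mismatch'
       END AS issue
FROM   src_schema.customer s
FULL OUTER JOIN tgt_schema.customer t ON t.cust_id = s.cust_id
WHERE  s.cust_id IS NULL
   OR  t.cust_id IS NULL
   OR  s.email <> t.email
   OR (s.email IS NULL AND t.email IS NOT NULL)
   OR (s.email IS NOT NULL AND t.email IS NULL);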



Note: Identify the tests that should be automated and the tests that can be automated.



Traditional vs. KISS approach



Requirements Gathering

• AS-IS: BRS

• TO-BE (KISS model): BRS

Design Docs

• AS-IS: Universe design document, detailed technical documents, mapping document

• TO-BE: Universe design document, detailed technical documents, mapping document, ER diagram

Testing

• AS-IS: 1. A test scenario is considered as one test case. 2. Data validation is done in all stages.

• TO-BE: Each test case is broken down to reduce the complexity of test case execution: 1. Each test condition is validated. 2. The test scenario is divided based on the test conditions. 3. Each test case can be further divided into test steps.

Testing process

• AS-IS: 1. Validating the source, lookup, target and error tables. 2. Validating the mapping tables. 3. Data validation between source and target.

• TO-BE: 1. Validating the source by counting the number of records. 2. It is not necessary to validate every record completely (where the requirements are not clear). 3. Validating the target based on the requirements at the specific point. 4. Sharing the workload of test case execution is easy. 5. Validation of the test results is easy. 6. Effort calculation and test metrics are easy to maintain.

SQL

• AS-IS: One huge SQL query needs to be executed.

• TO-BE: 1. Write a SQL query for each column. 2. Count the number of records for each table. 3. Validate each column in the table. 4. Validate the business rules. 5. Validate the data values. 6. Check data integrity, data consistency and data redundancy.

Sharing the workload

• AS-IS: A single person has to be assigned to work on the specific query.

• TO-BE: It is easy to share the workload within the team.

Debugging

• AS-IS: Debugging the huge SQL query takes too much time to identify the issue.

• TO-BE: Debugging simple queries is very easy.

Technical knowledge

• AS-IS: SQL expertise is mandatory to write or execute the query.

• TO-BE: Basic SQL knowledge is sufficient to create or execute the query.

Dependency

• AS-IS: There is a dependency on specific resources.

• TO-BE: There is no dependency on a specific resource.

Execution time

• AS-IS: If the SQL query is ready to execute, the time taken is very short. Note: test case updates (SQL debugging) may take more time.

• TO-BE: If the SQL query is ready to execute, the time taken is very short. Note: creating and updating the simple SQL queries is straightforward.

Test result comparison

• AS-IS: Needs more effort to compare the test results (expected = actual).

• TO-BE: Needs less effort to compare the test results (expected = actual).

Enhancement/modification of the requirement

• AS-IS: Needs huge effort to update the SQL query.

• TO-BE: Needs less effort to update the SQL query.



6.4 Implementation

Build Team or Deployment Team

• Produce the deployment documentation (e.g., deployment plan, installation manual, and release notes).

• Deploy the application, database, hardware components, and software components to their production environments.

Post Implementation

For an agreed period after implementation certain key members of the migration team will be available to the business to support them in the first stages of using the new system. Typically this will involve analysis of any irregularities that may have arisen through dirty data or otherwise and where necessary writing data fixes for them.

Post Implementation fixes

Post Implementation Data Fixes are programs that are executed post migration to fix data that was either migrated in an 'unclean' state or migrated with known errors. These will typically take the form of SQL scripts.

6.5 Maintenance

Maintenance is carried out by the support team, with support from the design team when needed to make sure the requirements are met correctly.

7. Summary

The approach reduces complexity and increases testing efficiency in terms of:

• Time

• Effort

• Maintenance

• Ease of execution

• Frequency of test cycle execution

8. Conclusion

The KISS model can be used in database testing, data warehouse testing and data reporting testing to obtain a defect-free, quality product while testing efficiently and reducing the complexity of the testing. This approach is simple to implement and makes the testing job easy and very efficient.
