CUED Publications database

A framework for detecting unnecessary industrial data in ETL processes

Woodall, P and Jess, T and Harrison, M and McFarlane, D and Shah, A and Krechel, W and Nicks, E (2014) A framework for detecting unnecessary industrial data in ETL processes. In: UNSPECIFIED pp. 472-476.

Full text not available from this repository.


© 2014 IEEE. Extract, transform and load (ETL) is a critical process used by industrial organisations to move data from one database to another, such as from an operational system to a data warehouse. With the increasing amount of data stored by industrial organisations, some ETL processes can take in excess of 12 hours to complete; this can leave decision makers stranded while they wait for the data needed to support their decisions. After ETL processes have been designed, data requirements inevitably change, and much of the data that goes through the ETL process may never be used or needed. This paper therefore proposes a framework for dynamically detecting and predicting unnecessary data and preventing it from slowing down ETL processes, either by removing it entirely or by deprioritising it. Other advantages of the framework include the ability to prioritise data cleansing tasks and to determine what data should be processed first and placed into fast access memory. We show existing example algorithms that can be used for each component of the framework, and present some initial testing results as part of our research to determine whether the framework can help to reduce ETL time.
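
The abstract does not say which algorithms the framework uses, so the following is only an illustrative sketch of the general idea of detecting unused data and deprioritising it. It assumes that usage can be approximated from a log of the columns referenced by past queries. The function names, the `min_uses` threshold and the sample column names are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_columns_by_usage(query_log, all_columns):
    """Count in how many logged queries each column was referenced.

    query_log: list of lists, each holding the column names one query used
    (an assumed input format, not something the paper specifies).
    """
    usage = Counter()
    for columns_used in query_log:
        # set() so a column repeated within one query is counted once
        usage.update(set(columns_used))
    return {col: usage.get(col, 0) for col in all_columns}


def plan_etl_order(query_log, all_columns, min_uses=1):
    """Split columns into those to load first and those to defer.

    Columns used at least `min_uses` times are loaded first, most used
    first; the rest are deferred (or could be dropped entirely).
    """
    usage = rank_columns_by_usage(query_log, all_columns)
    needed = sorted(
        (c for c in all_columns if usage[c] >= min_uses),
        key=lambda c: -usage[c],
    )
    deferred = [c for c in all_columns if usage[c] < min_uses]
    return needed, deferred


# Hypothetical sample data for illustration only
query_log = [
    ["order_id", "amount"],
    ["order_id"],
    ["amount", "region"],
    ["order_id"],
]
all_columns = ["order_id", "amount", "region", "legacy_code"]

needed, deferred = plan_etl_order(query_log, all_columns)
print("load first:", needed)
print("deferred:", deferred)
```

In this toy setup, a column that never appears in the query log is placed in the deferred list, which corresponds to the abstract's option of deprioritising data rather than removing it outright. A predictive component, as the abstract mentions, would need more than raw historical counts.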

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div E > Manufacturing Systems
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:41
Last Modified: 19 Jul 2018 07:11