CUED Publications database

The tasks of pre and post-processing in Data Mining applied to a real world problem

Díaz, JL and Herrera, M and Izquierdo, J and Pérez-García, R (2010) The tasks of pre and post-processing in Data Mining applied to a real world problem. In: UNSPECIFIED pp. 1980-1989.

Full text not available from this repository.


Pre and post-processing are crucial tasks in Knowledge Discovery in Databases (KDD). In this contribution we present an application to a data set from a real water supply network (WSN) in the town of Calarcá (Colombia), located in the so-called "Eje Cafetero" coffee region. We use traditional, well-known pre and post-processing techniques to show their importance in Data Mining (DM) and to emphasize the need for interpretable results when dealing with real data sets. Pre and post-processing tools, as well as other DM tasks implemented in Clementine 9.0 (SPSS), have been used. Clementine 9.0 has a number of pre and post-processing tools for working with records (rows) and fields (columns) in a database. Basically, we used selection and deriving operations for records, and type and filter operations for fields. The database consists of a record of requests, complaints and claims (PQRs in Spanish) for the year 2006, submitted to the Calarcá Water Supply Company Multipropósito, S.A. ESP. The database also includes the network hydraulic model, some climatic variables, and thematic maps of vulnerabilities and risk areas for natural phenomena. The PQR information consists of 846 records. First, the consistency of the PQRs was evaluated to identify outliers and lost or missing information. Next, each point was located on the map of the town and its UTM coordinates were obtained. Then, each PQR was associated with its nearest pipe and node of the primary network. The graphical classification of variables shows trends that allow us to draw a priori conclusions in KDD. These data were used to feed the model and to obtain relationships between different variables and the damage type on the network within the post-processing task.
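The consistency check and the nearest-node association described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Clementine 9.0 workflow: the node identifiers, coordinates, record fields and function names below are all hypothetical, and plain Euclidean distance on UTM coordinates is assumed as the proximity measure.

```python
import math

# Hypothetical primary-network nodes: id -> (easting, northing) in metres (UTM).
NODES = {
    "N1": (0.0, 0.0),
    "N2": (100.0, 0.0),
    "N3": (100.0, 100.0),
}

# Hypothetical PQR records; None marks a missing coordinate.
PQRS = [
    {"id": 1, "easting": 10.0, "northing": 5.0},
    {"id": 2, "easting": 95.0, "northing": 80.0},
    {"id": 3, "easting": None, "northing": 40.0},
]


def is_complete(record):
    """Return True when both UTM coordinates are present."""
    return record["easting"] is not None and record["northing"] is not None


def nearest_node(record, nodes):
    """Return the id of the node closest to the record (Euclidean distance)."""
    point = (record["easting"], record["northing"])
    return min(nodes, key=lambda node_id: math.dist(point, nodes[node_id]))


def associate(records, nodes):
    """Split records into located ones (mapped to a node) and incomplete ones."""
    incomplete = [r["id"] for r in records if not is_complete(r)]
    assigned = {
        r["id"]: nearest_node(r, nodes) for r in records if is_complete(r)
    }
    return assigned, incomplete


assigned, incomplete = associate(PQRS, NODES)
print(assigned)    # mapping of record id to nearest node id
print(incomplete)  # ids of records set aside for missing coordinates
```

Records with missing coordinates are set aside rather than imputed, which mirrors the idea of evaluating consistency before any spatial association. Associating a record with its nearest pipe would additionally require a point-to-segment distance, which this sketch does not cover.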

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div E > Manufacturing Systems
Depositing User: Cron Job
Date Deposited: 18 May 2020 20:02
Last Modified: 19 Nov 2020 10:44