CUED Publications database

Automatic discovery of the statistical types of variables in a dataset

Valera, I and Ghahramani, Z (2017) Automatic discovery of the statistical types of variables in a dataset. In: UNSPECIFIED pp. 5380-5388.

Full text not available from this repository.

Abstract

A common practice in statistics and machine learning is to assume that the statistical data types (e.g., ordinal, categorical or real-valued) of variables, and usually also the likelihood model, are known. However, as the availability of real-world data increases, this assumption becomes too restrictive. Data are often heterogeneous, complex, and improperly or incompletely documented. Surprisingly, despite their practical importance, there is still a lack of tools to automatically discover the statistical types of, as well as appropriate likelihood (noise) models for, the variables in a dataset. In this paper, we fill this gap by proposing a Bayesian method, which accurately discovers the statistical data types in both synthetic and real data.

Item Type: Conference or Workshop Item (UNSPECIFIED)
Subjects: UNSPECIFIED
Divisions: Div F > Computational and Biological Learning
Depositing User: Cron Job
Date Deposited: 10 Jul 2018 01:36
Last Modified: 13 Apr 2021 09:32