Valera, I and Ghahramani, Z (2017) Automatic Discovery of the Statistical Types of Variables in a Dataset. In: ICML 2017, 2017-8-6 to 2017-8-11, International Conference Centre, Sydney, Australia, pp. 3521-3529.
Full text not available from this repository.

Abstract
A common practice in statistics and machine learning is to assume that the statistical data types (e.g., ordinal, categorical or real-valued) of variables, and usually also the likelihood model, are known. However, as the availability of real-world data increases, this assumption becomes too restrictive. Data are often heterogeneous, complex, and improperly or incompletely documented. Surprisingly, despite their practical importance, there is still a lack of tools to automatically discover the statistical types of, as well as appropriate likelihood (noise) models for, the variables in a dataset. In this paper, we fill this gap by proposing a Bayesian method, which accurately discovers the statistical data types in both synthetic and real data.
Item Type: | Conference or Workshop Item (UNSPECIFIED) |
---|---|
Subjects: | UNSPECIFIED |
Divisions: | Div F > Computational and Biological Learning |
Depositing User: | Cron Job |
Date Deposited: | 17 Jul 2017 19:54 |
Last Modified: | 18 Feb 2021 16:42 |
DOI: |