7 Data Types: A Better Way to Think about Data Types for Machine Learning (2024)

In this article I propose a more useful taxonomy of data groupings for machine learning practitioners: the 7 Data Types.


Online courses, tutorials, and articles on encoding, imputing, and feature engineering for machine learning generally treat data as either categorical or numeric. Binary and time series data sometimes get called out and, once in a while, the term ordinal sneaks into the conversation. However, a more refined framework is needed to provide a richer common lexicon for thinking and communicating about data in machine learning.

A framework along the lines of the one I propose in this article should lead practitioners, especially newer practitioners, to develop better models faster. With the 7 Data Types as a reference, we should all be able to evaluate and discuss the available encoding options and imputation strategies more quickly.

Think and talk about each of your features as one of the following seven data types to save time and transfer knowledge:

  1. Useless
  2. Nominal
  3. Binary
  4. Ordinal
  5. Count
  6. Time
  7. Interval

UPDATE

Read all the way through to see the additional 4 data types for machine learning.

In the machine learning world, data is nearly always split into two groups: numerical and categorical.

Numerical data usually means anything represented by numbers (floating point or integer). Categorical data generally means everything else; discrete labeled groups in particular are often called out. These two primary groupings, numerical and categorical, are used inconsistently and don’t provide much direction as to how the data should be manipulated.

Data generally needs to be put into numeric form before machine learning algorithms can use it to make predictions. In machine learning guides, categorical string data is usually one-hot encoded (aka dummy encoded). Dan Becker…
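
To make the idea of one-hot encoding concrete, here is a minimal sketch in plain Python. The `one_hot_encode` helper and the `colors` feature are hypothetical names made up for illustration; they do not come from any library or from the guides mentioned above. Each distinct label becomes its own 0/1 column, and every row has a single 1 marking its label.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(values))
    # Map each label to the position of its column.
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this label
        rows.append(row)
    return categories, rows


# Hypothetical feature: a made-up "color" column with string labels.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for row in encoded:
    print(row)
```

In practice you would more likely reach for a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` than write this by hand; the sketch is only meant to show what the transformation does to a column of strings.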

Author: Saturnina Altenwerth DVM