Basics of statistics for data science: Types of data measurement scales
Before looking at the types of data, let us first understand the properties of measurement. These properties are identity, magnitude, equal intervals and a minimum value of zero. Each variable, such as gender, age, height, weight, income or customer satisfaction, has some of these measurement properties.
Identity
Each value on the measurement scale has a unique meaning.
Magnitude
Values on the measurement scale have an ordered relationship to one another. That is, some values are larger and some are smaller.
Equal intervals
Units along the scale are equal to one another. This means, for example, that the difference between 1 and 2 is equal to the difference between 19 and 20.
A minimum value of zero
The scale has a true zero point, below which no values exist.
Variable type based on data
Based on the above measurement properties, any variable can be categorized as nominal, ordinal, interval or ratio scale. Generally speaking, you will be working with categorical variables and continuous variables.
The table below distinguishes the variable types by the measurement properties mentioned above.
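| Scale | Identity | Magnitude | Equal intervals | A minimum value of zero | Example |
|---|---|---|---|---|---|
| Nominal | Yes | No | No | No | Male / Female |
| Ordinal | Yes | Yes | No | No | Train speed: low, medium, high |
| Interval | Yes | Yes | Yes | No | Economic growth |
| Ratio | Yes | Yes | Yes | Yes | Income |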
A nominal variable has only identity. Gender is an example: it takes the values male / female, and there is no order between them, i.e. male is not greater than female and female is not greater than male.
An ordinal variable has identity and also carries magnitude. For example, if the speed of a train is recorded as low, medium or high, the values have an order: low < medium < high.
An interval variable has identity, magnitude and equal intervals, but it does not have a minimum value of zero. For example, economic growth can be zero, positive or negative, so it is an interval variable.
A ratio variable has identity, magnitude and equal intervals, and it does have a minimum value of zero. For example, income can range from 0 to some positive value, but it cannot be negative.
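The four descriptions above amount to a lookup from measurement properties to scale. Here is a minimal sketch in Python; the names `Properties`, `SCALES` and `scale_for` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Properties(NamedTuple):
    """The four measurement properties of a variable."""
    identity: bool
    magnitude: bool
    equal_intervals: bool
    true_zero: bool  # "a minimum value of zero"


# Each scale adds one property on top of the previous one.
SCALES = {
    Properties(True, False, False, False): "nominal",
    Properties(True, True, False, False): "ordinal",
    Properties(True, True, True, False): "interval",
    Properties(True, True, True, True): "ratio",
}


def scale_for(props: Properties) -> str:
    """Return the scale name, or raise if the combination matches no scale."""
    try:
        return SCALES[props]
    except KeyError:
        raise ValueError(f"no measurement scale matches {props}") from None


# Train speed recorded as low / medium / high: identity and magnitude only.
print(scale_for(Properties(True, True, False, False)))  # ordinal
```

Combinations that skip a property, such as equal intervals without magnitude, match none of the four scales and raise a `ValueError` in this sketch.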
Nominal and ordinal variables are called categorical variables in statistical software.
The distinguishing characteristic between the two is that the levels of an ordinal variable are ordered. In R both are stored as factors (an ordered factor for ordinal data). In Python, pandas stores them with its categorical dtype, which can likewise be ordered or unordered.
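In pandas, the counterpart of an R factor is the categorical dtype. The sketch below assumes pandas is installed and reuses the gender and train speed examples from the text. The sample values are made up for illustration.

```python
import pandas as pd

# Nominal: labels only, no order between the categories.
gender = pd.Series(pd.Categorical(["male", "female", "female", "male"]))

# Ordinal: the categories carry an explicit order.
speed = pd.Series(
    pd.Categorical(
        ["low", "high", "medium", "low"],
        categories=["low", "medium", "high"],  # low < medium < high
        ordered=True,
    )
)

print(gender.cat.ordered)         # False
print(speed.cat.ordered)          # True
print(speed.max())                # high
print((speed < "high").tolist())  # [True, False, True, True]
```

Comparisons and `max()` work on `speed` because it is ordered. The same operations on `gender` raise a `TypeError`, which matches the idea that nominal values have no magnitude.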
Interval and ratio variables are considered continuous variables.
Statistical packages such as SPSS, SAS, R and Python do not distinguish between interval and ratio variables, because the approach to analysis does not differ for the two. Both are also called continuous variables or numeric variables.
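Because both kinds are stored as ordinary numbers, nothing in the software stops you from taking ratios of interval data. The sketch below shows why a ratio is only meaningful when the scale has a true zero. Temperature in Celsius is an assumed example that is not in the text above, and `c_to_f` and `RATE` are hypothetical names.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval scale (assumed example: temperature). 0 °C is not "no temperature".
cool_c, warm_c = 10.0, 20.0
cool_f, warm_f = c_to_f(cool_c), c_to_f(warm_c)  # 50.0 and 68.0

ratio_c = warm_c / cool_c  # 2.0
ratio_f = warm_f / cool_f  # 1.36: "twice as warm" depends on the unit

# Ratio scale (income). Changing the unit is a pure multiplication,
# so the ratio survives. RATE is a made-up exchange rate.
RATE = 0.5
low_income, high_income = 1000.0, 2000.0
income_ratio = high_income / low_income
income_ratio_converted = (high_income * RATE) / (low_income * RATE)

print(ratio_c, round(ratio_f, 2))            # 2.0 1.36
print(income_ratio, income_ratio_converted)  # 2.0 2.0
```

The temperature ratio changes with the unit, while the income ratio does not. Statements like "twice as much" are therefore safe for ratio variables only.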
I hope you enjoyed reading this blog and that it clarified the concepts related to types of data / variables.
The next step after identifying the type of variable in the datasets is to analyze the data.
Learn how to perform descriptive analysis of the data by following the steps given in the link below.