Everything You Should Know About Dtype in Pandas

DataScientyst
3 min read · Sep 1, 2021

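To check the dtypes of single or multiple columns in Pandas you can use the dtype attribute of a column (df['col'].dtype) or the dtypes attribute of a DataFrame (df.dtypes).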
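A minimal sketch of both checks, assuming a small hypothetical DataFrame (the values below are invented for illustration only):

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the calls
df = pd.DataFrame({
    "Depth": [131.6, 80.0, 20.0],
    "Depth_int": [131, 80, 20],
})

# dtype of a single column (a Series has exactly one dtype)
single = df["Depth"].dtype

# dtypes of several columns (a Series indexed by column name)
multiple = df[["Depth", "Depth_int"]].dtypes

print(single)
print(multiple)
```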

Let’s see other useful ways to check the dtypes in Pandas.

Step 1: Create sample DataFrame

To start, let’s say that you have data on earthquakes.

The data is available from Kaggle: Significant Earthquakes, 1965–2016.

To read Kaggle data and convert it to a Pandas DataFrame, see: How to Search and Download Kaggle Dataset to Pandas DataFrame
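If you do not have the Kaggle file at hand, a small stand-in DataFrame can be enough to follow along. The sketch below is an assumption, not the article's loading code: the rows are sample values chosen for illustration, and only the column names and intended dtypes follow the output shown in Step 2.

```python
import pandas as pd

# Illustrative rows; only the column names and dtypes mirror the article
df = pd.DataFrame({
    "Date": pd.to_datetime(["1965-01-02", "1965-01-04", "1965-01-05"], utc=True),
    "Time": ["13:44:18", "11:29:49", "18:05:58"],
    "Depth": [131.6, 80.0, 20.0],
    "Magnitude Type": ["MW", "MW", "MW"],
    "Type": ["Earthquake", "Earthquake", "Earthquake"],
    "Magnitude": [6.0, 5.8, 6.2],
})

# Integer version of Depth, standing in for the Depth_int column
df["Depth_int"] = df["Depth"].astype("int64")

print(df.dtypes)
```

The exact datetime resolution and the dtype reported for text columns can differ between Pandas versions.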

Step 2: Get dtypes for all columns in DataFrame

To get the dtypes of the whole DataFrame you can use the dtypes attribute, df.dtypes.

The result is:

Date              datetime64[ns, UTC]
Time                           object
Depth                         float64
Magnitude Type                 object
Type                           object
Magnitude                     float64
Depth_int                       int64
dtype: object

We can see several different types:

  • datetime64[ns, UTC] - used for dates (here timezone-aware, in UTC); explicit conversion may be needed in some cases
  • float64 / int64 - numeric data
  • object - strings and other or mixed Python objects

Step 3: Short explanation of dtypes in Pandas

Let’s briefly cover some dtypes and their usage with simple examples. The most used dtypes in Pandas include object (text or mixed values), int64, float64, bool, datetime64, timedelta64 and category.

More information about them can be found on this link: Pandas User Guide dtypes.
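To see some of these dtypes side by side, here is a minimal sketch that builds one column per dtype. The DataFrame name demo, the column names and the values are all hypothetical.

```python
import pandas as pd

# Hypothetical values, one column per commonly used dtype
demo = pd.DataFrame({
    "integers": pd.Series([1, 2, 3], dtype="int64"),
    "floats": pd.Series([1.5, 2.5, 3.5], dtype="float64"),
    "booleans": pd.Series([True, False, True], dtype="bool"),
    "datetimes": pd.to_datetime(["2021-09-01", "2021-09-02", "2021-09-03"]),
    "durations": pd.to_timedelta([1, 2, 3], unit="D"),
    "categories": pd.Series(["a", "b", "a"], dtype="category"),
})

print(demo.dtypes)
```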

Pandas offers a wide range of features and methods to read, parse and convert between different dtypes. Commonly used conversion tools include astype, pd.to_numeric and pd.to_datetime.
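A minimal sketch of three commonly used conversion tools: pd.to_datetime, pd.to_numeric and astype. Picking these three is an assumption on my side, and the raw DataFrame with its string values is hypothetical.

```python
import pandas as pd

# Hypothetical raw data where Date and Depth arrive as strings
raw = pd.DataFrame({
    "Date": ["1965-01-02", "1965-01-04"],
    "Depth": ["131.6", "80.0"],
    "Magnitude": [6.0, 5.8],
})

# Parse strings into timezone-aware datetimes
raw["Date"] = pd.to_datetime(raw["Date"], utc=True)

# Parse strings into numbers; errors="coerce" turns unparsable values into NaN
raw["Depth"] = pd.to_numeric(raw["Depth"], errors="coerce")

# Cast to an explicit dtype (the fractional part is truncated here)
raw["Depth_int"] = raw["Depth"].astype("int64")

print(raw.dtypes)
```

Note that astype("int64") raises an error if the column contains NaN, so it may be worth handling missing values first.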

In this step we are going to see how we can check if a given column is numerical or categorical.

For this purpose Pandas offers a number of functions in pandas.api.types, such as is_numeric_dtype, is_datetime64_any_dtype and is_datetime64_ns_dtype.

To find all methods you can check the official Pandas docs: pandas.api.types.is_datetime64_any_dtype
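For categorical columns, one option is to test the dtype against pd.CategoricalDtype. The sketch below uses that isinstance check because is_categorical_dtype is deprecated in recent Pandas versions; the three Series are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

# Hypothetical columns for illustration
magnitude_type = pd.Series(["MW", "ML", "MW"], dtype="category")
depth = pd.Series([131.6, 80.0, 20.0])
flag = pd.Series([True, False, True])

# Categorical check via the dtype object itself
is_categorical = isinstance(magnitude_type.dtype, pd.CategoricalDtype)

# Function-based checks from pandas.api.types
depth_is_numeric = is_numeric_dtype(depth)
flag_is_bool = is_bool_dtype(flag)

print(is_categorical, depth_is_numeric, flag_is_bool)  # True True True
```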

To check if a column has a numeric or datetime dtype we can use:

from pandas.api.types import is_numeric_dtype
is_numeric_dtype(df['Depth_int'])

result:

True

For datetime there are several options, such as is_datetime64_ns_dtype or is_datetime64_any_dtype:

from pandas.api.types import is_datetime64_any_dtype
is_datetime64_any_dtype(df['Date'])

result:

True

If you would like to list only the numeric, datetime or other types of columns in a DataFrame you can use the method select_dtypes:

including columns by dtype:

df.select_dtypes(include=['float64']).columns

result of the operation:

Index(['Depth', 'Magnitude'], dtype='object')

excluding columns by dtype:

df.select_dtypes(exclude=['float64','datetime']).columns

result:

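Index(['Date', 'Time', 'Magnitude Type', 'Type', 'Depth_int'], dtype='object')

Note that Date is still listed: the 'datetime' selector matches timezone-naive datetime64 columns only, while timezone-aware columns such as Date are matched by 'datetimetz'.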
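select_dtypes also accepts generic selectors, which saves listing every concrete dtype. A minimal sketch, assuming a hypothetical DataFrame with a timezone-aware Date column:

```python
import pandas as pd

# Hypothetical data with a timezone-aware Date column
df = pd.DataFrame({
    "Date": pd.to_datetime(["1965-01-02", "1965-01-04"], utc=True),
    "Depth": [131.6, 80.0],
    "Magnitude": [6.0, 5.8],
    "Depth_int": [131, 80],
    "Type": ["Earthquake", "Earthquake"],
})

# 'number' matches every integer and float column
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# 'datetimetz' matches timezone-aware datetime columns
tz_cols = df.select_dtypes(include="datetimetz").columns.tolist()

# 'datetime' matches timezone-naive columns only, so Date is kept here
without_naive_dt = df.select_dtypes(exclude="datetime").columns.tolist()

print(numeric_cols)
print(tz_cols)
print(without_naive_dt)
```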

Step 6: Filter columns by dtype and name in Pandas DataFrame

As an alternative solution you can construct a loop over all columns. Then you can check the dtype and the name of the column.

Below we list all numeric columns whose name contains the word ‘Depth’:

from pandas.api.types import is_numeric_dtype

for col in df.columns:
    if is_numeric_dtype(df[col]) and 'Depth' in col:
        print(col)

As a result you will get the names of all numeric columns that contain ‘Depth’:

Depth
Depth_int

Instead of printing the names you can collect them in a list or process each matching column.
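For example, the same condition can be written as a list comprehension and the resulting names used to subset the DataFrame. This is a sketch with hypothetical data; describe() is just one possible follow-up.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data for illustration
df = pd.DataFrame({
    "Depth": [131.6, 80.0],
    "Depth_int": [131, 80],
    "Magnitude": [6.0, 5.8],
    "Type": ["Earthquake", "Earthquake"],
})

# Collect numeric columns whose name contains 'Depth'
depth_cols = [
    col for col in df.columns
    if is_numeric_dtype(df[col]) and "Depth" in col
]

# Example follow-up: summary statistics for the matched columns only
summary = df[depth_cols].describe()

print(depth_cols)  # ['Depth', 'Depth_int']
print(summary)
```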

Step 7: Apply function on numeric columns only

To apply a function to numeric or datetime columns only you can use the method select_dtypes in combination with apply.

The code below applies a function to all float64 columns and doubles their values:

def double_n(x):
    return x * 2

df.select_dtypes(include=['float64']).apply(double_n)
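Keep in mind that apply returns a new DataFrame and leaves df unchanged. A minimal sketch of writing the doubled values back to the original columns, assuming hypothetical data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Depth": [131.5, 80.0],
    "Magnitude": [6.0, 5.5],
    "Depth_int": [131, 80],
})

def double_n(x):
    """Double every value of the column passed in."""
    return x * 2

# Select the float64 columns, transform them, and assign the result back
float_cols = df.select_dtypes(include=["float64"]).columns
df[float_cols] = df[float_cols].apply(double_n)

print(df)
```

Here only Depth and Magnitude change; Depth_int is int64 and is therefore left as it was.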

Originally published at https://datascientyst.com on September 1, 2021.
