Everything You Should Know About Dtype in Pandas
To check the dtypes of single or multiple columns in Pandas you can use:
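As a minimal sketch, the dtype of a single column is available through the dtype attribute of a Series, and the dtypes of several columns through the dtypes attribute of a DataFrame. The small DataFrame below is a hypothetical stand-in for the earthquake data used later, not the real dataset:

```python
import pandas as pd

# Hypothetical sample data; column names mirror the earthquake dataset used later
df = pd.DataFrame({
    "Depth": [131.6, 80.0, 20.0],
    "Magnitude": [6.0, 5.8, 6.2],
    "Depth_int": [131, 80, 20],
})

# dtype of a single column (a Series has exactly one dtype)
single = df["Depth"].dtype

# dtypes of several columns (a Series indexed by column name)
several = df[["Depth", "Depth_int"]].dtypes

# dtypes of all columns
all_cols = df.dtypes

print(single)
print(several)
```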
Let’s see other useful ways to check the dtypes in Pandas.
Step 1: Create sample DataFrame
To start, let’s say that you have data on earthquakes:
Data is available from Kaggle: Significant Earthquakes, 1965–2016.
How to read and convert Kaggle data to Pandas DataFrame: How to Search and Download Kaggle Dataset to Pandas DataFrame
Step 2: Get dtypes for all columns in DataFrame
To get dtype details for the whole DataFrame you can use the dtypes attribute:
the result is:
Date              datetime64[ns, UTC]
Time                           object
Depth                         float64
Magnitude Type                 object
Type                           object
Magnitude                     float64
Depth_int                       int64
dtype: object
We can see several different types:
- datetime64[ns, UTC]: used for dates; explicit conversion may be needed in some cases
- float64 / int64: numeric data
- object: strings and other types
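The note on explicit conversion for dates can be illustrated with a short sketch. It assumes a hypothetical Date column that was read as plain strings (the values are illustrative only) and converts it with pd.to_datetime:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical date strings; values are illustrative only
raw = pd.DataFrame({"Date": ["1965-01-02", "1965-01-04", "1965-01-05"]})

# Before conversion the column holds plain strings, not datetimes
was_datetime = is_datetime64_any_dtype(raw["Date"])

# Explicit conversion; utc=True makes the result timezone-aware (UTC)
raw["Date"] = pd.to_datetime(raw["Date"], utc=True)

print(was_datetime)
print(raw["Date"].dtype)
```

Passing utc=True is one way to end up with a timezone-aware dtype like the one shown above; without it the column would be timezone-naive.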
Step 3: Short explanation of dtypes in Pandas
Let’s briefly cover some dtypes and their usage with simple examples. Table of the most used dtypes in Pandas:
More information about them can be found on this link: Pandas User Guide dtypes.
Pandas offers a wide range of features and methods in order to read, parse and convert between different dtypes. The most popular conversion methods are:
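As a rough sketch, three widely used conversions are astype, pd.to_numeric and astype('category'). Choosing these three is an assumption about what such a list usually contains, and the data below is hypothetical:

```python
import pandas as pd

# Hypothetical raw data in which every column was read as text
raw = pd.DataFrame({
    "Depth": ["131.6", "80.0", "n/a"],
    "Depth_int": ["131", "80", "20"],
    "Type": ["Earthquake", "Earthquake", "Explosion"],
})

# astype: direct cast, raises if any value cannot be converted
raw["Depth_int"] = raw["Depth_int"].astype("int64")

# pd.to_numeric: errors="coerce" turns unparseable values into NaN
raw["Depth"] = pd.to_numeric(raw["Depth"], errors="coerce")

# astype("category"): compact representation for repeated labels
raw["Type"] = raw["Type"].astype("category")

print(raw.dtypes)
```

Using errors="coerce" trades strictness for convenience: bad values become NaN silently, so it is worth checking how many were coerced.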
In this step we are going to see how we can check if a given column is numerical or categorical.
For this purpose Pandas offers a bunch of methods like:
To find all methods you can check the official Pandas docs: pandas.api.types.is_datetime64_any_dtype
To check if a column has numeric or datetime dtype we can:
from pandas.api.types import is_numeric_dtype
is_numeric_dtype(df['Depth_int'])
result:
True
For datetime columns there are several options, such as is_datetime64_ns_dtype or is_datetime64_any_dtype:
from pandas.api.types import is_datetime64_any_dtype
is_datetime64_any_dtype(df['Date'])
result:
True
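The difference between the two datetime checks can be sketched on plain NumPy dtypes. The example assumes nothing about the earthquake data and only compares two resolutions:

```python
import numpy as np
from pandas.api.types import is_datetime64_any_dtype, is_datetime64_ns_dtype

# Two NumPy datetime dtypes that differ only in resolution
ns_dtype = np.dtype("datetime64[ns]")  # nanosecond resolution
s_dtype = np.dtype("datetime64[s]")    # second resolution

# is_datetime64_any_dtype accepts any datetime64 resolution
any_ns = is_datetime64_any_dtype(ns_dtype)
any_s = is_datetime64_any_dtype(s_dtype)

# is_datetime64_ns_dtype accepts only nanosecond resolution
only_ns = is_datetime64_ns_dtype(ns_dtype)
only_s = is_datetime64_ns_dtype(s_dtype)

print(any_ns, any_s, only_ns, only_s)  # True True True False
```

For a general "is this a date column?" check, is_datetime64_any_dtype is usually the safer choice because it does not depend on the resolution.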
If you would like to list only the numeric, datetime or other types of columns in a DataFrame, you can use the select_dtypes method.
Including columns by dtype:
df.select_dtypes(include=['float64']).columns
result of the operation:
Index(['Depth', 'Magnitude'], dtype='object')
excluding columns by dtype:
df.select_dtypes(exclude=['float64','datetime']).columns
result:
Index(['Date', 'Time', 'Magnitude Type', 'Type', 'Depth_int'], dtype='object')
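One detail in the result above: the Date column is still listed even though 'datetime' was excluded. In select_dtypes, 'datetime' matches only timezone-naive columns, while timezone-aware columns such as datetime64[ns, UTC] are matched by 'datetimetz'. A small sketch on hypothetical data (the Date_naive column is invented for illustration):

```python
import pandas as pd

# Hypothetical frame with a tz-aware date, a tz-naive date and numeric columns
df = pd.DataFrame({
    "Date": pd.to_datetime(["1965-01-02", "1965-01-04"], utc=True),
    "Date_naive": pd.to_datetime(["1965-01-02", "1965-01-04"]),
    "Depth": [131.6, 80.0],
    "Depth_int": [131, 80],
})

# 'number' matches every numeric dtype (int64, float64, ...)
numeric_cols = list(df.select_dtypes(include=["number"]).columns)

# 'datetime' matches only timezone-naive datetime64 columns
naive_cols = list(df.select_dtypes(include=["datetime"]).columns)

# 'datetimetz' matches timezone-aware datetime columns
aware_cols = list(df.select_dtypes(include=["datetimetz"]).columns)

print(numeric_cols)  # ['Depth', 'Depth_int']
print(naive_cols)    # ['Date_naive']
print(aware_cols)    # ['Date']
```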
Step 6: Filter columns by dtype and name in Pandas DataFrame
As an alternative solution you can construct a loop over all columns. Then you can check the dtype and the name of the column.
Below we list all numeric columns whose names contain the word ‘Depth’:
from pandas.api.types import is_numeric_dtype

for col in df.columns:
    if is_numeric_dtype(df[col]) and 'Depth' in col:
        print(col)
As a result you will get the numeric columns whose names contain ‘Depth’:
Depth
Depth_int
Instead of printing the names, you can collect them in a list or process each column inside the loop.
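One possible variation, sketched on hypothetical data, collects the matching names with a list comprehension and then uses the list to select the columns. The 'Depth Type' column is invented here to show that the dtype check filters it out:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample; column names follow the earthquake example
df = pd.DataFrame({
    "Depth": [131.6, 80.0],
    "Depth_int": [131, 80],
    "Magnitude": [6.0, 5.8],
    "Depth Type": ["shallow", "shallow"],  # invented non-numeric column
})

# Collect matching column names instead of printing them
depth_cols = [
    col for col in df.columns
    if is_numeric_dtype(df[col]) and "Depth" in col
]

# The list can then be used to select or process those columns
depth_mean = df[depth_cols].mean()

print(depth_cols)  # ['Depth', 'Depth_int']
```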
Step 7: Apply function on numeric columns only
To apply a function to numeric or datetime columns only, you can use the select_dtypes method in combination with apply.
The function below is applied to every float64 column and doubles its values:
def double_n(x):
    return x * 2

df.select_dtypes(include=['float64']).apply(double_n)
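Note that apply returns a new DataFrame and leaves the original untouched. A hedged sketch of one way to cover all numeric columns (not only float64) and write the result back, using hypothetical data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Depth": [131.6, 80.0],
    "Depth_int": [131, 80],
    "Type": ["Earthquake", "Explosion"],
})

def double_n(x):
    # x is one column (a Series); the multiplication is element-wise
    return x * 2

# 'number' selects both float64 and int64 columns
numeric_cols = df.select_dtypes(include=["number"]).columns

# apply returns a new DataFrame, so assign it back to keep the result
df[numeric_cols] = df[numeric_cols].apply(double_n)

print(df)
```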
Originally published at https://datascientyst.com on September 1, 2021.