Viewing Data
Getting General Information of DataFrame
The info() method prints general information about a DataFrame:
- Number of rows
- Number of columns
- The name of each column (Column)
- Number of values in each column that are not missing (Non-Null Count)
- The data type of each column (Dtype)
df.info()
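A minimal sketch of what that report looks like, assuming a small made-up DataFrame (the column names and values are hypothetical). By default info() prints to stdout; here the report is captured in a buffer so it can be inspected as text.

```python
import io

import pandas as pd

# Hypothetical sample data; names and values are made up for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", None],
    "age": [31, 45, None, 28],
})

# info() writes its report to stdout unless a buffer is passed via buf.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()

# The report lists the row range, the column count, and one line per column
# with its Non-Null Count and Dtype.
print(report)
```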
Getting Shape of DataFrame
Shape of a DataFrame is a tuple with two elements: the number of rows and the number of columns.
df.shape
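Because shape is a plain tuple, it can be unpacked directly. A small sketch, assuming a made-up DataFrame with three rows and two columns:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is an attribute, not a method, so there are no parentheses.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```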
Getting Head/Random/Last Rows
Each method below accepts the number of rows to return. The sample() method picks rows at random, so it gives a more diverse preview than head() or tail().
df.head()
df.sample()
df.tail()
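A short sketch of passing the row count explicitly, assuming a made-up ten-row DataFrame. Without an argument, head() and tail() return 5 rows and sample() returns 1. The random_state value is an arbitrary choice that makes the random pick repeatable.

```python
import pandas as pd

# Hypothetical sample data: ten rows with values 0..9.
df = pd.DataFrame({"value": range(10)})

first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows

# sample() draws rows at random without replacement by default;
# random_state fixes the seed so the same rows come back on every run.
random_four = df.sample(n=4, random_state=0)

print(first_three)
print(last_two)
print(random_four)
```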
Getting Descriptive Statistics
The describe() method returns summary statistics:
- For numerical columns: count, mean, std, min, max, and the lower, 50th, and upper percentiles. By default the lower percentile is the 25th and the upper percentile is the 75th. The 50th percentile is the same as the median.
- For object columns (e.g. strings): count, unique, top, and freq. The top is the most common value. The freq is the most common value's frequency. In pandas versions before 2.0, timestamps were summarized the same way, with the first and last items added; newer versions treat datetime columns like numeric ones.
For a DataFrame with mixed data types, the default is to return statistics for the numeric columns only.
df.describe()
Getting descriptive statistics only for columns of certain types
df.describe(include='object')
Getting descriptive statistics for all columns, regardless of their data types
df.describe(include='all')
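The three describe() variants can be compared side by side. This is a sketch on a made-up DataFrame with one text column and one numeric column; the text column is given an explicit object dtype so that include='object' selects it regardless of how a given pandas version infers string columns. The percentiles argument is shown as one way to change the default 25/75 bounds.

```python
import pandas as pd

# Hypothetical sample data; the object dtype is set explicitly on purpose.
df = pd.DataFrame({
    "city": pd.Series(["Oslo", "Rome", "Oslo", "Lima"], dtype="object"),
    "temp": [3.0, 18.0, 5.0, 22.0],
})

numeric_stats = df.describe()                  # numeric columns only
object_stats = df.describe(include="object")   # object columns only
all_stats = df.describe(include="all")         # every column; cells that do not apply are NaN

# Replace the default lower/upper percentiles with the 10th and 90th.
custom_stats = df.describe(percentiles=[0.1, 0.9])

print(numeric_stats)
print(object_stats)
print(all_stats)
print(custom_stats)
```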
Counting Values
Number of non-missing values in a column
df['column'].count()
Number of unique values for a column
df['column'].nunique()
List of unique values with their counts
df['column'].value_counts()
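A minimal sketch of how the three counting methods differ, assuming a made-up column that contains one missing value. All three skip missing values by default; passing dropna=False to value_counts() is one way to include them.

```python
import pandas as pd

# Hypothetical sample data: five rows, one of them missing.
df = pd.DataFrame({"column": ["a", "b", "a", None, "a"]})

non_missing = df["column"].count()     # 4: the missing value is not counted
distinct = df["column"].nunique()      # 2: "a" and "b"
counts = df["column"].value_counts()   # "a" occurs 3 times, "b" once

# Include the missing value as its own entry in the counts.
counts_with_missing = df["column"].value_counts(dropna=False)

print(non_missing, distinct)
print(counts)
print(counts_with_missing)
```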