Indexing and Filtering Data
Indexing using loc[]
loc[]
returns a single element or multiple elements for specified index values.
1import pandas as pd23df = pd.read_csv('state_nature.csv')45print(df.loc[['Alabama', 'Arizona'], ['flower', 'insect']])
Whereas loc[] uses the index and column labels to access elements, iloc[] uses integers to designate the positions of the elements that you want to get.
1import pandas as pd23df = pd.read_csv('state_nature.csv')45print(df.iloc[[1, 3], ['flower', 'insect']])
ike loc[], we can access multiple rows and/or columns with iloc[] by passing it lists of their positions or using slicing.
1import pandas as pd23df = pd.read_csv('state_nature.csv')45print(df.iloc[1:10])
Changing a DataFrame Index
To set index values use the set_index()
method. This method takes an existing column from a DataFrame and replaces the index with the values in that column.
1import pandas as pd23states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas']4flowers = ['Camellia', 'Forget-me-not', 'Saguaro cactus blossom', 'Apple blossom']5insects = ['Monarch butterfly', 'Four-spotted skimmer dragonfly', 'Two-tailed swallowtail', 'European honey bee']6index = ['state 1', 'state 2', 'state 3', 'state 4']78df = pd.DataFrame({'state': states, 'flower': flowers, 'insect': insects}, index=index)9df = df.set_index('state')
Filtering Using query()
The query()
method is called on a DataFrame and expects a string as input. The string represents the query you want to make on your DataFrame, which basically means it tells Python which rows to filter.
1import pandas as pd23df = pd.read_csv('/datasets/vg_sales.csv')45print(df.query("jp_sales > 1")[['name', 'jp_sales']])
Checking for existence using the query() method with the in
keyword in our query string.
1import pandas as pd23df = pd.read_csv('/datasets/vg_sales.csv')45handhelds = ['3DS', 'DS', 'GB', 'GBA', 'PSP']6print(df.query("platform in @handhelds")[['name', 'platform']])
Filtering on Multiple Conditions (with Logical Series)
Filtering conditions can be combined following the rules:
- Each individual condition is separated by parentheses.
- The
&
operator is used instead ofand
. - The
~
operator is used instead ofnot
(and it precedes the condition we want to negate).
1import pandas as pd23df = pd.read_csv('/datasets/vg_sales.csv')4df['user_score'] = pd.to_numeric(df['user_score'], errors='coerce')56print(df[(df['na_sales'] >= 1) | (df['eu_sales'] >= 1) | (df['jp_sales'] >= 1)].head())
Filtering on Multiple Conditions (with .query())
We can also filter on multiple conditions by writing query strings for the query() method.
1import pandas as pd23df = pd.read_csv('/datasets/vg_sales.csv')4df['user_score'] = pd.to_numeric(df['user_score'], errors='coerce')56print(df.query("platform == 'Wii' and genre != 'Sports'").head())
Replacing Values Using where()
The where()
method checks the condition for each value in the column. If the condition is true, where()
does nothing; if it’s false, where()
replaces the current value with the new one.
1import pandas as pd23df = pd.read_csv('/datasets/vg_sales.csv')4df['user_score'] = pd.to_numeric(df['user_score'], errors='coerce')56df['platform'] = df['platform'].where(df['platform'] != 'NES', 'Nintendo Entertainment System')