Knowledge Base

Indexing and Filtering Data

Indexing using loc[]

loc[] returns a single element or multiple elements for specified index values.

1import pandas as pd
2
3df = pd.read_csv('state_nature.csv')
4
5print(df.loc[['Alabama', 'Arizona'], ['flower', 'insect']])

Whereas loc[] uses the index and column labels to access elements, iloc[] uses integers to designate the positions of the elements that you want to get.

1import pandas as pd
2
3df = pd.read_csv('state_nature.csv')
4
5print(df.iloc[[1, 3], ['flower', 'insect']])

ike loc[], we can access multiple rows and/or columns with iloc[] by passing it lists of their positions or using slicing.

1import pandas as pd
2
3df = pd.read_csv('state_nature.csv')
4
5print(df.iloc[1:10])

Changing a DataFrame Index

To set index values use the set_index() method. This method takes an existing column from a DataFrame and replaces the index with the values in that column.

1import pandas as pd
2
3states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas']
4flowers = ['Camellia', 'Forget-me-not', 'Saguaro cactus blossom', 'Apple blossom']
5insects = ['Monarch butterfly', 'Four-spotted skimmer dragonfly', 'Two-tailed swallowtail', 'European honey bee']
6index = ['state 1', 'state 2', 'state 3', 'state 4']
7
8df = pd.DataFrame({'state': states, 'flower': flowers, 'insect': insects}, index=index)
9df = df.set_index('state')

Filtering Using query()

The query() method is called on a DataFrame and expects a string as input. The string represents the query you want to make on your DataFrame, which basically means it tells Python which rows to filter.

1import pandas as pd
2
3df = pd.read_csv('/datasets/vg_sales.csv')
4
5print(df.query("jp_sales > 1")[['name', 'jp_sales']])

Checking for existence using the query() method with the in keyword in our query string.

1import pandas as pd
2
3df = pd.read_csv('/datasets/vg_sales.csv')
4
5handhelds = ['3DS', 'DS', 'GB', 'GBA', 'PSP']
6print(df.query("platform in @handhelds")[['name', 'platform']])

Filtering on Multiple Conditions (with Logical Series)

Filtering conditions can be combined following the rules:

  1. Each individual condition is separated by parentheses.
  2. The & operator is used instead of and.
  3. The ~ operator is used instead of not (and it precedes the condition we want to negate).
1import pandas as pd
2
3df = pd.read_csv('/datasets/vg_sales.csv')
4df['user_score'] = pd.to_numeric(df['user_score'], errors='coerce')
5
6print(df[(df['na_sales'] >= 1) | (df['eu_sales'] >= 1) | (df['jp_sales'] >= 1)].head())

Filtering on Multiple Conditions (with .query())

We can also filter on multiple conditions by writing query strings for the query() method.

1import pandas as pd
2
3df = pd.read_csv('/datasets/vg_sales.csv')
4df['user_score'] = pd.to_numeric(df['user_score'], errors='coerce')
5
6print(df.query("platform == 'Wii' and genre != 'Sports'").head())

Replacing Values Using where()

The where() method checks the condition for each value in the column. If the condition is true, where() does nothing; if it’s false, where() replaces the current value with the new one.

1import pandas as pd
2
3df = pd.read_csv('/datasets/vg_sales.csv')
4df['user_score'] = pd.to_numeric(df['user_score'], errors='coerce')
5
6df['platform'] = df['platform'].where(df['platform'] != 'NES', 'Nintendo Entertainment System')
Send Feedback
close
  • Bug
  • Improvement
  • Feature
Send Feedback
,