3. Methods by class

See section 1.6 for general information on methods.

3.1. String Methods

find()
str_obj.find(sub) searches for the string sub in the string object str_obj. If found, this method returns the starting index of the first occurrence of sub in str_obj. If not found, this method returns -1. If str_obj = "One Two Three Five Two", then str_obj.find("Two") returns 4 (the later occurrence is ignored).
join()
str_obj.join(seq) concatenates (joins) the strings in seq separated by the string str_obj. If str_obj = '_' and seq = ['One', 'Two', 'Three'], then str_obj.join(seq) returns 'One_Two_Three'.
lower()
str_obj.lower() returns a copy of str_obj with all letters converted to lowercase. The original string is not changed. If str_obj = "ONE two THREE", then str_obj.lower() returns "one two three".
isalpha()
Returns True if all characters in str_obj are alphabetic. If str_obj = 'abcd' then str_obj.isalpha() returns True. If str_obj = 'abc1.' then str_obj.isalpha() returns False.
isdigit()
Returns True if all characters in str_obj are digits. If str_obj = '314' then str_obj.isdigit() returns True. If str_obj = '3.14' then str_obj.isdigit() returns False, because the decimal point is not a digit.
islower()
Returns True if all alphabetic characters in str_obj are lowercase. If str_obj = 'pi is 3.14!' then str_obj.islower() returns True. If str_obj = 'Pi is 3.14!' then str_obj.islower() returns False.
replace()
str_obj.replace(old, new, count) returns a copy of str_obj in which the first count occurrences of old are replaced with new. Note: count is optional. If left out, all occurrences are replaced. If str_obj = 'a apple, a orange, a banana', then str_obj.replace('a ', 'an ', 2) returns 'an apple, an orange, a banana'.
split()
str_obj.split(sep) returns a list of the substrings of str_obj, using sep as the delimiter. If str_obj = '1, 2, 3' then str_obj.split(',') returns the list ['1', ' 2', ' 3']. Note that the spaces after each comma are kept.
upper()
str_obj.upper() returns a copy of str_obj with all letters converted to uppercase. The original string is not changed. If str_obj = 'ONE two THREE', then str_obj.upper() returns 'ONE TWO THREE'.
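
The string methods above can be tried together in a short script. This is only an illustrative sketch: the variable names (text, mixed, and so on) are made up for the example, and the sample strings are the ones used in the entries above. It also shows that these methods return new values and leave the original string unchanged.

```python
# Sample strings reused from the entries above; variable names are illustrative.
text = "One Two Three Five Two"
first_two = text.find("Two")      # starting index of the first match
missing = text.find("Four")       # -1 because "Four" does not occur

joined = "_".join(["One", "Two", "Three"])

mixed = "ONE two THREE"
lowered = mixed.lower()           # new string; mixed is unchanged
uppered = mixed.upper()

fixed = "a apple, a orange, a banana".replace("a ", "an ", 2)
parts = "1, 2, 3".split(",")      # the spaces after the commas are kept

checks = {
    "abcd is alphabetic": "abcd".isalpha(),
    "abc1. is alphabetic": "abc1.".isalpha(),
    "314 is all digits": "314".isdigit(),
    "3.14 is all digits": "3.14".isdigit(),
    "pi is 3.14! is lowercase": "pi is 3.14!".islower(),
}

print(first_two, missing)
print(joined, lowered, uppered)
print(fixed)
print(parts)
for label, result in checks.items():
    print(label, "->", result)
```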

3.2. List Methods

append()
list_obj.append(s) adds element s to the end of the list list_obj. If list_obj = ['a','b','c'], then list_obj.append('d') updates list_obj to ['a', 'b', 'c', 'd'].
extend()
list_obj.extend(iter) adds each item in the iterable object iter to the end of the list list_obj. If list_obj = ['a','b','c'] and str1 = 'def', then list_obj.extend(str1) updates list_obj to ['a', 'b', 'c', 'd', 'e', 'f'].
index()
list_obj.index(s) returns the index value of the first element in list_obj that is equal to s. If list_obj = [1,2,'a','a'] then list_obj.index('a') returns 2. (Recall, indices start at 0)
insert(i,s)
list_obj.insert(i,s) inserts element s at index i and shifts the later elements one position to the right. If list_obj = ['a','c','d'] then list_obj.insert(1,'b') updates list_obj to ['a', 'b', 'c', 'd'].
pop()
list_obj.pop(i) removes and returns the element at index i in list_obj. If i is left out, the last element is removed and returned. If list_obj = ['a','a','b','c'] then list_obj.pop(1) returns 'a' and updates list_obj to ['a', 'b', 'c'].
sort()
list_obj.sort() sorts the elements in the list list_obj in place, in ascending order by default. If list_obj = ['a','d','c','b'] then list_obj.sort() updates list_obj to ['a', 'b', 'c', 'd'].
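
A point that is easy to miss in the entries above is that most list methods change the list in place and return None, while index() and pop() return a value. The sketch below illustrates this with made-up variable names (letters, unsorted); it is meant as a quick check, not as a complete tour of the list class.

```python
letters = ["a", "b", "c"]

append_result = letters.append("d")   # list is changed in place
letters.extend("ef")                  # each character is added as its own item
position = letters.index("c")         # index of the first matching element

letters.insert(0, "z")                # "z" goes to the front, the rest shift right
removed_first = letters.pop(0)        # removes and returns the element at index 0
removed_last = letters.pop()          # no index given: removes the last element

unsorted = ["a", "d", "c", "b"]
sort_result = unsorted.sort()         # sorts in place

print(letters)
print(position, removed_first, removed_last)
print(unsorted)
print(append_result, sort_result)     # both are None
```

Because append() and sort() return None, a statement such as list_obj = list_obj.sort() replaces the list with None, which is a common mistake.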

3.3. Dictionary Methods

Adding items
Unlike lists, dictionaries have no insert() or append() method. To add data to an existing dictionary, index the dictionary with the new key and assign the value to it. Ex. dict_obj['New Key'] = value adds value to the dictionary object dict_obj under the key 'New Key'. If the key already exists, its value is overwritten.
get()
dict_obj.get(s) returns the value for the key s. If dict_obj = {'a' : 2, 'b' : 4, 'c' : 6} then dict_obj.get('b') returns 4. If the key does not exist, the method returns None.
items()
dict_obj.items() returns a view of the dictionary's current elements in the form (key, value). If dict_obj = {'a' : 1, 'b' : 2, 'c' : 3} then dict_obj.items() returns dict_items([('a', 1), ('b', 2), ('c', 3)]). Useful for looping over dictionaries, including dictionaries whose values are lists (see section 6.1.3).
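
The dictionary operations above can be combined as in the following sketch. The dictionary contents and variable names are illustrative. The second argument to get() is not described above; it is an optional default value that get() returns when the key is missing.

```python
dict_obj = {"a": 2, "b": 4, "c": 6}

dict_obj["d"] = 8                  # new key: adds an item
dict_obj["a"] = 1                  # existing key: overwrites the old value

value_b = dict_obj.get("b")
missing = dict_obj.get("z")        # key not present, so None
fallback = dict_obj.get("z", 0)    # optional default used instead of None

# items() yields (key, value) pairs, which can be unpacked in a for loop.
pairs = []
for key, value in dict_obj.items():
    pairs.append((key, value))

print(value_b, missing, fallback)
print(pairs)
```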

3.4. Pandas (DataFrames and Series) Methods

drop_duplicates()
df.drop_duplicates() returns a copy of the pandas DataFrame object df with duplicate rows removed. The first occurrence of each duplicated row is kept.
dropna()
df.dropna() returns a copy of the pandas DataFrame object df without the rows that have at least 1 missing value.
duplicated()
df.duplicated() returns a boolean (True or False) pandas Series indicating duplicate rows. The first occurrence of a duplicate returns False and all following duplicate rows return True. The length of the returned pandas Series is equal to the number of rows in the pandas DataFrame object df.
fillna(value=s)
df.fillna(value=s) returns a copy of the pandas DataFrame object df in which NaN/NA values are filled with the specified value s across all columns and rows.
groupby('columnName')
df.groupby('columnName') groups data by unique values in column columnName in the pandas DataFrame object df. Can be used for the first stage of grouping (split) before applying some operation to the grouped data.
head(n)
df.head(n) returns the first n rows of the pandas DataFrame object df. If n is not specified, it returns the first 5 rows by default.
isna()
df.isna() returns a boolean object the same size as the pandas DataFrame object df, indicating if the values are NA.
loc[]
df.loc[] is used to access a group of rows and columns in the pandas DataFrame object df. (See section 4.4 for details)
max()
df.max() returns the maximum value per column in the pandas DataFrame object df.
mean()
df.mean() returns the mean value per column in the pandas DataFrame object df.
median()
df.median() returns the median value per column in the pandas DataFrame object df.
min()
df.min() returns the minimum value per column in the pandas DataFrame object df.
read_csv()
df = pd.read_csv('fileName') reads data from a csv file and creates a DataFrame object df. Note: read_csv() is a function of the pandas library rather than a method of a DataFrame, and pd is the alias assigned to pandas when importing the library, i.e., import pandas as pd
rename(columns=dict_obj)
df.rename(columns=dict_obj) renames columns in the pandas DataFrame object df using data in the dictionary object dict_obj. Create dict_obj with keys = old column names and values = new column names to rename the columns in df. (See section 1.3.2 for info on dictionary objects)
replace(this_value, that_value)
df.replace(this_value, that_value) finds all instances of this_value in the pandas DataFrame object df and replaces them with that_value.
reset_index()
df.reset_index() resets the index of the pandas DataFrame object df to consecutive numbers and creates a new column that stores the old index value (before the method is applied). This method is typically used after data processing when rows are removed.
sort_values(by='column_name', ascending=True)
df.sort_values(by='column_name') sorts the rows in the pandas DataFrame object df by column column_name. By default, ascending is True. Set ascending=False to sort in descending order.
sum()
df.sum() returns the sum of the values in each column in the pandas DataFrame object df.
tail(n)
df.tail(n) returns the last n rows of the pandas DataFrame object df. If n is not specified, it returns the last 5 rows by default.
unique()
df['Col'].unique() returns unique values in the column Col in the pandas DataFrame object df. Note: unique() must be called on a pandas Series object which is created when referencing a single column in the DataFrame object.
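
Several of the DataFrame methods above are typically chained during data cleaning. The following is a minimal sketch that assumes pandas is installed; the DataFrame contents, the column names (name, team, score, points), and the variable names are invented for illustration. Each method here returns a new object, so the results are assigned to new names.

```python
import pandas as pd

# Small made-up data set: row 2 repeats row 1, and one score is missing.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy", "Dee"],
    "team": ["x", "y", "y", "x", "y"],
    "score": [10.0, 20.0, 20.0, None, 40.0],
})

dup_mask = df.duplicated()            # True only for the repeated row
deduped = df.drop_duplicates()        # first occurrence of each row is kept

filled = deduped.fillna(value=0)      # option 1: fill the missing score with 0
dropped = deduped.dropna()            # option 2: drop the row with the missing score

tidy = dropped.reset_index()          # old index is stored in a column named "index"
ranked = tidy.sort_values(by="score", ascending=False)

# Split by team, then apply an operation (mean) to the score column of each group.
team_means = dropped.groupby("team")["score"].mean()

teams = df["team"].unique()           # called on a Series (a single column)
renamed = df.rename(columns={"score": "points"})

print(dup_mask)
print(ranked)
print(team_means)
print(teams)
print(renamed.head(2))
```

Whether to fill or drop missing values depends on the analysis; both are shown here only so the two results can be compared.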
Attributes (general information about the data in the DataFrame)
- df.dtypes - returns the data type for each column in df.
- df.columns - returns the column names of df.
- df.shape - returns the size (# rows, # columns) of df.
- df.info() - a method rather than an attribute (note the parentheses); prints information about the pandas DataFrame object df including data structure info, index info, column names, number of non-null values, data types, and memory usage.
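
The difference between attributes (no parentheses) and methods (parentheses) can be seen in a short sketch. It assumes pandas is installed and uses an invented two-column DataFrame with numeric data only, so that the per-column summary methods from the list above apply to every column.

```python
import pandas as pd

# Made-up numeric data; column names are illustrative.
df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0],
    "weight": [10.0, 20.0, 60.0],
})

# Attributes: accessed without parentheses.
n_rows, n_cols = df.shape
column_names = list(df.columns)
column_types = df.dtypes

# info() is a method: it prints a summary and returns None.
info_result = df.info()

# Summary methods return one value per column.
col_means = df.mean()
col_medians = df.median()
col_max = df.max()
col_sums = df.sum()

print(n_rows, n_cols, column_names)
print(column_types)
print(col_means)
print(col_medians)
```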