3. Methods by class
See section 1.6 for general information on methods.
3.1. String Methods
find()
str_obj.find(sub) searches for the string sub in the string object str_obj. If found, this method returns the starting index of the first occurrence of sub in str_obj. If not found, this method returns -1. If str_obj = "One Two Three Five Two", then str_obj.find("Two") returns 4.
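A minimal sketch of both outcomes of find(), reusing the example string from the entry above. The search term "Four" is only an illustrative value that does not occur in the string.

```python
str_obj = "One Two Three Five Two"

first = str_obj.find("Two")      # index of the first occurrence
missing = str_obj.find("Four")   # -1, because "Four" is not in str_obj

print(first, missing)
```

Because -1 is also a valid (negative) index in Python, it is usually safer to compare the result with -1 before using it to slice the string.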
join()
str_obj.join(seq) concatenates (joins) the strings in seq, separated by the string str_obj. If str_obj = '_' and seq = ['One', 'Two', 'Three'], then str_obj.join(seq) returns 'One_Two_Three'.
lower()
str_obj.lower() returns a copy of str_obj with all the letters converted to lowercase. If str_obj = "ONE two THREE", then str_obj.lower() returns "one two three".
isalpha()
Returns True if all characters in str_obj are alphabetic and str_obj is not empty. If str_obj = 'abcd', then str_obj.isalpha() returns True. If str_obj = 'abc1.', then str_obj.isalpha() returns False.
isdigit()
Returns True if all characters in str_obj are digits and str_obj is not empty. If str_obj = '314', then str_obj.isdigit() returns True. If str_obj = '3.14', then str_obj.isdigit() returns False, because the decimal point is not a digit.
islower()
Returns True if all alphabetic characters in str_obj are lowercase and there is at least one alphabetic character. If str_obj = 'pi is 3.14!', then str_obj.islower() returns True. If str_obj = 'Pi is 3.14!', then str_obj.islower() returns False.
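The three checks above can be compared side by side. This is a small sketch that reuses the example strings from the entries and adds an empty string as an extra, assumed test case.

```python
samples = ["abcd", "abc1.", "314", "3.14", "pi is 3.14!", "Pi is 3.14!", ""]

# Map each string to (isalpha, isdigit, islower)
results = {s: (s.isalpha(), s.isdigit(), s.islower()) for s in samples}

for text, checks in results.items():
    print(repr(text), checks)
```

Note that the empty string fails all three checks, and that '314' is not considered lowercase because it contains no letters at all.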
replace()
str_obj.replace(old, new, count) returns a copy of str_obj in which the first count occurrences of old are replaced with the value new. Note: count is optional. If left out, all occurrences are replaced. If str_obj = 'a apple, a orange, a banana', then str_obj.replace('a ', 'an ', 2) returns 'an apple, an orange, a banana'.
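As a quick illustration of the optional count argument, the sketch below runs the same replacement with and without it, using the example string from the entry above.

```python
str_obj = 'a apple, a orange, a banana'

first_two = str_obj.replace('a ', 'an ', 2)   # only the first 2 matches
all_matches = str_obj.replace('a ', 'an ')    # every match

print(first_two)
print(all_matches)
print(str_obj)   # the original string is unchanged
```

Strings are immutable, so replace() always returns a new string; assign the result to a variable if you want to keep it.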
split()
str_obj.split(sep) returns a list of the substrings of str_obj, using sep as the delimiter. If str_obj = '1, 2, 3', then str_obj.split(',') returns the list ['1', ' 2', ' 3'].
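The leading spaces in the split() result come from the delimiter not including the space. A short sketch, assuming the same example string, shows one way to avoid them and how join() reverses the operation.

```python
str_obj = '1, 2, 3'

with_spaces = str_obj.split(',')    # the spaces stay attached to the items
clean = str_obj.split(', ')         # delimiter includes the space
rejoined = '_'.join(clean)          # join() puts the pieces back together

print(with_spaces, clean, rejoined)
```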
upper()
str_obj.upper() returns a copy of str_obj with all the letters converted to uppercase. If str_obj = 'ONE two THREE', then str_obj.upper() returns 'ONE TWO THREE'.
3.2. List Methods
append()
list_obj.append(s) adds element s to the end of the list list_obj. If list_obj = ['a', 'b', 'c'], then list_obj.append('d') updates list_obj to ['a', 'b', 'c', 'd'].
extend()
list_obj.extend(iter) adds the items in the iterable object iter to the end of the list list_obj. If list_obj = ['a', 'b', 'c'] and str1 = 'def', then list_obj.extend(str1) updates list_obj to ['a', 'b', 'c', 'd', 'e', 'f'].
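The difference between append() and extend() is easiest to see when both receive the same argument. The sketch below passes the string 'def' to each; the variable names list_a and list_b are only for illustration.

```python
list_a = ['a', 'b', 'c']
list_a.append('def')    # adds the whole string as ONE element

list_b = ['a', 'b', 'c']
list_b.extend('def')    # adds each character as its own element

print(list_a)
print(list_b)
```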
index()
list_obj.index(s) returns the index of the first element in list_obj that is equal to s. If list_obj = [1, 2, 'a', 'a'], then list_obj.index('a') returns 2. (Recall that indices start at 0.)
insert(i,s)
list_obj.insert(i, s) inserts element s at index i. If list_obj = ['a', 'c', 'd'], then list_obj.insert(1, 'b') updates list_obj to ['a', 'b', 'c', 'd'].
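A small sketch combining insert() and index(). It also shows what happens when index() is asked for a value that is not in the list; the value 'z' is just an assumed example of a missing element.

```python
list_obj = ['a', 'c', 'd']
list_obj.insert(1, 'b')           # list_obj is now ['a', 'b', 'c', 'd']

position = list_obj.index('c')    # 'c' has moved from index 1 to index 2

# Unlike str.find(), index() raises ValueError when nothing matches
try:
    list_obj.index('z')
    found = True
except ValueError:
    found = False

print(list_obj, position, found)
```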
pop()
list_obj.pop(i) removes the element at index i in list_obj and returns it. If list_obj = ['a', 'a', 'b', 'c'], then list_obj.pop(1) returns 'a' and updates list_obj to ['a', 'b', 'c'].
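The index passed to pop() is optional. As a sketch, the example below calls it once with an index and once without, in which case the last element is removed.

```python
list_obj = ['a', 'a', 'b', 'c']

removed = list_obj.pop(1)   # removes and returns the element at index 1
last = list_obj.pop()       # no index: removes and returns the last element

print(removed, last, list_obj)
```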
sort()
list_obj.sort() sorts the elements in the list list_obj in place, in ascending order. If list_obj = ['a', 'd', 'c', 'b'], then list_obj.sort() updates list_obj to ['a', 'b', 'c', 'd'].
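A common pitfall is assigning the result of sort() to a variable. The sketch below shows that sort() changes the list itself and returns None; the reverse=True argument is a standard option of list.sort() that is not covered in the entry above.

```python
list_obj = ['a', 'd', 'c', 'b']

result = list_obj.sort()          # sorts in place; the return value is None
ascending = list(list_obj)        # copy of the sorted list

list_obj.sort(reverse=True)       # descending order
descending = list(list_obj)

print(result, ascending, descending)
```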
3.3. Dictionary Methods
Adding items
Unlike lists, dictionaries have no insert() or append() method. To add data to an existing dictionary, name the dictionary and key and set it equal to the value. Ex. dict_obj['New Key'] = value adds value to the dictionary object dict_obj with key 'New Key'.
get()
dict_obj.get(s) returns the value for the key s. If dict_obj = {'a' : 2, 'b' : 4, 'c' : 6}, then dict_obj.get('b') returns 4. If the key does not exist, the method returns None.
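The following sketch adds a key by assignment and then looks keys up with get(). The key 'd', the missing key 'z', and the fallback value 0 are assumed values for illustration; the second argument to get() is a standard optional default.

```python
dict_obj = {'a': 2, 'b': 4, 'c': 6}

dict_obj['d'] = 8                  # add a new key-value pair by assignment

present = dict_obj.get('b')        # key exists
absent = dict_obj.get('z')         # key missing: returns None, no error
fallback = dict_obj.get('z', 0)    # key missing: returns the given default

print(present, absent, fallback, dict_obj)
```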
items()
dict_obj.items() returns a view of the current dictionary elements in the form ('key', value). If dict_obj = {'a' : 1, 'b' : 2, 'c' : 3}, then dict_obj.items() returns dict_items([('a', 1), ('b', 2), ('c', 3)]). Useful for looping over lists within dictionaries (see section 6.1.3).
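A minimal sketch of looping with items(), unpacking each pair into a key and a value. The output format in the f-string is arbitrary.

```python
dict_obj = {'a': 1, 'b': 2, 'c': 3}

pairs = list(dict_obj.items())     # convert the view to a regular list

lines = []
for key, value in dict_obj.items():
    lines.append(f"{key} -> {value}")

print(pairs)
print(lines)
```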
3.4. Pandas (DataFrames and Series) Methods
drop_duplicates()
df.drop_duplicates() drops duplicate rows in the pandas DataFrame object df.
dropna()
df.dropna() drops all rows with at least 1 missing value in the pandas DataFrame object df.
duplicated()
df.duplicated() returns a boolean (True or False) pandas Series indicating duplicate rows. The first occurrence of a duplicate returns False and all following duplicate rows return True. The length of the returned pandas Series is equal to the number of rows in the pandas DataFrame object df.
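The relationship between duplicated() and drop_duplicates() can be sketched on a tiny DataFrame. The column names and values below are made up for illustration, and the example assumes pandas is installed.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ann', 'Bob', 'Ann'],
                   'score': [90, 85, 90]})

mask = df.duplicated()           # True only for the repeated 'Ann' row
deduped = df.drop_duplicates()   # new DataFrame without the repeated row

print(mask.tolist())
print(deduped)
print(len(df))                   # df itself still has all 3 rows
```

By default, drop_duplicates() returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.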
fillna(value=s)
df.fillna(value=s) fills NaN/NA values with the specified value s across all columns and rows in the pandas DataFrame object df.
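As a sketch of the two ways of handling missing values described above, the example below applies fillna() and dropna() to the same small DataFrame. The data and the fill value 0 are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, None, 3.0],
                   'y': [4.0, 5.0, None]})

filled = df.fillna(value=0)   # every missing value becomes 0
dropped = df.dropna()         # only rows with no missing values remain

print(filled)
print(dropped)
```

Both calls return new DataFrames by default; df still contains its missing values afterwards.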
groupby('columnName')
df.groupby('columnName') groups data by unique values in column columnName in the pandas DataFrame object df. Can be used for the first stage of grouping (split) before applying some operation to the grouped data.
head(n)
df.head(n) returns the first n rows of the pandas DataFrame object df. If n is not specified, it returns the first 5 rows by default.
isna()
df.isna() returns a boolean object the same size as the pandas DataFrame object df, indicating if the values are NA.
loc[]
df.loc[] is used to access a group of rows and columns in the pandas DataFrame object df. (See section 4.4 for details.)
max()
df.max() returns the maximum value per column in the pandas DataFrame object df.
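A short sketch of groupby(), head(), isna(), and loc[] on one small DataFrame. The column names 'team' and 'points' and their values are hypothetical, and applying sum() after the grouping step is just one possible operation.

```python
import pandas as pd

df = pd.DataFrame({'team': ['red', 'blue', 'red', 'blue'],
                   'points': [3, 5, 7, 1]})

# Split by team, then apply an operation (here: sum) to each group
totals = df.groupby('team')['points'].sum()

top = df.head(2)                                    # first 2 rows
missing = df.isna()                                 # same shape as df, all False here
red_points = df.loc[df['team'] == 'red', 'points']  # rows by condition, one column

print(totals)
print(top)
print(red_points.tolist())
```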
mean()
df.mean() returns the mean value per column in the pandas DataFrame object df.
median()
df.median() returns the median value per column in the pandas DataFrame object df.
min()
df.min() returns the minimum value per column in the pandas DataFrame object df.
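The per-column summary methods can be compared on a small, all-numeric DataFrame. The data below is made up; each call returns a pandas Series indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 6],
                   'b': [10.0, 20.0, 30.0]})

col_max = df.max()
col_mean = df.mean()
col_median = df.median()
col_min = df.min()

print(col_mean)
print(col_median)
```

The example uses only numeric columns on purpose: how these methods treat text columns depends on the pandas version, so selecting the numeric columns first is the safer habit.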
read_csv()
df = pd.read_csv('fileName') reads data from a CSV file and creates a DataFrame object df. Note: pd is the alias assigned to pandas when importing the library, i.e., import pandas as pd.
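To try read_csv() without a file on disk, the sketch below reads CSV text from an in-memory buffer. The CSV content is invented for illustration; with a real file you would pass its path instead of the io.StringIO object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,score\nAnn,90\nBob,85\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)
```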
rename(columns=dict_obj)
df.rename(columns=dict_obj) renames columns in the pandas DataFrame object df using data in the dictionary object dict_obj. You can create the dictionary object dict_obj with keys = old column names and values = new column names to rename the columns in df. (See section 1.3.2 for info on dictionary objects.)
replace(this_Value,that_Value)
df.replace(this_Value, that_Value) searches, finds, and replaces all instances of this_Value with that_Value in the pandas DataFrame object df.
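A minimal sketch of rename() followed by replace(). The column names, the placeholder text 'N/A', and the replacement 'unknown' are all assumed values.

```python
import pandas as pd

df = pd.DataFrame({'nm': ['Ann', 'Bob'],
                   'grade': ['N/A', 'B']})

dict_obj = {'nm': 'name'}                     # old column name -> new column name
renamed = df.rename(columns=dict_obj)

cleaned = renamed.replace('N/A', 'unknown')   # replace every matching cell

print(cleaned)
print(list(df.columns))                       # df keeps its original column names
```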
reset_index()
df.reset_index() resets the index of the pandas DataFrame object df to consecutive numbers and creates a new column that stores the old index value (before the method is applied). This method is typically used after data processing when rows are removed.
sort_values(by='column_name', ascending=True)
df.sort_values(by='column_name') sorts the rows in the pandas DataFrame object df by column column_name. By default, ascending is True. Set ascending=False to sort in descending order.
sum()
df.sum() returns the sum of the columns in the pandas DataFrame object df.
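Sorting leaves the old index labels attached to their rows, which is one situation where reset_index() is useful. The sketch below uses made-up 'item' and 'qty' columns.

```python
import pandas as pd

df = pd.DataFrame({'item': ['x', 'y', 'z'],
                   'qty': [3, 1, 2]})

by_qty = df.sort_values(by='qty', ascending=False)   # rows keep old labels 0, 2, 1
renumbered = by_qty.reset_index()                    # old labels move to an 'index' column

totals = df[['qty']].sum()                           # sum of the numeric column

print(by_qty)
print(renumbered)
print(totals)
```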
tail(n)
df.tail(n) returns the last n rows of the pandas DataFrame object df. If n is not specified, it returns the last 5 rows by default.
unique()
df['Col'].unique() returns unique values in the column Col in the pandas DataFrame object df. Note: unique() must be called on a pandas Series object, which is created when referencing a single column in the DataFrame object.
Attributes (general information about the data in the DataFrame)
df.dtypes - returns the data type for each column in df.
df.columns - returns the column names for each column in df.
df.shape - returns the size (# rows, # columns) of df.
df.info() - prints information about the pandas DataFrame object df, including data structure info, index info, column names, number of non-null values, data types, and memory usage. Note: unlike the attributes above, info() is a method, so it is called with parentheses.