Knowledge Base

Basics of Working with DataFrames

Catching Unexpected Errors (try-except)

When you're loading data from multiple systems, be prepared for surprises:

  • Incorrectly formatted data can make the code crash at runtime. You already have some experience with this: if the numbers in a dataset are stored as strings for whatever reason, you'll need to convert them with the pandas to_numeric() function.
  • An error can occur toward the end of a file: the code stops at the first row with an incorrect value, and we lose the calculations already made for the previous, error-free rows.
  • Data sometimes changes. For example, a company might start working with a new partner that sends faulty data for accounting, causing the code to crash.

Use the try-except construct for such cases.
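As a minimal sketch of the idea, the loop below wraps the risky conversion in try-except so that one faulty row does not discard the results for the others. The DataFrame, the 'amount' column, and the variables total and bad_rows are made up for illustration.

```python
import pandas as pd

# Hypothetical data: 'amount' arrived as strings, and one value is malformed.
df = pd.DataFrame({'amount': ['10', '20', 'oops', '40']})

total = 0.0
bad_rows = []  # index labels of rows that could not be processed

for idx, value in df['amount'].items():
    try:
        total += float(value)
    except ValueError:
        # Skip the faulty row but keep everything accumulated so far.
        bad_rows.append(idx)

print(total)     # 70.0
print(bad_rows)  # [2]

# Alternative without a loop: invalid strings become NaN instead of raising.
amounts = pd.to_numeric(df['amount'], errors='coerce')
print(amounts.sum())  # 70.0 (sum() skips NaN by default)
```

Catching only ValueError, rather than every exception, keeps genuinely unexpected problems visible while still tolerating badly formatted values.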

Special Values

NaN ("not a number") - a special float value that results when a computation is undefined (e.g. 0/0 on NumPy floats); pandas also uses it to mark missing values in numeric columns

None - a special NoneType value used when a value is missing
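The short example below illustrates how the two special values differ and how pandas treats them. It is only a sketch; the variable names are arbitrary.

```python
import math
import pandas as pd

nan = float('nan')
print(type(nan))        # <class 'float'>
print(nan == nan)       # False: NaN is not equal to anything, even itself
print(math.isnan(nan))  # True: the reliable way to test for NaN

print(type(None))       # <class 'NoneType'>
print(None is None)     # True

# In a numeric Series, pandas stores None as NaN.
s = pd.Series([1.0, None, 3.0])
print(s.dtype)            # float64
print(s.isna().tolist())  # [False, True, False]
```

Because NaN never compares equal to itself, checks such as isna() or math.isnan() are the safe way to find it.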

Copying Columns Between DataFrames

To copy a column from df1 to df2, create a new column in df2 and assign it the values of the df1 column:

df2['new_column'] = df1['some_column']

If df2 already had a column named new_column, all of its values would be replaced with the new ones.

It seems relatively simple: pandas copies a column from df1 and puts it in df2.

However, if you take a closer look, things aren't so simple. For each row of df2, pandas looks for a "mate," a row with the same index in df1, and takes the value from that row. In our case, the indices in df1 and df2 are the same, so this is the trivial case: all the values are copied in their original order. If the indices differ, however, we'll get NaN in the rows of df2 whose index is absent from df1.

Note that our DataFrames don't have to have the same number of rows. If df1 has fewer rows than df2, the rows of df2 that find no match end up with NaN values. If df1 has more rows, the extra ones simply won't become part of df2.
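To make the matching by index concrete, here is a small sketch with deliberately different indices and row counts. The column 'other' and all the values are invented for the example.

```python
import pandas as pd

# df1 has four rows; df2 has three rows with a partly different index.
df1 = pd.DataFrame({'some_column': [10, 20, 30, 40]}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'other': ['a', 'b', 'c']}, index=[1, 2, 5])

# Values are matched by index label, not by position.
df2['new_column'] = df1['some_column']
print(df2)

# Index 1 and 2 exist in df1, so they receive 20 and 30.
# Index 5 is absent from df1, so it receives NaN.
# Rows 0 and 3 of df1 have no counterpart in df2 and are not copied.
```

The new column comes out as float rather than int, because NaN is a float value.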

Columns can also exist separately, outside of DataFrames. A single column can be saved in a Series object – an array of values with indices. Since Series have indices, the assignment of a Series to, say, a column of a DataFrame, will work in the same way that we saw earlier – the values will be copied on the basis of matching indices.
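The same behavior can be sketched with a standalone Series whose index is in a different order than the DataFrame's. The names df, s, 'city', and 'rank' are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'city': ['A', 'B', 'C']}, index=['x', 'y', 'z'])

# A Series carries its own index; here the labels are in a different order.
s = pd.Series([3, 1, 2], index=['z', 'x', 'y'])

# Assignment aligns on index labels, so each value lands in its matching row.
df['rank'] = s
print(df['rank'].tolist())  # [1, 2, 3]
```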

Renaming Columns with Hierarchical Names

We've dealt with a table with two-level column names. This is a MultiIndex, a hierarchical indexing structure in which each label is a tuple of values (one per level) rather than a single value.

But what if we don't want those complex column names? Then we can rename the columns by assigning a list of new names to the columns attribute:

df.columns = ['column_name_1', 'column_name_2', 'column_name_3']
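As an illustration, the sketch below builds a small table with two-level column names and flattens them. The data and level names are invented, and joining the levels with an underscore is just one possible naming choice.

```python
import pandas as pd

# A hypothetical table with two-level (MultiIndex) column names.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_tuples(
        [('price', 'min'), ('price', 'max'), ('qty', 'sum')]
    ),
)
print(df.columns.nlevels)  # 2

# Each column label is a tuple; join its parts into a single string.
flat_names = ['_'.join(levels) for levels in df.columns]

# Assigning a plain list replaces the MultiIndex with ordinary names.
df.columns = flat_names
print(df.columns.tolist())  # ['price_min', 'price_max', 'qty_sum']
```

The list assigned to columns must have exactly as many names as the DataFrame has columns, otherwise pandas raises an error.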
