Knowledge Base

Data Preprocessing

Below you'll find information related to the material covered in the Data Preprocessing sprint.

Why Is Preprocessing Important?

Data almost always contains errors of various kinds. If you don't handle them properly, they can seriously distort your analysis. So in most cases, we can't just accept the data as it is; we need to prepare it for analysis first.

Keep in mind that all kinds of problems can pop up when you're working with data:

  • Incomplete or inaccurate data
  • Mistakes in the recording of time values
  • Seconds getting mixed up with minutes
  • Important facts being overlooked

As an analyst, you're directly responsible for the quality of the data and the conclusions you draw from it. When you get new data, you need to get a sense of how reliable it is. To do that, you can start by exploring a few basic questions about the data. Then you and your coworkers can judge whether the results are reasonable.

A basic check like this might uncover a problem in the data. Of course, it might also tell you that everything's okay, at least for now.
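As a rough illustration of such a basic check, here is a minimal sketch in plain Python. The sample records, the field name duration_min, the helper basic_check, and the 300-minute threshold are all hypothetical and chosen only for this example; real data would need its own fields and limits.

```python
# Hypothetical sample: session lengths that are supposed to be in minutes.
records = [
    {"user_id": 1, "duration_min": 12.5},
    {"user_id": 2, "duration_min": None},   # incomplete record
    {"user_id": 3, "duration_min": 750.0},  # possibly seconds logged as minutes
    {"user_id": 4, "duration_min": 8.0},
]


def basic_check(rows, field, max_plausible):
    """Count missing values and collect values above a plausibility threshold."""
    missing = sum(1 for row in rows if row.get(field) is None)
    suspicious = [
        row for row in rows
        if row.get(field) is not None and row[field] > max_plausible
    ]
    return {"rows": len(rows), "missing": missing, "suspicious": suspicious}


# The threshold is an assumption: here, a session longer than 300 minutes looks wrong.
report = basic_check(records, "duration_min", max_plausible=300)
print("rows:", report["rows"])
print("missing:", report["missing"])
print("suspicious:", [row["user_id"] for row in report["suspicious"]])
```

A check like this doesn't prove the data is clean. It only flags rows worth a closer look, such as the record where seconds may have been recorded in place of minutes.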
