The data science process typically consists of six steps: framing the problem, collecting data, processing data, exploring data, modeling data, and communicating the results. In this article we will go through them one by one to become familiar with the data science process.
When you start a data science project, the first thing to do is to frame the problem; that means you have to prepare a project charter. This charter contains information such as what you’re going to research, how the company benefits from it, and what data and resources you need.
The second step is to collect data. You’ll need data to give you the insights needed to turn the problem around with a solution. This part of the process involves thinking through what data you’ll need and finding ways to get that data, whether by querying internal databases or by purchasing external datasets. Data can take many forms, ranging from Excel spreadsheets to different types of databases.
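As a rough illustration of querying an internal database, here is a minimal sketch using Python’s built-in sqlite3 module. The orders table, its columns, and the fetch_orders helper are hypothetical and exist only for this example; a real project would connect to whatever database the company actually uses.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for an internal database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 45.0)],
)
conn.commit()


def fetch_orders(connection, region):
    """Return all orders for one region as a list of (id, region, amount) tuples."""
    cursor = connection.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),  # parameterized query: the value is never pasted into the SQL text
    )
    return cursor.fetchall()


north_orders = fetch_orders(conn, "north")
print(north_orders)
conn.close()
```

The query uses a placeholder rather than string formatting, which is the usual way to pass values safely to a database.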
Data exploration is concerned with building a deeper understanding of your data. You try to understand how variables interact with each other, the distribution of the data, and whether there are outliers. To achieve this you mainly use descriptive statistics, visual techniques, and simple modeling. This step often goes by the abbreviation EDA, for Exploratory Data Analysis.
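To make the idea of descriptive statistics and outlier detection concrete, here is a small sketch using only Python’s standard library. The daily_orders values are made up for illustration, and the 1.5 × IQR cutoff (Tukey’s rule) is just one common convention for flagging outliers, not the only one.

```python
import statistics

# Hypothetical sample: daily order counts, with one suspicious value.
daily_orders = [12, 15, 14, 13, 16, 15, 14, 13, 95, 14]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


summary = describe(daily_orders)
outliers = iqr_outliers(daily_orders)
print(summary)
print("outliers:", outliers)
```

On this sample the mean is pulled well above the median by the single value 95, and that same value is the only one the rule flags. That kind of gap between mean and median is often the first hint that outliers are present.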
In the modeling phase you use models, domain knowledge, and insights about the data you found in the previous steps to answer the research question. You select a technique from the fields of statistics, machine learning, operations research, and so on. Building a model is an iterative process that involves selecting the variables for the model, executing the model, and running model diagnostics.
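The loop of selecting variables, executing the model, and checking diagnostics can be sketched as follows. This is only an illustration under simple assumptions: the data is invented, the model is a one-variable least-squares line, and R² is used as the single diagnostic. The names ad_spend, store_visits, and sales are hypothetical.

```python
import statistics

# Hypothetical data set: two candidate predictors and one target variable.
candidates = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "store_visits": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],
}
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = statistics.fmean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Iterate: pick a variable, execute the model, record the diagnostic.
results = {}
for name, x in candidates.items():
    slope, intercept = fit_line(x, sales)
    results[name] = r_squared(x, sales, slope, intercept)

best = max(results, key=results.get)
print(results)
print("best single predictor:", best)
```

In practice each pass through the loop would also involve looking at residuals and checking the result against domain knowledge, not just comparing a single score.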
Finally, you present the results to the business. These results can take many forms, ranging from presentations to research reports. Sometimes you’ll need to automate the execution of the process because the business will want to use the insights you gained in another project or enable an operational process to use the outcome from your model.