How to set up a data science solution method

When faced with a business problem, data is increasingly a key driver in assessing it. Extracting useful knowledge from data to solve business problems can be treated systematically by following a structured process. One way to structure this process is the Cross Industry Standard Process for Data Mining (CRISP-DM Project, 2000), as visualized below.

CRISP-DM process

CRISP-DM is a conceptual model for thinking about data science projects. A successful data mining project involves an intelligent compromise between what the data can do and the project goals. Each data-driven business decision-making problem is unique, comprising its own combination of goals, desires and constraints. The six steps in the model are briefly described below.

Business Understanding

It is vital to understand the problem to be solved. The initial formulation may not be complete or optimal, so multiple iterations may be necessary before an acceptable solution formulation appears. In this first stage, where creativity plays a large role, a team should think carefully about the problem to be solved, including various scenarios.

Data Understanding

If solving the business problem is the goal, data comprises the available raw material from which the solution will be built. It is important to understand the strengths and limitations of the data because rarely is there an exact match with the problem.

Data Preparation

The analytic technologies that we can bring to bear are powerful but they impose certain requirements on the data they use. They often require data to be in a form different from how the data are provided naturally, and some conversion will be necessary. Therefore, a data preparation phase often proceeds along with data understanding, in which the data are manipulated and converted into forms that yield better results.
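As a rough illustration of the kind of conversion meant here, the sketch below turns text fields into numbers and a categorical field into 0/1 indicator columns. The records, field names and the helper functions `to_float` and `prepare` are hypothetical, invented for this example only; real projects usually need more careful handling of missing values.

```python
# Hypothetical raw records, as they might arrive from a CSV export.
raw_records = [
    {"age": "34", "region": "north", "spend": "120.50"},
    {"age": "", "region": "south", "spend": "80"},
    {"age": "51", "region": "north", "spend": "n/a"},
]


def to_float(text):
    """Return the text as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(records):
    """Convert text fields to numbers and one-hot encode the region."""
    regions = sorted({r["region"] for r in records})
    rows = []
    for r in records:
        row = {"age": to_float(r["age"]), "spend": to_float(r["spend"])}
        for region in regions:
            # One indicator column per region: 1 if it matches, else 0.
            row[f"region_{region}"] = 1 if r["region"] == region else 0
        rows.append(row)
    return rows


prepared = prepare(raw_records)
for row in prepared:
    print(row)
```

Unparseable values are kept as None rather than silently replaced, so that the choice of how to treat them stays visible during data understanding.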

Modelling

The output of modelling is some sort of model or pattern capturing regularities in the data. A model is a simplified representation of reality based on some assumptions about what is and is not important to solve the specific business problem. The modelling stage is the primary place where data mining techniques are applied to the data.
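A very small example of a model in this sense is a straight line fitted to two variables by ordinary least squares. The sketch below assumes, purely for illustration, that sales depend linearly on advertising spend; the data, the variable names and the function `fit_line` are hypothetical and are not taken from the source.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: advertising spend and resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

The fitted line ignores everything except advertising spend, which is exactly the kind of simplifying assumption a model makes about what is and is not important.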

Evaluation

The purpose of the evaluation stage is to assess the data mining results rigorously and to gain confidence that they are valid and reliable before moving on. A model may be extremely accurate (> 99%) by laboratory standards, but evaluation in the actual business context may reveal that it still produces too many false alarms to be economically feasible. Evaluating the results of data mining includes both quantitative and qualitative assessments.
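The false-alarm point can be made concrete with a little arithmetic. The sketch below uses assumed numbers (a population of 100,000 cases of which 100 are true positives, and a model that is right 99% of the time on both classes); none of these figures come from the source, and `alarm_summary` is a hypothetical helper.

```python
def alarm_summary(population, positives, sensitivity, specificity):
    """Count true and false alarms for a model applied to a population."""
    negatives = population - positives
    true_alarms = round(positives * sensitivity)     # positives correctly flagged
    true_negatives = round(negatives * specificity)  # negatives correctly ignored
    false_alarms = negatives - true_negatives        # negatives wrongly flagged
    accuracy = (true_alarms + true_negatives) / population
    precision = true_alarms / (true_alarms + false_alarms)
    return {
        "true_alarms": true_alarms,
        "false_alarms": false_alarms,
        "accuracy": accuracy,
        "precision": precision,
    }


# Assumed scenario: rare positives, 99% correct on each class.
summary = alarm_summary(
    population=100_000, positives=100, sensitivity=0.99, specificity=0.99
)
print(summary)
```

Under these assumptions the model is 99% accurate overall, yet it raises 999 false alarms against 99 true ones, so roughly ten out of every eleven alarms would be wasted effort. Whether that is acceptable depends on the cost of investigating an alarm in the actual business context.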

Deployment

In deployment, the results of data mining — and increasingly the data mining techniques themselves — are put into real use to realize some return on investment. The clearest cases of deployment involve implementing a predictive model in some information system or business process.

(Adapted from Data Science for Business)