Predicting how many orders a product will receive helps a business decide how much to invest in marketing that product, which makes it an important data science use case for product-based companies. In this article, I will walk you through predicting the number of orders with machine learning using Python.

Number of Orders Prediction

To predict the number of orders a company may receive for a particular product, you need historical data about the orders the company has already received. For this task, I will be using sales data for supplements collected from Kaggle. The dataset contains:

  1. Product ID
  2. Store ID
  3. The type of store where the supplement was sold
  4. The type of location the order was received from
  5. Sales Date
  6. Region code
  7. Whether it is a public holiday or not at the time of order
  8. Whether the product was on discount or not
  9. Number of orders placed
  10. Sales

With this overview of the problem and the dataset in place, the section below walks through predicting the number of orders with machine learning using the Python programming language.

Number of Orders Prediction using Python

Let's start by importing the necessary Python libraries and the dataset:

import pandas as pd
import numpy as np

data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/supplement.csv")
data.head()

         ID  Store_id Store_Type  ... Discount #Order     Sales
0  T1000001         1         S1  ...      Yes      9   7011.84
1  T1000002       253         S4  ...      Yes     60  51789.12
2  T1000003       252         S3  ...      Yes     42  36868.20
3  T1000004       251         S2  ...      Yes     23  19715.16
4  T1000005       250         S2  ...      Yes     62  45614.52

[5 rows x 10 columns]

Now let's look at some basic information about this dataset to understand what kind of data we are working with:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188340 entries, 0 to 188339
Data columns (total 10 columns):
 #   Column         Non-Null Count   Dtype
---  ------         --------------   -----
 0   ID             188340 non-null  object
 1   Store_id       188340 non-null  int64
 2   Store_Type     188340 non-null  object
 3   Location_Type  188340 non-null  object
 4   Region_Code    188340 non-null  object
 5   Date           188340 non-null  object
 6   Holiday        188340 non-null  int64
 7   Discount       188340 non-null  object
 8   #Order         188340 non-null  int64
 9   Sales          188340 non-null  float64
dtypes: float64(1), int64(3), object(6)
memory usage: 14.4+ MB

data.isnull().sum()

ID               0
Store_id         0
Store_Type       0
Location_Type    0
Region_Code      0
Date             0
Holiday          0
Discount         0
#Order           0
Sales            0
dtype: int64

data.describe()

            Store_id        Holiday         #Order          Sales
count  188340.000000  188340.000000  188340.000000  188340.000000
mean      183.000000       0.131783      68.205692   42784.327982
std       105.366308       0.338256      30.467415   18456.708302
min         1.000000       0.000000       0.000000       0.000000
25%        92.000000       0.000000      48.000000   30426.000000
50%       183.000000       0.000000      63.000000   39678.000000
75%       274.000000       0.000000      82.000000   51909.000000
max       365.000000       1.000000     371.000000  247215.000000

Now let's explore some of the important features of this dataset to understand the factors affecting the number of orders for supplements:

[Figure: effect of store type on the number of orders]

The above figure shows the distribution of the number of orders by store type. Now let's look at the distribution of the number of orders by location type:

[Figure: number of orders by location type]

The above figure shows the distribution of the number of orders by location type. Now let's look at the distribution of the number of orders by discount:

[Figure: effect of discounts on the number of orders]

According to the above figure, most people still buy supplements when there is no discount on them. Now let's look at how holidays affect the number of orders:

[Figure: effect of holidays on the number of orders]

According to the above figure, most people buy supplements on working days.
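
The numbers behind figures like the ones above can be computed with a simple group-by. The snippet below is a minimal sketch, not the code that produced the original figures: it uses a handful of made-up rows that only mimic the dataset's column names, and the helper order_share is a hypothetical name introduced here for illustration.

```python
import pandas as pd

# Made-up rows that mimic the dataset's columns; the values are illustrative only.
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2", "S2", "S1"],
    "Location_Type": ["L3", "L2", "L2", "L3", "L1", "L1"],
    "Discount": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "Holiday": [1, 0, 0, 0, 1, 0],
    "#Order": [9, 60, 42, 23, 62, 54],
})


def order_share(df, column):
    """Return each category's share (in %) of the total number of orders."""
    totals = df.groupby(column)["#Order"].sum()
    return (totals / totals.sum() * 100).round(1)


# Print the share of orders for each feature discussed above.
for col in ["Store_Type", "Location_Type", "Discount", "Holiday"]:
    print(order_share(sample, col), end="\n\n")
```

On the real dataset you would pass data instead of sample; the resulting series can then be handed to whichever plotting library you prefer.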

Number of Orders Prediction Model

Now let's prepare the data so that we can train a machine learning model to predict the number of orders. Here, I will convert some of the string values to numerical values:
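
One way to do this conversion is to map each category label to an integer with pandas. The snippet below is a sketch under assumptions: the rows are made up, and while the labels S1 to S4 and Yes appear in the data preview, the labels No and L1 to L5 are assumed here and should be checked against the real columns (for example with data["Location_Type"].unique()).

```python
import pandas as pd

# Made-up rows; the labels "No" and "L1".."L5" are assumptions about the dataset.
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2"],
    "Location_Type": ["L3", "L2", "L5", "L1"],
    "Discount": ["Yes", "No", "Yes", "No"],
})

# Explicit mappings from string labels to integers.
discount_map = {"No": 0, "Yes": 1}
store_map = {"S1": 1, "S2": 2, "S3": 3, "S4": 4}
location_map = {"L1": 1, "L2": 2, "L3": 3, "L4": 4, "L5": 5}

# assign() returns a new DataFrame and leaves the original untouched.
encoded = sample.assign(
    Discount=sample["Discount"].map(discount_map),
    Store_Type=sample["Store_Type"].map(store_map),
    Location_Type=sample["Location_Type"].map(location_map),
)

# Any label missing from a mapping becomes NaN, so check before training.
assert not encoded.isnull().any().any()
print(encoded)
```

Integer codes like these impose an order on the categories; tree-based models usually tolerate that, but one-hot encoding is a common alternative for other model types.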

Now let's split the data into an 80% training set and a 20% test set:
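
An 80/20 split can be sketched with pandas alone, as below. This is an illustration on randomly generated rows, not necessarily how the split was done here: the choice of feature columns is an assumption, and scikit-learn's train_test_split is a common alternative that does the same job.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in for the encoded dataset (values are not real).
rng = np.random.default_rng(0)
n = 100
sample = pd.DataFrame({
    "Store_Type": rng.integers(1, 5, n),
    "Location_Type": rng.integers(1, 6, n),
    "Holiday": rng.integers(0, 2, n),
    "Discount": rng.integers(0, 2, n),
    "#Order": rng.integers(0, 150, n),
})

# Assumed feature set and target; adjust to the columns you actually use.
features = ["Store_Type", "Location_Type", "Holiday", "Discount"]
target = "#Order"

# Draw 80% of the rows at random for training; the remaining 20% form the test set.
train = sample.sample(frac=0.8, random_state=42)
test = sample.drop(train.index)

xtrain, ytrain = train[features], train[target]
xtest, ytest = test[features], test[target]
print(xtrain.shape, xtest.shape)
```

Fixing random_state makes the split reproducible, which helps when comparing models across runs.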

Now I will use LightGBM's gradient boosting regressor (LGBMRegressor) to train the model:

LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

Now let's have a look at the predicted values:

ypred = model.predict(xtest)
data = pd.DataFrame(data={"Predicted Orders": ypred.flatten()})
print(data.head())

   Predicted Orders
0         47.351897
1         97.068717
2         66.577788
3         85.143083
4         54.451098

So this is how you can train a machine learning model to predict the number of orders using the Python programming language.

Summary

Predicting the number of orders for a product is one of the strategies a product-based company can follow to determine how much to invest in marketing that product. I hope you liked this article on predicting the number of orders with machine learning using Python. Feel free to ask your valuable questions in the comments section below.