Predicting the number of orders for a product helps a business decide how much to invest in marketing it, which makes it an important data science use case for product-based companies. In this article, I will walk you through the task of predicting the number of orders with machine learning using Python.
Number of Orders Prediction
To predict the number of orders a company may receive for a particular product, you need historical data about the orders the company has already received. For this task, I will use sales data for supplements collected from Kaggle. The dataset contains:
- Product ID
- Store ID
- The type of store where the supplement was sold
- The type of location the order was received from
- Sales Date
- Region code
- Whether it is a public holiday or not at the time of order
- Whether the product was on discount or not
- Number of orders placed
- Sales
I hope you now have an overview of the problem and the dataset I will use to solve it. In the section below, I will take you through the task of predicting the number of orders with machine learning using Python.
Number of Orders Prediction using Python
Let's start by importing the necessary Python libraries and the dataset:
import pandas as pd
import numpy as np

data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/supplement.csv")
data.head()
         ID  Store_id Store_Type  ... Discount  #Order     Sales
0  T1000001         1         S1  ...      Yes       9   7011.84
1  T1000002       253         S4  ...      Yes      60  51789.12
2  T1000003       252         S3  ...      Yes      42  36868.20
3  T1000004       251         S2  ...      Yes      23  19715.16
4  T1000005       250         S2  ...      Yes      62  45614.52

[5 rows x 10 columns]
Now let's look at the structure of the dataset, check it for missing values, and review its summary statistics to see what kind of data we are working with:
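The outputs that follow look like what pandas prints for `info()`, `isnull().sum()`, and `describe()`. The calls themselves are not shown in this article, so that mapping is an assumption. Here is a minimal sketch of the three calls on a tiny stand-in frame whose rows are made up for illustration and only mimic part of the real column layout:

```python
import pandas as pd

# Tiny stand-in for the supplement dataset; the values are illustrative only.
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2"],
    "Holiday": [1, 0, 0, 0],
    "Discount": ["Yes", "Yes", "No", "No"],
    "#Order": [9, 60, 42, 23],
    "Sales": [7011.84, 51789.12, 36868.20, 19715.16],
})

# Column names, non-null counts, and dtypes (printed directly by pandas).
sample.info()

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Count, mean, std, min, quartiles, and max for the numeric columns only.
numeric_summary = sample.describe()
print(numeric_summary)
```

On the real data, the same calls would be made on `data` instead of `sample`.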
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188340 entries, 0 to 188339
Data columns (total 10 columns):
 #   Column         Non-Null Count   Dtype
---  ------         --------------   -----
 0   ID             188340 non-null  object
 1   Store_id       188340 non-null  int64
 2   Store_Type     188340 non-null  object
 3   Location_Type  188340 non-null  object
 4   Region_Code    188340 non-null  object
 5   Date           188340 non-null  object
 6   Holiday        188340 non-null  int64
 7   Discount       188340 non-null  object
 8   #Order         188340 non-null  int64
 9   Sales          188340 non-null  float64
dtypes: float64(1), int64(3), object(6)
memory usage: 14.4+ MB

ID               0
Store_id         0
Store_Type       0
Location_Type    0
Region_Code      0
Date             0
Holiday          0
Discount         0
#Order           0
Sales            0
dtype: int64

            Store_id        Holiday         #Order          Sales
count  188340.000000  188340.000000  188340.000000  188340.000000
mean      183.000000       0.131783      68.205692   42784.327982
std       105.366308       0.338256      30.467415   18456.708302
min         1.000000       0.000000       0.000000       0.000000
25%        92.000000       0.000000      48.000000   30426.000000
50%       183.000000       0.000000      63.000000   39678.000000
75%       274.000000       0.000000      82.000000   51909.000000
max       365.000000       1.000000     371.000000  247215.000000
Now let's explore some of the important features of this dataset to understand the factors affecting the number of orders for supplements:
The above figure shows the distribution of the number of orders received according to the store type. Now let's look at the distribution of the number of orders according to the location:
The above figure shows the distribution of the number of orders received according to the location. Now let's look at the distribution of the number of orders according to the discount:
According to the above figure, most people still buy supplements even when there is no discount on them. Now let's look at how holidays affect the number of orders:
According to the above figure, most people buy supplements on working days.
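The figures themselves are not reproduced here, but the numbers behind charts like these can be sketched with a simple aggregation. The helper `order_share` below is a hypothetical name, and the sample rows (including the `L1` to `L3` location labels) are made up for illustration. It sums `#Order` per category and converts the totals into each category's share of all orders:

```python
import pandas as pd

def order_share(df: pd.DataFrame, column: str, target: str = "#Order") -> pd.Series:
    """Return each category's fraction of the total orders, largest first."""
    totals = df.groupby(column)[target].sum()
    return (totals / totals.sum()).sort_values(ascending=False)

# Illustrative rows only; they are not taken from the real dataset.
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2", "S1", "S4"],
    "Location_Type": ["L1", "L2", "L1", "L3", "L2", "L1"],  # assumed labels
    "Holiday": [1, 0, 0, 0, 0, 1],
    "Discount": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "#Order": [10, 60, 40, 20, 30, 40],
})

# One share table per feature discussed above.
for column in ["Store_Type", "Location_Type", "Discount", "Holiday"]:
    print(order_share(sample, column), end="\n\n")
```

Running the same helper on the full dataset would give the proportions that a pie or bar chart of each feature displays.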
Number of Orders Prediction Model
Now let's prepare the data so that we can train a machine learning model to predict the number of orders. Here, I will change some of the string values to numerical values:
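The conversion code is not shown here, so the following is only a sketch of one way to do it. The labels `S1` to `S4` and `Yes` appear in the data preview; the `No` label, the `L1` to `L3` location labels, the helper name `encode_features`, and the choice of feature columns are all assumptions:

```python
import pandas as pd

# Explicit label-to-number mappings. "No" is an assumed label.
STORE_TYPE_CODES = {"S1": 1, "S2": 2, "S3": 3, "S4": 4}
DISCOUNT_CODES = {"No": 0, "Yes": 1}

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the string columns mapped to integers."""
    encoded = df.copy()
    # Labels missing from a mapping become NaN, so check for NaN on real data.
    encoded["Store_Type"] = encoded["Store_Type"].map(STORE_TYPE_CODES)
    encoded["Discount"] = encoded["Discount"].map(DISCOUNT_CODES)
    # Location labels are not listed in the article, so derive integer codes
    # from the sorted unique values instead of hard-coding them.
    encoded["Location_Type"] = encoded["Location_Type"].astype("category").cat.codes
    return encoded

# Illustrative rows only; they are not taken from the real dataset.
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2"],
    "Location_Type": ["L3", "L1", "L2", "L1"],
    "Holiday": [1, 0, 0, 0],
    "Discount": ["Yes", "Yes", "No", "No"],
    "#Order": [9, 60, 42, 23],
})

encoded = encode_features(sample)

# Assumed feature set and target for the model.
features = encoded[["Store_Type", "Location_Type", "Holiday", "Discount"]].to_numpy()
target = encoded["#Order"].to_numpy()
print(encoded)
```

Working on a copy keeps the original frame intact, which makes it easy to compare the raw and encoded values.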
Now let's split the data into an 80% training set and a 20% test set:
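A split like this is commonly done with scikit-learn's `train_test_split`. The sketch below uses a small synthetic array in place of the real features. The name `xtest` matches the prediction code later in this article, while `xtrain`, `ytrain`, `ytest`, and the `random_state` value are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 rows with 4 numeric features and one target per row.
x = np.arange(40).reshape(10, 4)
y = np.arange(10)

# test_size=0.2 holds out 20% of the rows; random_state makes the shuffle repeatable.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=42
)

print(xtrain.shape, xtest.shape)  # (8, 4) (2, 4)
```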
Now I will use the LightGBM (light gradient boosting machine) regression algorithm to train the model:
LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0, importance_type='split', learning_rate=0.1, max_depth=-1, min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31, objective=None, random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True, subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
Now let's have a look at the predicted values:
ypred = model.predict(xtest)
data = pd.DataFrame(data={"Predicted Orders": ypred.flatten()})
print(data.head())
   Predicted Orders
0         47.351897
1         97.068717
2         66.577788
3         85.143083
4         54.451098
So this is how you can train a machine learning model to predict the number of orders using Python.
Summary
Predicting the number of orders for a product is one of the strategies a product-based company can use to decide how much to invest in marketing that product. I hope you liked this article on predicting the number of orders with machine learning using Python. Feel free to ask your questions in the comments section below.