Insurance is a contract in which an individual obtains financial protection from an insurance company against the losses specified in the policy. Many types of insurance exist today, and many companies offer insurance services. These companies need to predict whether or not a person will buy insurance so that they can focus their time and money on the most profitable customers. If you want to know how machine learning can be used to predict whether an individual will buy insurance, this article is for you. In this article, I will walk you through the task of Insurance Prediction with machine learning using Python.

Insurance Prediction with Machine Learning

The task of insurance prediction adds value to every insurance company. Companies use the data they hold about everyone they have contacted to promote their insurance services and try to find the people who are most likely to buy insurance. This helps a company target the most profitable customers and saves it time and money.

In the section below, I will take you through the task of Insurance Prediction with Machine Learning using Python. For this task, I have collected a dataset from Kaggle about the previous customers of a travel insurance company. Our task is to train a machine learning model to predict whether or not an individual will purchase the company's insurance policy.

Insurance Prediction using Python

Let's start the task of Insurance prediction with machine learning by importing the necessary Python libraries and the dataset:

import pandas as pd
data = pd.read_csv("TravelInsurancePrediction.csv")
data.head()

   Unnamed: 0  Age  ... EverTravelledAbroad TravelInsurance
0           0   31  ...                  No               0
1           1   31  ...                  No               0
2           2   34  ...                  No               1
3           3   28  ...                  No               0
4           4   28  ...                  No               0

[5 rows x 10 columns]

The unnamed column in this dataset is of no use, so I'll just remove it from the data:

data.drop(columns=["Unnamed: 0"], inplace=True)

Now let's check for missing values and look at the column types to get an idea of the kind of data we are working with:

data.isnull().sum()
Age                    0
Employment Type        0
GraduateOrNot          0
AnnualIncome           0
FamilyMembers          0
ChronicDiseases        0
FrequentFlyer          0
EverTravelledAbroad    0
TravelInsurance        0
dtype: int64

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1987 entries, 0 to 1986
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Age                  1987 non-null   int64
 1   Employment Type      1987 non-null   object
 2   GraduateOrNot        1987 non-null   object
 3   AnnualIncome         1987 non-null   int64
 4   FamilyMembers        1987 non-null   int64
 5   ChronicDiseases      1987 non-null   int64
 6   FrequentFlyer        1987 non-null   object
 7   EverTravelledAbroad  1987 non-null   object
 8   TravelInsurance      1987 non-null   int64
dtypes: int64(5), object(4)
memory usage: 139.8+ KB

In this dataset, the labels we want to predict are in the "TravelInsurance" column. The values in this column are 0 and 1, where 0 means the policy was not bought and 1 means it was bought. To make the data easier to read during analysis, I will convert 0 and 1 to "Not Purchased" and "Purchased":

data["TravelInsurance"] = data["TravelInsurance"].map({0: "Not Purchased", 1: "Purchased"})

Now let's start by looking at the age column to see how age affects the purchase of an insurance policy:

[Figure: age and purchase of travel insurance]
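The code that produced this chart is not shown, so as a rough numeric companion, here is a minimal sketch of how the share of buyers in each group could be tabulated with pandas. The helper name `purchase_share_by` and the toy rows are assumptions made up for illustration; they are not taken from the Kaggle file.

```python
import pandas as pd


def purchase_share_by(frame: pd.DataFrame, column: str,
                      label: str = "TravelInsurance") -> pd.DataFrame:
    """For each value of `column`, return the fraction of rows per label value."""
    # normalize="index" makes every row of the resulting table sum to 1
    return pd.crosstab(frame[column], frame[label], normalize="index")


# Made-up rows purely for illustration (hypothetical, not real customers)
toy = pd.DataFrame({
    "Age": [28, 28, 28, 34, 34, 34],
    "TravelInsurance": ["Not Purchased", "Not Purchased", "Purchased",
                        "Purchased", "Purchased", "Not Purchased"],
})

shares = purchase_share_by(toy, "Age")
print(shares)
```

With the real dataset loaded as `data`, a call such as `purchase_share_by(data, "Age")` should give one row per age; the same helper could be pointed at "Employment Type" or at binned "AnnualIncome" values.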

According to the visualization above, people around 34 are more likely to buy an insurance policy, and people around 28 are much less likely to buy one. Now let's see how a person's type of employment affects the purchase of an insurance policy:

[Figure: employment type and purchase of travel insurance]

According to the visualization above, people who work in the private sector or are self-employed are more likely to have an insurance policy. Now let's see how a person's annual income affects the purchase of an insurance policy:

[Figure: annual income and purchase of travel insurance]

According to the visualization above, people with an annual income of more than 1,400,000 are more likely to purchase the insurance policy.

The dataset we are using is based on purchases of travel insurance, so it is plausible that people with a higher income travel more and, as a result, are more likely to purchase travel insurance. You can explore every other column of this data in the same way. In the section below, I will take you through how you can train a machine learning model to predict whether or not a person will purchase travel insurance.

Insurance Prediction Model

I will first convert all categorical values to 1 and 0, because the model needs numeric inputs and all columns are important for training the insurance prediction model:
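The conversion code itself is not shown here. A minimal sketch of one way to do it follows, assuming the three Yes/No columns listed in the `data.info()` output hold the strings "Yes" and "No", and that "Employment Type" has two distinct values. The helper name `encode_binary_columns` and the toy rows (including the placeholder employment strings "Type A" and "Type B") are hypothetical.

```python
import pandas as pd

# Columns assumed to contain only the strings "Yes" and "No"
YES_NO_COLUMNS = ["GraduateOrNot", "FrequentFlyer", "EverTravelledAbroad"]


def encode_binary_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the Yes/No columns and 'Employment Type' as integers."""
    encoded = frame.copy()
    for column in YES_NO_COLUMNS:
        encoded[column] = encoded[column].map({"No": 0, "Yes": 1})
    # Category codes follow the sorted order of the distinct values,
    # so no particular spelling of the employment types is assumed.
    encoded["Employment Type"] = (
        encoded["Employment Type"].astype("category").cat.codes
    )
    return encoded


# Made-up rows purely for illustration (placeholder values, not real data)
toy = pd.DataFrame({
    "Employment Type": ["Type A", "Type B"],
    "GraduateOrNot": ["Yes", "No"],
    "FrequentFlyer": ["No", "Yes"],
    "EverTravelledAbroad": ["No", "No"],
})

encoded_toy = encode_binary_columns(toy)
print(encoded_toy)
```

Working on a copy keeps the original frame readable for further exploration, while the encoded copy can be passed on to model training.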

Now let's split the data and train the model by using the decision tree classification algorithm:
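The training code is not shown here, so the following is only a sketch of how the split and fit could look with scikit-learn. The test size of 0.2, the seed of 42, the helper name `train_insurance_tree`, and the toy rows are all assumptions; the article does not state which settings were actually used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_insurance_tree(frame, label="TravelInsurance", test_size=0.2, seed=42):
    """Fit a decision tree on all non-label columns; return (model, test accuracy)."""
    features = frame.drop(columns=[label])
    target = frame[label]
    x_train, x_test, y_train, y_test = train_test_split(
        features, target, test_size=test_size, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(x_train, y_train)
    # score() reports plain accuracy on the held-out rows
    return model, model.score(x_test, y_test)


# Made-up, already-numeric rows purely for illustration (not real customers)
toy = pd.DataFrame({
    "Age": [25 + (i % 10) for i in range(40)],
    "AnnualIncome": [300_000 + 50_000 * i for i in range(40)],
})
toy["TravelInsurance"] = (toy["AnnualIncome"] > 1_400_000).astype(int)

model, accuracy = train_insurance_tree(toy)
print(accuracy)
```

The accuracy printed for the toy rows is not meaningful. The number that follows is the article's result on the real dataset, and a rerun with a different split or seed would likely give a slightly different value.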

0.8190954773869347

The model gives a score of about 0.82 (roughly 82%), which is not bad for this kind of problem. So this is how you can train a machine learning model for the task of insurance prediction using Python.

Summary

So this is how you can analyze what kind of people are more likely to purchase an insurance policy and train a machine learning model to predict it. The task of insurance prediction adds value to every insurance company: companies use the data they hold about everyone they have contacted to find the people who are most likely to buy insurance. I hope you liked this article on the task of Insurance Prediction with Machine Learning using Python. Feel free to ask your questions in the comments section below.