Trendlist: [New post] Movie Rating Analysis using Python

Wednesday, September 22, 2021

Movie Rating Analysis using Python

by Aman Kharwal

We all watch movies for entertainment. Some of us never rate them, while other viewers rate every movie they watch. Those viewers help people who read reviews before picking a movie and want to make sure they are about to watch a good one. So, if you are new to data science and want to learn how to analyze movie ratings using the Python programming language, this article is for you. In this article, I will walk you through the task of Movie Rating Analysis using Python.

Movie Rating Analysis using Python

Analyzing the ratings that viewers give a movie helps many people decide whether or not to watch it. So, for the Movie Rating Analysis task, you first need a dataset that contains the ratings given by each viewer. For this task, I have collected a dataset from Kaggle that contains two files:

one file contains the movie ID, the title and the genre of each movie
and the other file contains the user ID, the movie ID, the rating given by the user and the timestamp of the rating

You can download both these datasets from here.

Now let's get started with the task of movie rating analysis by importing the necessary Python libraries and the datasets:

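import numpy as np
import pandas as pd
# "::" is a multi-character separator, so the Python parsing engine is selected explicitly
movies = pd.read_csv("movies.dat", delimiter="::", engine="python")
print(movies.head())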

0       10                La sortie des usines Lumière (1895)    Documentary|Short
1       12                      The Arrival of a Train (1896)    Documentary|Short
2       25  The Oxford and Cambridge University Boat Race ...                  NaN
3       91                         Le manoir du diable (1896)         Short|Horror
4      131                           Une nuit terrible (1896)  Short|Comedy|Horror

So far I have imported only the movies dataset. It does not have any column names, so let's define them:

movies.columns = ["ID", "Title", "Genre"]
print(movies.head())

    ID                                              Title                Genre
0   10                La sortie des usines Lumière (1895)    Documentary|Short
1   12                      The Arrival of a Train (1896)    Documentary|Short
2   25  The Oxford and Cambridge University Boat Race ...                  NaN
3   91                         Le manoir du diable (1896)         Short|Horror
4  131                           Une nuit terrible (1896)  Short|Comedy|Horror
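One thing to watch: by default, read_csv treats the first line of a file as the header, so when a file has no header row its first record becomes the column names and is lost once the columns are renamed. The sketch below shows one way to avoid that, using header=None together with names. It reads a few rows copied from the output above out of an in-memory string, and it assumes that movies.dat follows the same "::"-separated layout; movies_sample is just an illustrative name.

```python
import io

import pandas as pd

# Three rows copied from the output above, written in the "::"-separated
# layout that movies.dat is assumed to use.
sample = (
    "10::La sortie des usines Lumière (1895)::Documentary|Short\n"
    "12::The Arrival of a Train (1896)::Documentary|Short\n"
    "91::Le manoir du diable (1896)::Short|Horror\n"
)

# header=None stops pandas from using the first row as column names.
# engine="python" selects the parser that supports multi-character separators.
movies_sample = pd.read_csv(
    io.StringIO(sample),
    sep="::",
    engine="python",
    header=None,
    names=["ID", "Title", "Genre"],
)
print(movies_sample)
```

With the real file, the same arguments could be passed to pd.read_csv("movies.dat", ...), which would replace the separate step of assigning movies.columns.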

Now let's import the ratings dataset:

ratings = pd.read_csv("ratings.dat", delimiter="::", engine="python")
print(ratings.head())

   1  0114508  8  1381006850
0  2   499549  9  1376753198
1  2  1305591  8  1376742507
2  2  1428538  1  1371307089
3  3    75314  1  1595468524
4  3   102926  9  1590148016

The ratings dataset also doesn't have any column names, so let's define them as well:

ratings.columns = ["User", "ID", "Ratings", "Timestamp"]
print(ratings.head())

   User       ID  Ratings   Timestamp
0     2   499549        9  1376753198
1     2  1305591        8  1376742507
2     2  1428538        1  1371307089
3     3    75314        1  1595468524
4     3   102926        9  1590148016
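The Timestamp column is hard to read as a raw integer. A minimal sketch of converting it to dates follows, assuming the values are seconds since 1970-01-01 UTC (the article does not state the unit, but the magnitudes fit). The rows are copied from the output above, and RatedAt is a column name made up for this example.

```python
import pandas as pd

# Rows taken from the ratings output above.
ratings_sample = pd.DataFrame(
    {
        "User": [2, 2, 3],
        "ID": [499549, 1305591, 75314],
        "Ratings": [9, 8, 1],
        "Timestamp": [1376753198, 1376742507, 1595468524],
    }
)

# Assumption: Timestamp counts seconds since the Unix epoch.
ratings_sample["RatedAt"] = pd.to_datetime(ratings_sample["Timestamp"], unit="s")
print(ratings_sample[["User", "ID", "Ratings", "RatedAt"]])
```

A datetime column like this makes it possible to group ratings by year or month later, for example with ratings_sample["RatedAt"].dt.year.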

Now I am going to merge these two datasets into one. Both have a column named ID that contains the movie ID, so we can use it as the common column for the merge:

# Join on the shared movie ID column
data = pd.merge(movies, ratings, on="ID")
print(data.head())

   ID                                              Title  ... Ratings   Timestamp
0  10                La sortie des usines Lumière (1895)  ...      10  1412878553
1  12                      The Arrival of a Train (1896)  ...      10  1439248579
2  25  The Oxford and Cambridge University Boat Race ...  ...       8  1488189899
3  91                         Le manoir du diable (1896)  ...       6  1385233195
4  91                         Le manoir du diable (1896)  ...       5  1532347349

[5 rows x 6 columns]
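By default, pd.merge performs an inner join, so a movie without any rating drops out of the result. The sketch below illustrates this on a tiny sample and shows how a left join with indicator=True can reveal such movies. The titles, ratings and timestamps come from the outputs above; the User values are placeholders (that column is hidden in the printed output), and the sample is deliberately built so that one movie has no rating.

```python
import pandas as pd

movies_sample = pd.DataFrame(
    {
        "ID": [10, 91, 131],
        "Title": [
            "La sortie des usines Lumière (1895)",
            "Le manoir du diable (1896)",
            "Une nuit terrible (1896)",
        ],
    }
)

# User values are placeholders; movie 131 gets no rating in this sample.
ratings_sample = pd.DataFrame(
    {
        "User": [1, 2, 3],
        "ID": [10, 91, 91],
        "Ratings": [10, 6, 5],
        "Timestamp": [1412878553, 1385233195, 1532347349],
    }
)

# Inner join (the default). validate raises an error if a movie ID is
# duplicated on the movies side.
inner = pd.merge(movies_sample, ratings_sample, on="ID", validate="one_to_many")

# Left join with indicator=True marks movies that had no matching rating.
left = pd.merge(movies_sample, ratings_sample, on="ID", how="left", indicator=True)
unrated = left.loc[left["_merge"] == "left_only", "Title"].tolist()

print(len(inner), "rated rows; unrated movies:", unrated)
```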

As this is a beginner-level task, I will first look at the distribution of the ratings given by viewers across all movies:

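import plotly.express as px
# Count how often each rating value was given (a new name avoids overwriting the ratings dataset)
rating_counts = data["Ratings"].value_counts()
numbers = rating_counts.index
quantity = rating_counts.values
fig = px.pie(values=quantity, names=numbers)
fig.show()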

So, according to the pie chart above, 8 is the rating that users give most often. From the figure, it can be said that most of the ratings are positive.
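The same distribution can also be read as numbers instead of from a chart. The following is a small sketch using value_counts(normalize=True) and mode; the ratings in it are hypothetical values made up to show the calls, not figures from the dataset.

```python
import pandas as pd

# Hypothetical ratings, only to demonstrate the calls.
sample_ratings = pd.Series([8, 8, 7, 10, 8, 5, 10, 9], name="Ratings")

# Share of each rating value, ordered by rating rather than by frequency.
shares = sample_ratings.value_counts(normalize=True).sort_index()

# The most frequently given rating.
most_common = sample_ratings.mode().iloc[0]

print(shares)
print("Most common rating:", most_common)
```

On the merged data, sample_ratings would be replaced by data["Ratings"].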

As 10 is the highest rating a viewer can give, let's take a look at the 10 movies that received the most ratings of 10 from viewers:

data2 = data.query("Ratings == 10")
print(data2["Title"].value_counts().head(10))

Joker (2019)                       1479
Interstellar (2014)                1382
1917 (2019)                         819
Avengers: Endgame (2019)            808
The Shawshank Redemption (1994)     699
Gravity (2013)                      653
The Wolf of Wall Street (2013)      581
Hacksaw Ridge (2016)                570
Avengers: Infinity War (2018)       534
La La Land (2016)                   510
Name: Title, dtype: int64
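A raw count of 10 ratings favors movies that were rated by many people. As a complement, the sketch below puts the count next to the total number of ratings, the share of 10s and the mean rating per title. The titles and ratings are hypothetical, and names such as n_tens and share_tens are made up for this example.

```python
import pandas as pd

# Hypothetical titles and ratings, made up for illustration.
sample = pd.DataFrame(
    {
        "Title": ["Movie A"] * 6 + ["Movie B"] * 2,
        "Ratings": [10, 10, 10, 4, 5, 6, 10, 10],
    }
)

# Per title: total ratings, number of 10s, and mean rating.
summary = (
    sample.assign(is_ten=sample["Ratings"] == 10)
    .groupby("Title")
    .agg(
        n_ratings=("Ratings", "size"),
        n_tens=("is_ten", "sum"),
        mean_rating=("Ratings", "mean"),
    )
)
summary["share_tens"] = summary["n_tens"] / summary["n_ratings"]

print(summary.sort_values("n_tens", ascending=False))
```

Here Movie A has more 10 ratings, while Movie B has the higher share of 10s and the higher mean, which is the kind of difference a count alone would hide.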

So, according to this dataset, Joker (2019) got the highest number of 10 ratings from viewers. This is how you can analyze movie ratings using Python as a data science beginner.

Summary

So this is how you can do movie rating analysis using the Python programming language as a data science beginner. Analyzing the ratings given by viewers helps many people decide whether or not to watch a movie. I hope you liked this article on movie rating analysis using Python. Feel free to ask your questions in the comments section below.

Aman Kharwal | September 22, 2021 at 2:42 pm | Categories: Machine Learning | URL: https://wp.me/pbUb2D-44v
