Trendlist: [New post] Hate Speech Detection with Machine Learning

Sunday, July 25, 2021

Hate Speech Detection with Machine Learning

by Aman Kharwal

Hate speech is a serious problem that appears daily on social media platforms such as Twitter and Facebook. Many posts containing hate speech come from accounts that share political views. If you want to learn how to train a hate speech detection model with machine learning, this article is for you: I will walk you through the task of hate speech detection with machine learning using Python.

Hate Speech Detection with Machine Learning

There is no universally accepted legal definition of hate speech, partly because people's opinions cannot easily be classified as hateful or offensive. Nevertheless, the United Nations defines hate speech as any type of verbal, written or behavioural communication that attacks or uses discriminatory language against a person or a group of people based on their identity, that is, their religion, ethnicity, nationality, race, colour, ancestry, gender or any other identity factor.

I hope this makes clear what hate speech is. Social media platforms need to detect hate speech so they can stop it from going viral or remove it in time. In the section below, I will walk you through the task of hate speech detection with machine learning using the Python programming language.

Hate Speech Detection using Python

The dataset I'm using for the hate speech detection task was downloaded from Kaggle. It was originally collected from Twitter and contains the following columns (the index column is displayed by pandas as Unnamed: 0):

index
count
hate_speech
offensive_language
neither
class
tweet

So let's start by importing all the necessary Python libraries and the dataset we need for this task:
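The loading code itself is not preserved in this text, so here is a minimal sketch of what this step could look like with pandas. The helper name load_dataset and the file name twitter.csv are assumptions; a tiny in-memory CSV with made-up rows stands in for the Kaggle file so the example runs on its own.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read the tweets CSV (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Made-up stand-in for the Kaggle file; the header's empty first field is
# what pandas displays as "Unnamed: 0".
sample_csv = io.StringIO(
    ",count,hate_speech,offensive_language,neither,class,tweet\n"
    "0,3,0,0,3,2,placeholder tweet one\n"
    "1,3,0,3,0,1,placeholder tweet two\n"
)
data = load_dataset(sample_csv)
print(data.head())

# With the real file (hypothetical name):
# data = load_dataset("twitter.csv")
```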

Dataset

   Unnamed: 0  count  hate_speech  offensive_language  neither  class  \
0           0      3            0                   0        3      2
1           1      3            0                   3        0      1
2           2      3            0                   3        0      1
3           3      3            0                   2        1      1
4           4      6            0                   6        0      1

                                               tweet
0  !!! RT @mayasolovely: As a woman you shouldn't...
1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...

I will add a new column named labels to this dataset, which will contain one of the following values:

Hate Speech
Offensive Language
No Hate and Offensive
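One way to build this column is to map the numeric class column to text. The sketch below assumes that class 0 means Hate Speech; the output in this article only confirms that 1 maps to Offensive Language and 2 to No Hate and Offensive. The names LABELS and add_labels are mine, and the sample rows are made up.

```python
import pandas as pd

# Assumed mapping: 1 and 2 match the article's output, 0 is inferred
# from the order of the hate_speech / offensive_language / neither columns.
LABELS = {
    0: "Hate Speech",
    1: "Offensive Language",
    2: "No Hate and Offensive",
}


def add_labels(data):
    """Return a copy of data with a text 'labels' column derived from 'class'."""
    data = data.copy()
    data["labels"] = data["class"].map(LABELS)
    return data


# Made-up rows standing in for the real dataset.
sample = pd.DataFrame(
    {"class": [2, 1, 0], "tweet": ["tweet a", "tweet b", "tweet c"]}
)
labelled = add_labels(sample)
print(labelled)
```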

   Unnamed: 0  count  hate_speech  offensive_language  neither  class  \
0           0      3            0                   0        3      2
1           1      3            0                   3        0      1
2           2      3            0                   3        0      1
3           3      3            0                   2        1      1
4           4      6            0                   6        0      1

                                               tweet                 labels
0  !!! RT @mayasolovely: As a woman you shouldn't...  No Hate and Offensive
1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...     Offensive Language
2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...     Offensive Language
3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...     Offensive Language
4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...     Offensive Language

Now I will keep only the tweet and labels columns, since these are all we need to train a hate speech detection model:

data = data[["tweet", "labels"]]
print(data.head())

                                               tweet                 labels
0  !!! RT @mayasolovely: As a woman you shouldn't...  No Hate and Offensive
1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...     Offensive Language
2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...     Offensive Language
3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...     Offensive Language
4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...     Offensive Language

Now I will create a function to clean the texts in the tweet column:
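The cleaning function is not preserved in this text, so the sketch below is only one plausible version that uses the standard library. Which steps to apply (lowercasing, dropping URLs, mentions, tokens with digits and punctuation) is my assumption, not necessarily what the original function did.

```python
import re
import string


def clean(text):
    """Normalize one tweet: lowercase it and strip common noise."""
    text = str(text).lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<.*?>", " ", text)  # HTML tags
    text = re.sub(r"@\w+", " ", text)  # @mentions
    text = re.sub(r"\w*\d\w*", " ", text)  # tokens containing digits
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


print(clean("!!! RT @user: Check http://t.co/abc NOW!!"))
```

On the real dataset this could be applied column-wide with data["tweet"] = data["tweet"].apply(clean).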

Now let's split the dataset into training and test sets and train a machine learning model for the task of hate speech detection:
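Here is a rough sketch of this step with scikit-learn. The bag-of-words features (CountVectorizer), the decision tree classifier, the 33% test split and the helper name train_model are all assumptions on my part, and the toy texts are made-up placeholders for the cleaned tweets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_model(texts, labels, test_size=0.33, seed=42):
    """Split, vectorize and fit; returns (vectorizer, model, test accuracy)."""
    train_texts, test_texts, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed
    )
    # Fit the vocabulary on the training split only, then reuse it.
    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    return vectorizer, model, model.score(X_test, y_test)


# Made-up placeholder data standing in for the tweet and labels columns.
texts = [
    "hateterm one", "hateterm two", "hateterm three",
    "rudeterm one", "rudeterm two", "rudeterm three",
    "niceterm one", "niceterm two", "niceterm three",
]
labels = (
    ["Hate Speech"] * 3
    + ["Offensive Language"] * 3
    + ["No Hate and Offensive"] * 3
)
vectorizer, model, accuracy = train_model(texts, labels)
print(f"Test accuracy on toy data: {accuracy:.2f}")
```

With the real dataset, the call would be train_model(data["tweet"], data["labels"]). Accuracy on nine toy rows says nothing about real performance; it only shows that the pieces fit together.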

Now let's test this machine learning model to see if it detects hate speech or not:
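To classify a new sentence, it has to pass through the same fitted vectorizer before it reaches the model. The helper predict_label below is a hypothetical name, and the toy model is trained on made-up placeholder words only so the example runs on its own. The output shown in this article comes from the author's own input sentence, which is not preserved here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier


def predict_label(vectorizer, model, text):
    """Vectorize one raw string with the fitted vectorizer and classify it."""
    features = vectorizer.transform([text])  # transform only, never refit
    return model.predict(features)


# Toy model on made-up placeholder data, just to make the sketch runnable.
toy_texts = ["hateterm hateterm", "rudeterm rudeterm", "niceterm niceterm"]
toy_labels = ["Hate Speech", "Offensive Language", "No Hate and Offensive"]
toy_vectorizer = CountVectorizer()
toy_model = DecisionTreeClassifier(random_state=0)
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

prediction = predict_label(toy_vectorizer, toy_model, "niceterm niceterm")
print(prediction)
```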

['Hate Speech']

Summary

So this is how you can train a machine learning model to detect hate speech using the Python programming language. Hate speech is a serious problem that appears daily on social media platforms such as Twitter and Facebook, and many of the posts containing it come from accounts that share political views. I hope you liked this article on detecting hate speech with machine learning using Python. Feel free to ask your questions in the comments section below.

Aman Kharwal | July 25, 2021 at 1:42 pm | Tags: Classification, Machine Learning Project | Categories: Machine Learning | URL: https://wp.me/pbUb2D-3Vy
