How We Built an AI Headline Classification Model (in One Week)

Apr 10, 2024

José Luis Marina

Introduction

In keeping with our learning-by-doing philosophy, a week ago we set out to build an AI model that classifies news headlines as clickbait or not.

You can try it here: News Headline Clickbait Prediction

In this post, we’re going to share how we did it and what we learned in the process.

Phase 1: Collecting Data

Without a doubt, this is the most important part, along with labeling and cleaning the corpus that we’ll later use to train the model. To get the data, we searched the web, specifically these sources:

Kaggle:

A classic for finding datasets of all kinds. We used this one: Clickbait News Detection Competition

This dataset has 24,000 news items, each with a headline, text, and label. The label means that “someone” or “something” has already decided whether each item is clickbait, so what counts as clickbait for you may not for the dataset, and vice versa.

In any case, we downloaded them, cleaned them up a bit to keep only the headlines and labels, translated them, and saved the result to a CSV file.
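As a rough illustration of that cleanup step, here is a minimal sketch that keeps only the headline and label columns and writes them out as CSV. The column names (`title`, `label`), the sample rows, and the `translate` hook are assumptions made for the example, not the actual names or code we used; the hook simply returns the text unchanged by default and stands in for whatever translation step you plug in.

```python
import csv
import io

# Hypothetical column names; the real Kaggle file may use different ones.
HEADLINE_COL = "title"
LABEL_COL = "label"


def keep_headlines_and_labels(src, dst, translate=lambda text: text):
    """Copy only the headline and label columns from src to dst.

    `translate` is a placeholder for a translation step; by default it
    returns the headline unchanged.
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["headline", "label"])
    kept = 0
    for row in reader:
        headline = (row.get(HEADLINE_COL) or "").strip()
        label = (row.get(LABEL_COL) or "").strip()
        if not headline or not label:
            continue  # skip rows missing either field
        writer.writerow([translate(headline), label])
        kept += 1
    return kept


# Tiny in-memory sample (made up) standing in for the downloaded file.
sample = io.StringIO(
    "id,title,text,label\n"
    "1,You won't believe this trick,Some body text,clickbait\n"
    "2,,Body without a headline,news\n"
    "3,Parliament passes budget,More body text,news\n"
)
out = io.StringIO()
kept_rows = keep_headlines_and_labels(sample, out)
print(kept_rows)
print(out.getvalue())
```

With real data you would pass open file handles instead of the in-memory buffers. Dropping rows that lack a headline or a label at this stage keeps the training file free of empty examples.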

AI
BERT
BETO
Classification
NLP
Clickbait
