WhatsApp Messenger, owned by Facebook, is one of the most widely used messaging apps in the world.

Photo by Adem AY on Unsplash

I was not aware that WhatsApp lets its users export all of their chat data. This blog aims to give a step-by-step guide to analysing your WhatsApp chats.

You can check all of my code here.

This notebook was developed for WhatsApp data but is not limited to it. You can do the same with Telegram, Facebook Messenger or any other messenger that lets you export your chats.

This project is divided into three main parts.

  1. Data Collection
  2. Data Preprocessing
  3. Data Analysis

Data Collection

We can easily export the chat data of our conversation with anyone.

For iOS:

  1. Open the conversation and tap the name of your chat
This is what the data exported from iOS looks like

For Android:

  1. Click on the Conversation Name
This is what the data exported from Android looks like

Data Preprocessing

Pandas has a function, read_fwf(), that reads a fixed-width formatted text file and returns a pandas DataFrame.

As you can see, this is clearly not in a usable format.
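A minimal sketch of that first attempt, using two made-up lines in an assumed iOS-style format (real exports differ by platform, locale and WhatsApp version). Because read_fwf guesses column boundaries from whitespace that lines up across rows, each chat line is cut into arbitrary fixed-width chunks rather than date, time, user and message fields:

```python
import io

import pandas as pd

# Two made-up chat lines in an ASSUMED iOS-style export format
sample = io.StringIO(
    "[31/12/19, 10:15:30 PM] Alice: Happy new year!\n"
    "[31/12/19, 10:16:02 PM] Bob: Same to you\n"
)

# read_fwf infers column edges from whitespace shared by all rows,
# so the split points have nothing to do with the chat's structure
raw_df = pd.read_fwf(sample, header=None)
print(raw_df)
```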

There are clear differences between the formats exported from Android and iOS devices, so we need two separate scripts to convert this text into a well-formatted pandas DataFrame.

  1. Android to DF
  2. iOS to DF
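As a rough idea of what such a converter involves, here is a minimal sketch with a hypothetical `android_to_df` helper. It assumes a US-locale, 12-hour Android line such as `12/31/19, 10:15 PM - Alice: Happy new year!`; actual exports vary by phone locale and WhatsApp version, so the pattern and date format would need adjusting. The output column names `date_time` and `user` are also assumptions:

```python
import re

import pandas as pd

# ASSUMED Android line format: "12/31/19, 10:15 PM - Alice: Happy new year!"
MESSAGE_LINE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [AP]M) - ([^:]+): (.*)$"
)
# A timestamp with no "user: " part is a system notice, e.g. "Alice added Bob"
TIMESTAMP_ONLY = re.compile(r"^\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [AP]M - ")


def android_to_df(lines):
    """Hypothetical helper: parse Android-style chat lines into a DataFrame."""
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = MESSAGE_LINE.match(line)
        if match:
            rows.append(list(match.groups()))
        elif TIMESTAMP_ONLY.match(line):
            continue  # skip system notices
        elif rows:
            # No timestamp: the line continues the previous message
            rows[-1][2] += "\n" + line
    df = pd.DataFrame(rows, columns=["date_time", "user", "message"])
    df["date_time"] = pd.to_datetime(df["date_time"], format="%m/%d/%y, %I:%M %p")
    return df


chat = [
    "12/31/19, 10:15 PM - Alice: Happy new year!\n",
    "12/31/19, 10:16 PM - Alice added Bob\n",
    "12/31/19, 10:17 PM - Bob: Thanks\n",
    "see you tomorrow\n",
]
df = android_to_df(chat)
print(df)
```

In practice the lines would come from the exported file, e.g. by passing an open file handle to `android_to_df`.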

I wrote two scripts: one converts Android chat data into a pandas DataFrame and the other does the same for iOS chat data. I will be using data exported from an iPhone for this project.
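A similar minimal sketch for iOS, using a hypothetical `ios_to_df` helper and an assumed line format `[31/12/19, 10:15:30 PM] Alice: Happy new year!` (day-first date, seconds, square brackets). It also strips left-to-right marks (U+200E), which some iOS exports contain; the pattern, date format and column names are assumptions to adapt to your own file:

```python
import re

import pandas as pd

# ASSUMED iOS line format: "[31/12/19, 10:15:30 PM] Alice: Happy new year!"
IOS_LINE = re.compile(
    r"^\[(\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}:\d{2} [AP]M)\] ([^:]+): (.*)$"
)
TIMESTAMP_ONLY = re.compile(r"^\[\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}:\d{2} [AP]M\] ")


def ios_to_df(lines):
    """Hypothetical helper: parse iOS-style chat lines into a DataFrame."""
    rows = []
    for raw in lines:
        # Drop line endings and any left-to-right marks
        line = raw.rstrip("\r\n").replace("\u200e", "")
        match = IOS_LINE.match(line)
        if match:
            rows.append(list(match.groups()))
        elif TIMESTAMP_ONLY.match(line):
            continue  # timestamped line without "user: ", treat as a notice
        elif rows:
            # No timestamp: the line continues the previous message
            rows[-1][2] += "\n" + line
    df = pd.DataFrame(rows, columns=["date_time", "user", "message"])
    df["date_time"] = pd.to_datetime(df["date_time"], format="%d/%m/%y, %I:%M:%S %p")
    return df


chat = [
    "[31/12/19, 10:15:30 PM] Alice: Happy new year!\n",
    "[31/12/19, 10:16:02 PM] Bob: \u200eimage omitted\n",
    "[01/01/20, 9:05:00 AM] Alice: Morning\n",
    "how was the party?\n",
]
df = ios_to_df(chat)
print(df)
```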

This is the data after passing it through the appropriate script

To remove media and image placeholders from the dataset:

# Rows whose text is only a placeholder for media that was not exported
media = whatsapp_df[whatsapp_df['message'] == "<Media omitted>"]
whatsapp_df.drop(media.index, inplace=True)
img = whatsapp_df[whatsapp_df['message'] == "<image omitted>"]
whatsapp_df.drop(img.index, inplace=True)
# Reset the indexes
whatsapp_df.reset_index(inplace=True, drop=True)

Data Analysis

Every data analysis project should start by asking good questions. I compiled my questions before writing a single line of analysis code.

1. Who are the different people in the group chat?

people in our group chat
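A minimal sketch of how the participant list can be pulled out, on a tiny made-up stand-in for `whatsapp_df`. The `user` column name is an assumption; only `message` appears in the preprocessing code:

```python
import pandas as pd

# Made-up stand-in for whatsapp_df; the "user" column name is assumed
whatsapp_df = pd.DataFrame({
    "user": ["Alice", "Bob", "Alice", "Carol"],
    "message": ["hi", "hey", "what's up?", "hello all"],
})

# unique() returns each user once, in order of first appearance
people = whatsapp_df["user"].unique()
print(people)
```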

2. Who are the most active users in the group?

most active people in our group
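One way to rank activity is to count messages per user, sketched here on made-up data with an assumed `user` column:

```python
import pandas as pd

# Made-up stand-in for whatsapp_df; the "user" column name is assumed
whatsapp_df = pd.DataFrame({
    "user": ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"],
    "message": ["hi", "hey", "lol", "hello", "ok", "bye"],
})

# value_counts() counts messages per user, most active first
message_counts = whatsapp_df["user"].value_counts()
print(message_counts.head(10))
```

With matplotlib installed, `message_counts.plot(kind="bar")` turns this into a bar chart.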

3. What is the timeline of the data we have?

timeline of our group chat
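A possible sketch of the timeline, assuming a parsed `date_time` column (a hypothetical name) on made-up data. Resampling by day keeps days with no messages as zeros, so gaps in the conversation show up in the plot:

```python
import pandas as pd

# Made-up stand-in for whatsapp_df with an assumed "date_time" column
whatsapp_df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2019-12-31 22:15", "2019-12-31 23:01",
        "2020-01-01 09:05", "2020-01-03 18:30",
    ]),
    "user": ["Alice", "Bob", "Alice", "Carol"],
    "message": ["Happy new year!", "Same to you", "Morning", "hey"],
})

# First and last message timestamps give the span of the data
print(whatsapp_df["date_time"].min(), "->", whatsapp_df["date_time"].max())

# Messages per calendar day, including silent days as 0
daily = whatsapp_df.set_index("date_time").resample("D").size()
print(daily)
```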

What is the average number of messages sent per day?

WOAAH! That is amazing, right?
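"Average per day" can mean two things, so this sketch (made-up data, assumed `date_time` column) computes both: the mean over every day in the span, and the mean over days that had at least one message:

```python
import pandas as pd

# Made-up stand-in for whatsapp_df with an assumed "date_time" column
whatsapp_df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2019-12-31 22:15", "2019-12-31 23:01",
        "2020-01-01 09:05", "2020-01-03 18:30",
    ]),
    "message": ["Happy new year!", "Same to you", "Morning", "hey"],
})

daily = whatsapp_df.set_index("date_time").resample("D").size()

average_all_days = daily.mean()               # silent days count as zero
average_active_days = daily[daily > 0].mean() # only days with messages
print(average_all_days, average_active_days)
```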

4. At what time of day is the group most active?

most active hours of group chat
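A minimal sketch of counting messages per hour of the day, again on made-up data with an assumed `date_time` column. Reindexing to all 24 hours keeps quiet hours visible as zeros:

```python
import pandas as pd

# Made-up stand-in for whatsapp_df with an assumed "date_time" column
whatsapp_df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2020-01-01 09:05", "2020-01-01 22:10", "2020-01-02 22:45",
        "2020-01-02 23:30", "2020-01-03 09:15", "2020-01-03 22:05",
    ]),
})

# Messages per hour (0-23), with hours that have no messages filled as 0
hourly = (
    whatsapp_df["date_time"].dt.hour.value_counts()
    .reindex(range(24), fill_value=0)
)
busiest_hour = hourly.idxmax()
print(hourly)
print("Busiest hour:", busiest_hour)
```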

5. Which were the busiest months of conversation?

most active months of the year
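A sketch of grouping by month of the year (so January 2019 and January 2020 are counted together), assuming a `date_time` column and made-up data. Month names come from Python's `calendar` module:

```python
import calendar

import pandas as pd

# Made-up stand-in for whatsapp_df with an assumed "date_time" column
whatsapp_df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2019-12-05", "2019-12-20", "2020-01-02",
        "2020-01-15", "2020-01-30", "2020-02-10",
    ]),
})

# Count messages per month number (1-12), keeping empty months as 0
monthly = (
    whatsapp_df["date_time"].dt.month.value_counts()
    .reindex(range(1, 13), fill_value=0)
)
# Replace month numbers with names for readability
monthly.index = [calendar.month_name[m] for m in monthly.index]
busiest_month = monthly.idxmax()
print(monthly)
print("Busiest month:", busiest_month)
```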

6. Which emoji was used most in messages?

most used emoji
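A rough, dependency-free sketch for counting emoji: it treats single code points in a few common emoji blocks as emoji. This is an approximation, not a full emoji parser. Flags, skin-tone modifiers and multi-part (ZWJ) emoji are split into separate code points, and a dedicated library such as the `emoji` package would handle those cases better:

```python
import re
from collections import Counter

import pandas as pd

# Approximation: single code points in common emoji/symbol blocks
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Made-up stand-in for whatsapp_df
whatsapp_df = pd.DataFrame({
    "message": ["haha 😂😂", "ok 👍", "😂 nice", "no emoji here"],
})

emoji_counts = Counter(
    ch
    for message in whatsapp_df["message"]
    for ch in EMOJI_PATTERN.findall(message)
)
print(emoji_counts.most_common(5))
```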

7. What are the top words used in the conversation?

word cloud of our chat group
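A word cloud is driven by word frequencies, which can be sketched with `collections.Counter`. The tiny stop-word list and sample messages below are illustrative assumptions; a real analysis would use a fuller stop-word list:

```python
import re
from collections import Counter

import pandas as pd

# Tiny illustrative stop-word list (an assumption, not exhaustive)
STOP_WORDS = {"the", "a", "at", "to", "is", "and", "i", "you", "it", "of", "in"}

# Made-up stand-in for whatsapp_df
whatsapp_df = pd.DataFrame({"message": [
    "Is the party at Alice's place?",
    "Yes the party starts at 9",
    "Party time!",
]})

# Lowercase, keep alphabetic tokens (with apostrophes), drop stop words
words = Counter(
    word
    for message in whatsapp_df["message"]
    for word in re.findall(r"[a-z']+", message.lower())
    if word not in STOP_WORDS
)
print(words.most_common(5))
```

A mapping like `words` can then be rendered with a word-cloud library, for example the `wordcloud` package's `WordCloud.generate_from_frequencies`.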

Thank you for reading my blog. I hope you learned something new. There are many ways to extend this, such as enhancing the graphs or adding columns like word count or letter count to see who types the most.
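The word-count and letter-count idea could start like this sketch on made-up data. The `user`, `word_count` and `letter_count` column names are hypothetical, and "letters" here means non-whitespace characters, punctuation included:

```python
import pandas as pd

# Made-up stand-in for whatsapp_df; "user" column name is assumed
whatsapp_df = pd.DataFrame({
    "user": ["Alice", "Bob", "Alice", "Bob"],
    "message": ["Happy new year everyone", "hey", "see you soon", "ok bye"],
})

# Words per message (split on whitespace) and non-whitespace characters
whatsapp_df["word_count"] = whatsapp_df["message"].str.split().str.len()
whatsapp_df["letter_count"] = (
    whatsapp_df["message"].str.replace(r"\s", "", regex=True).str.len()
)

# Total typed per user; use .mean() instead to compare message length
totals = whatsapp_df.groupby("user")[["word_count", "letter_count"]].sum()
print(totals)
```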

You can connect with me on LinkedIn here or take a look at my portfolio website.

