Can a quick data project help you to get more likes on Instagram?

Jesus Larrubia
8 min read · Jul 7, 2020


In summary

In this article, I’ll explain how data can help you make decisions. As a fun example, I’ll work through a quick project attempting to understand how we can maximise the number of likes on our Instagram posts by visualising the upload information via a heatmap, making use of the Facebook Graph API, Python, Jupyter and Seaborn.

Context

As you’ve probably already heard (like hundreds of times), information is power and it can be used to your benefit to drive your business decisions.

However, even when the information exists (it’s stored somewhere), we face several challenges…

  • Where is the information located? Is it reachable via our means?
  • Once we have the data in our hands, can we understand it? Is it providing the information we need? Does it need to be processed in a format we — as humans — or an automated process can understand?
  • Even more difficult, what should we do with the information we have? What are your goals? And plans? Do you have any?

I think it is important to note you don’t need to have answers for every question. It will be especially hard for SMEs to find the time to devote to resolving their doubts. However, you could rely on existing BI and AI-based tools or be assisted by data science specialists (or both) to build tailored solutions for your problems.

Data-driven decisions

Instead of relying on AI models or automated tools to act on our behalf through the entire process, in this article we’ll focus on the steps an individual would follow when making their own data-backed decisions. When that is the case, you’ll end up following some kind of feedback loop: set your goals, gather the data, analyse it, act on what you learn, measure the results and feed them back into the next iteration.
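Sketched as code, that loop might look like this (the function names are placeholders of mine, not a real framework):

```python
def data_decision_loop(set_goal, gather_data, analyse, act, measure, iterations=3):
    """A bare-bones feedback loop: each round's measurements feed the next."""
    goal = set_goal()
    feedback = None
    for _ in range(iterations):
        data = gather_data(feedback)        # gather data, informed by last results
        insight = analyse(data, goal)       # analyse it against the goal
        act(insight)                        # make a decision and act on it
        feedback = measure()                # measure, then loop back around
    return feedback
```

Each step is deliberately just a callable, since the point is the shape of the cycle rather than any particular implementation.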

So far, you might be thinking “okay, this is really cool but I was here to figure out how I can get more likes for my Instagram account”. And that is what we’ll help you to answer in just a second. But we’ll approach it as a small data project based on the previously described methodology.

Setting our needs & goals

Our “marketing company” is run by Hugo, who is blatantly addicted to Instagram (purely for business reasons, of course) and really interested in figuring out how he can improve the performance of the account and get his company’s posts to a wider audience.

Once we understand his needs, we decide we can help him by:

  • Choosing a parameter to measure the effectiveness of our decisions. In this case, we decide to use the number of likes of a picture as the evaluation parameter (the more likes a picture obtains, the better).
  • Understanding what is the best time to upload a picture (day of the week and time of day), keeping in mind our goal is trying to maximise our evaluation parameter (the number of likes of the picture).

Improving the performance of your Instagram likes

We know what our goals are and how to measure the (un)success of our decisions, so we can start thinking about data. We need to answer where we can find it and how we can access it.

In this case, the answer is simple. The data is stored by Instagram (Facebook) and can be accessed via their API.

Initial setup

Before we can proceed to access our Instagram account data, we’ll need to complete an initial setup:

  • Ensure you have an Instagram business account. Only data belonging to business accounts can be accessed via API. You can switch your normal Instagram account to “business” with no costs (but keep in mind the account visibility permissions will change and it’ll be publicly accessible).
  • You’ll need a Facebook page linked to the Instagram business account. This will allow you to use the new Facebook Graph API.
  • Ensure you have a Facebook developer account and create an app with permissions to access Instagram media data (instagram_basic).
  • Give the API a go from the Facebook explorer tool and generate an access token: https://developers.facebook.com/tools/explorer/v2/

Now, we have everything we need to start retrieving and processing our data. We’ll make use of Python and Jupyter for our project. The following snippet will initialise our project with the libraries we’ll lean on.

# API utils libraries.
import requests
import json
# Data handling libraries.
import pandas as pd
import numpy as np
# Data visualisation libraries.
import matplotlib.pyplot as plt
pd.plotting.register_matplotlib_converters()
%matplotlib inline
import seaborn as sns
print("Setup Complete")

Data retrieval & processing

Let’s start retrieving the media information via the API. For the purpose of this article, we’ll get the latest 300 media objects (which will include pictures and videos). In a real application, we’d try to gather as many entries as possible to ensure the robustness of our decisions.

# Array containing the list of existing media objects.
media_list = []

# Get Media API endpoint parameters.
# The total number of media objects we want to retrieve.
# For the purpose of our example, we'll retrieve 300 pictures.
total_media = 300
# Your Instagram user id.
ig_user_id = '123456789'
# API access token.
token = 'XXXX'
media_endpoint = 'https://graph.facebook.com/v7.0/%s/media?fields=timestamp,like_count,comments_count,media_type&access_token=%s&limit=%s' % (ig_user_id, token, total_media)

# Get the required media objects.
response = requests.get(media_endpoint)
response_json = response.json()
if "error" in response_json:
    raise Exception(response_json["error"]["message"])

media_list = response_json["data"]
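One caveat I’d hedge on: the Graph API paginates its responses, so a single request may return fewer than 300 items and expose the next page under paging.next. A minimal sketch of following those links (collect_media and the injected get_page fetcher are names of my own, not part of the API):

```python
def collect_media(get_page, first_url, limit):
    """Follow Graph-API-style pagination until `limit` items are gathered.

    `get_page` is any callable that takes a URL and returns the decoded
    JSON dict, e.g. `lambda url: requests.get(url).json()`.
    """
    items, url = [], first_url
    while url and len(items) < limit:
        page = get_page(url)
        if "error" in page:
            raise Exception(page["error"]["message"])
        items.extend(page.get("data", []))
        # The next page (if any) is advertised under paging.next.
        url = page.get("paging", {}).get("next")
    return items[:limit]
```

In the notebook, this would be called as collect_media(lambda url: requests.get(url).json(), media_endpoint, total_media).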

If no errors are raised, we can assume the data is in our hands, so we’ll load it into a Pandas DataFrame (for ease of manipulation) and display the first rows to check everything is OK:

instagram_data_normalised = pd.json_normalize(media_list)
instagram_data_normalised.head()
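If you don’t have an API token to hand, here’s a rough idea of what pd.json_normalize does with a response shaped like ours (the values below are invented):

```python
import pandas as pd

# Invented sample of what the API hands back.
sample_media = [
    {"timestamp": "2020-06-01T12:30:00+0000", "like_count": 42,
     "comments_count": 3, "media_type": "IMAGE", "id": "1"},
    {"timestamp": "2020-06-02T18:00:00+0000", "like_count": 17,
     "comments_count": 1, "media_type": "VIDEO", "id": "2"},
]
# Each dict key becomes a flat DataFrame column.
df = pd.json_normalize(sample_media)
print(df.columns.tolist())
```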

And it looks good! We’ll make the decisions ourselves so we’ll transform the data to make it more readable. We’d like to be more specific and find out what day of the week and time of day each picture was uploaded. As part of the process, we’ll remove videos from our dataset since we consider:

  • People behave differently when deciding to like a picture or a video. Trying to capture the success of both media types using the same rules can introduce noise when evaluating the results. When evaluating the success of videos we could consider using additional parameters like the number of views.
  • We are particularly interested in pictures since they represent more than 98% of uploads in the addressed Instagram account.

instagram_data = instagram_data_normalised.copy()
# Remove videos from our dataset.
instagram_data = instagram_data.drop(instagram_data[instagram_data.media_type != 'IMAGE'].index)
# Convert the timestamp to datetime type.
instagram_data['timestamp'] = pd.to_datetime(instagram_data['timestamp'], errors='coerce')
# Create an additional column to extract the day
# of the week when the media object was uploaded.
instagram_data["weekday"] = instagram_data["timestamp"].dt.weekday
# Create an additional column to extract the time
# (hour) when the media object was uploaded.
instagram_data["hour"] = instagram_data["timestamp"].dt.hour
# Convert week days from numerical (0-6) to categorical values (Monday-Sunday).
weekday_categoricals = {
    "weekday": {
        0: "Monday",
        1: "Tuesday",
        2: "Wednesday",
        3: "Thursday",
        4: "Friday",
        5: "Saturday",
        6: "Sunday"
    }
}
instagram_data.replace(weekday_categoricals, inplace=True)
# Remove no longer needed columns.
del instagram_data['id']
del instagram_data['timestamp']
del instagram_data['comments_count']
del instagram_data['media_type']
instagram_data.head()
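As a quick sanity check on that mapping (with invented timestamps): pandas indexes Monday as 0 in dt.weekday, which is why the dictionary above starts there.

```python
import pandas as pd

ts = pd.to_datetime(pd.Series(["2020-07-06T09:00:00", "2020-07-12T21:30:00"]))
# 2020-07-06 was a Monday, 2020-07-12 a Sunday.
print(ts.dt.weekday.tolist())  # [0, 6]
print(ts.dt.hour.tolist())     # [9, 21]
```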

Data visualisation & analysis

Though we could start poring over the formatted data directly, our brains find it much easier to spot patterns in a visual representation. Given our application, we’ll use a heatmap, since it fits well with the nature of the data we want to show and the goals we are after.

We’ll process our data to convert the weekday column into our table labels (X-axis) and the hour column into our table index (Y-axis). To aggregate the results per weekday/time slot we’ll use their mean value, but first we’ll remove a bit of noise by dropping slots with fewer than two uploaded pictures, as we consider they don’t represent valuable information. Using the mean of likes per time slot as the aggregate function, rather than the total count, should give us a fair insight into the actual preferences of our Instagram audience.

# Pivot the dataframe to use hour as index (Y axis) and
# week days as labels (X axis). We could use just one
# pivot table (using aggfunc='mean') but we want to process
# the count table first.
instagram_data_pivot_count = instagram_data.pivot_table(
    index='hour',
    columns='weekday',
    values='like_count',
    aggfunc='count')
instagram_data_pivot_sum = instagram_data.pivot_table(
    index='hour',
    columns='weekday',
    values='like_count',
    aggfunc=np.sum)
# Remove from the dataset time slots with fewer than two
# uploaded pictures.
instagram_data_pivot_count = instagram_data_pivot_count[instagram_data_pivot_count > 1]
# Calculate the picture likes mean per time slot.
instagram_data_pivot_mean = instagram_data_pivot_sum / instagram_data_pivot_count
instagram_data_pivot_mean.tail()
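On a toy dataset (numbers invented), the same sum/count division reproduces the per-slot mean, and the mask blanks out slots with a single upload:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "weekday": ["Monday", "Monday", "Tuesday"],
    "hour": [12, 12, 9],
    "like_count": [10, 20, 5],
})
count = toy.pivot_table(index="hour", columns="weekday",
                        values="like_count", aggfunc="count")
total = toy.pivot_table(index="hour", columns="weekday",
                        values="like_count", aggfunc="sum")
count = count[count > 1]  # the lone Tuesday 9am picture becomes NaN
mean = total / count      # Monday 12h: (10 + 20) / 2 = 15.0
```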

The table is ready to be displayed in the desired format via a heatmap.
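A minimal Seaborn sketch of that step (the stand-in pivot_mean DataFrame and its numbers are invented so the snippet runs on its own; in the notebook you’d pass instagram_data_pivot_mean instead, and the figure size and colour map are arbitrary choices of mine):

```python
import matplotlib
matplotlib.use("Agg")  # only needed outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for instagram_data_pivot_mean (invented values).
pivot_mean = pd.DataFrame({"Monday": [12.0, 15.5], "Friday": [8.0, 20.0]},
                          index=[9, 13])
plt.figure(figsize=(10, 10))
plt.title("Mean likes per upload slot")
# annot=True prints the mean inside each cell; NaN slots stay blank.
ax = sns.heatmap(data=pivot_mean, annot=True, fmt=".1f",
                 linewidths=1, cmap="coolwarm")
```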


Conclusions

The heatmap graph allows us to quickly understand how our audience behaves. Before drawing any conclusions, let’s remember:

  • White spaces represent timeslots where no pictures have been uploaded or the quantity wasn’t representative enough.
  • Normally, we’d be looking for a bigger dataset, with a more diverse data representation across the heatmap layout.

At a glance, we could conclude a few points:

  • Uploads from Monday to Thursday around lunchtime perform well. The map suggests people could behave similarly in morning periods (8 to 10 am) but we don’t have enough data to confirm this.
  • Lunchtime on Fridays doesn’t perform particularly well, but late afternoon/early evening does, possibly coinciding with people leaving work to start the weekend.
  • Weekends are quite homogeneous, with morning uploads suggesting better results than afternoons.

Decisions

With the data in his hands, our Marketing & Digital Content Manager, Hugo, has the power to decide the best time to upload new Instagram content. The most important point is that any strategy will be backed up by data, and he’ll be able to feed the decision loop with new data to test the results of his actions. Or… Hugo could make use of an AI-based solution to automate this step and, therefore, the whole process. But let’s leave that for another article.

Summary

This has been just a fun illustrative example, but I hope you enjoyed the content and that it helps you understand how data, treated with the right tools, can be used to drive your actions and meet your goals.

You can find the whole notebook code at: https://github.com/jlarrubiaq/instagram-likes-heatmap

Written by Jesus Larrubia

Senior Full Stack Engineer at @clevertech