View on GitHub

Top UK YouTubers 2024

Data Project

Download this project as a .zip file Download this project as a tar.gz file

Data Project: Excel to Power BI

excel-to-powerbi-animated-diagram

Table of contents

Objective

The Head of Marketing wants to find out who the top YouTubers are in 2024 to decide which YouTubers would be best to run marketing campaigns throughout the rest of the year.

To create a dashboard that provides insights into the top UK YouTubers in 2024 including their

This will help the marketing team decide which YouTubers to collaborate with for their marketing campaigns.

User story

As the Head of Marketing, I want to use a dashboard that analyses YouTube channel data in the UK.

This dashboard should allow me to identify the top-performing channels based on metrics like subscriber base and average views.

With this information, I can make more informed decisions about which Youtubers are right to collaborate with, and therefore maximize how effective each marketing campaign is.

Data source

We need data on the top UK YouTubers in 2024 that including their

Stages

Design

Dashboard components required

To understand what it should contain, we need to figure out what questions we need the dashboard to answer:

  1. Who are the top 10 YouTubers with the most subscribers?
  2. Which 3 channels have uploaded the most videos?
  3. Which 3 channels have the most views?
  4. Which 3 channels have the highest average views per video?
  5. Which 3 channels have the highest views per subscriber ratio?
  6. Which 3 channels have the highest subscriber engagement rate per video uploaded?

For now, these are some of the questions we need to answer, this may change as we progress down our analysis.

Dashboard mockup

Some of the data visuals that may be appropriate for answering our questions include:

  1. Table
  2. Treemap
  3. Scorecards
  4. Horizontal bar chart

Dashboard-Mockup

Tools

Tool Purpose
Excel Exploring the data
SQL Server Cleaning, testing, and analyzing the data
Power BI Visualizing the data via interactive dashboards
GitHub Hosting the project documentation and version control
Mokkup AI Designing the wireframe/mockup of the dashboard

Development

Pseudocode

  1. Get the data
  2. Explore the data in Excel
  3. Load the data into SQL Server
  4. Clean the data with SQL
  5. Test the data with SQL
  6. Visualize the data in Power BI
  7. Generate the findings based on the insights
  8. Write the documentation + commentary
  9. Publish the data to GitHub Pages

Data exploration notes

This is the stage where you have a scan of what’s in the data, errors, inconsistencies, bugs, weird and corrupted characters, etc

  1. There are at least 4 columns that contain the data we need for this analysis, which signals we have everything we need from the file without needing to contact the client for any more data.
  2. The first column contains the channel ID with what appears to be channel IDS, which are separated by a @ symbol - we need to extract the channel names from this.
  3. Some of the cells and header names are in a different language - we need to confirm if these columns are needed, and if so, we need to address them.
  4. We have more data than we need, so some of these columns would need to be removed

Data cleaning

The aim is to refine our dataset to ensure it is structured and ready for analysis.

The cleaned data should meet the following criteria and constraints:

Below is a table outlining the constraints on our cleaned dataset:

Property Description
Number of Rows 100
Number of Columns 4

Here is a tabular representation of the expected schema for the clean data:

Column Name Data Type Nullable
channel_name VARCHAR NO
total_subscribers INTEGER NO
total_views INTEGER NO
total_videos INTEGER NO
  1. Remove unnecessary columns by only selecting the ones you need
  2. Extract YouTube channel names from the first column
  3. Rename columns using aliases

Transform the data

/*
# 1. Select the required columns
# 2. Extract the channel name from the 'NOMBRE' column
*/

-- 1.
SELECT
    SUBSTRING(NOMBRE, 1, CHARINDEX('@', NOMBRE) -1) AS channel_name,  -- 2.
    total_subscribers,
    total_views,
    total_videos

FROM
    top_uk_youtubers_2024

Create the SQL view

/*
# 1. Create a view to store the transformed data
# 2. Cast the extracted channel name as VARCHAR(100)
# 3. Select the required columns from the top_uk_youtubers_2024 SQL table 
*/

-- 1.
CREATE VIEW view_uk_youtubers_2024 AS

-- 2.
SELECT
    CAST(SUBSTRING(NOMBRE, 1, CHARINDEX('@', NOMBRE) -1) AS VARCHAR(100)) AS channel_name, -- 2. 
    total_subscribers,
    total_views,
    total_videos

-- 3.
FROM
    top_uk_youtubers_2024

Testing

Here are the data quality tests conducted:

Row count check

/*
# Count the total number of records (or rows) are in the SQL view
*/

SELECT
    COUNT(*) AS no_of_rows
FROM
    view_uk_youtubers_2024;

Row count check

Column count check

SQL query

/*
# Count the total number of columns (or fields) are in the SQL view
*/


SELECT
    COUNT(*) AS column_count
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    TABLE_NAME = 'view_uk_youtubers_2024'

Output

Column count check

Data type check

SQL query

/*
# Check the data types of each column from the view by checking the INFORMATION SCHEMA view
*/

-- 1.
SELECT
    COLUMN_NAME,
    DATA_TYPE
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    TABLE_NAME = 'view_uk_youtubers_2024';

Output

Data type check

Duplicate count check

SQL query

/*
# 1. Check for duplicate rows in the view
# 2. Group by the channel name
# 3. Filter for groups with more than one row
*/

-- 1.
SELECT
    channel_name,
    COUNT(*) AS duplicate_count
FROM
    view_uk_youtubers_2024

-- 2.
GROUP BY
    channel_name

-- 3.
HAVING
    COUNT(*) > 1;

Output

Duplicate count check

Visualization

Results

GIF of Power BI Dashboard

This shows the Top UK Youtubers in 2024 so far.

DAX Measures

1. Total Subscribers (M)

Total Subscribers (M) = 
VAR million = 1000000
VAR sumOfSubscribers = SUM(view_uk_youtubers_2024[total_subscribers])
VAR totalSubscribers = DIVIDE(sumOfSubscribers,million)

RETURN totalSubscribers

2. Total Views (B)

Total Views (B) = 
VAR billion = 1000000000
VAR sumOfTotalViews = SUM(view_uk_youtubers_2024[total_views])
VAR totalViews = ROUND(sumOfTotalViews / billion, 2)

RETURN totalViews

3. Total Videos

Total Videos = 
VAR totalVideos = SUM(view_uk_youtubers_2024[total_videos])

RETURN totalVideos

4. Average Views Per Video (M)

Average Views per Video (M) = 
VAR sumOfTotalViews = SUM(view_uk_youtubers_2024[total_views])
VAR sumOfTotalVideos = SUM(view_uk_youtubers_2024[total_videos])
VAR  avgViewsPerVideo = DIVIDE(sumOfTotalViews,sumOfTotalVideos, BLANK())
VAR finalAvgViewsPerVideo = DIVIDE(avgViewsPerVideo, 1000000, BLANK())

RETURN finalAvgViewsPerVideo 

5. Subscriber Engagement Rate

Subscriber Engagement Rate = 
VAR sumOfTotalSubscribers = SUM(view_uk_youtubers_2024[total_subscribers])
VAR sumOfTotalVideos = SUM(view_uk_youtubers_2024[total_videos])
VAR subscriberEngRate = DIVIDE(sumOfTotalSubscribers, sumOfTotalVideos, BLANK())

RETURN subscriberEngRate 

6. Views per subscriber

Views Per Subscriber = 
VAR sumOfTotalViews = SUM(view_uk_youtubers_2024[total_views])
VAR sumOfTotalSubscribers = SUM(view_uk_youtubers_2024[total_subscribers])
VAR viewsPerSubscriber = DIVIDE(sumOfTotalViews, sumOfTotalSubscribers, BLANK())

RETURN viewsPerSubscriber 

Analysis

Findings

For this analysis, we’re going to focus on the questions below to get the information we need for our marketing client -

Here are the key questions we need to answer for our marketing client:

  1. Who are the top 10 YouTubers with the most subscribers?
  2. Which 3 channels have uploaded the most videos?
  3. Which 3 channels have the most views?
  4. Which 3 channels have the highest average views per video?
  5. Which 3 channels have the highest views per subscriber ratio?
  6. Which 3 channels have the highest subscriber engagement rate per video uploaded?

1. Who are the top 10 YouTubers with the most subscribers?

Rank Channel Name Subscribers (M)
1 NoCopyrightSounds 33.60
2 DanTDM 28.60
3 Dan Rhodes 26.50
4 Miss Katy 24.50
5 Mister Max 24.40
6 KSI 24.10
7 Jelly 23.50
8 Dua Lipa 23.30
9 Sidemen 21.00
10 Ali-A 18.90

2. Which 3 channels have uploaded the most videos?

Rank Channel Name Videos Uploaded
1 GRM Daily 14,696
2 Manchester City 8,248
3 Yogscast 6,435

3. Which 3 channels have the most views?

Rank Channel Name Total Views (B)
1 DanTDM 19.78
2 Dan Rhodes 18.56
3 Mister Max 15.97

4. Which 3 channels have the highest average views per video?

Channel Name Average Views per Video (M)
Mark Ronson 32.27
Jessie J 5.97
Dua Lipa 5.76

5. Which 3 channels have the highest views per subscriber ratio?

Rank Channel Name Views per Subscriber
1 GRM Daily 1185.79
2 Nickelodeon 1061.04
3 Disney Junior UK 1031.97

6. Which 3 channels have the highest subscriber engagement rate per video uploaded?

Rank Channel Name Subscriber Engagement Rate
1 Mark Ronson 343,000
2 Jessie J 110,416.67
3 Dua Lipa 104,954.95

Notes

For this analysis, we’ll prioritize analysing the metrics that are important in generating the expected ROI for our marketing client, which are the YouTube channels wuth the most

Validation