Andreas Xenofontos

SQL FIFA 2022 WORLD CUP SQUADS ANALYSIS

Project Summary

Welcome to my SQL project exploring the FIFA World Cup 2022 squads! In this project, I will be analyzing a dataset that contains information on the 32 teams that participated in the 2022 FIFA World Cup this past December, including their players, positions, nationalities, and current clubs. I will be using easy-to-intermediate SQL commands to extract valuable insights and answer various questions related to the World Cup squads. Some of the major topics that I will explore in this project include the distribution of players by position, the average age of players in each team, and the representation of different countries and clubs.

Data

This particular dataset contains information such as team names, league names, positions, age, international appearances, and performance statistics such as goals scored in previous World Cups and the number of goals a player has scored for their country. At a first glance, the dataset looks great. No issues with missing values, and no inaccurate data. We have to keep in mind that the dataset contains information up until the date before the World Cup started. We are currently in February 2023 and some players have changed teams.

Shoutout to Maven Analytics for providing free and exciting datasets to help people explore the power of data. For more information, please visit this link

Analysis & Insights

Question 1: Which players scored the most WC Goals before the start of the World Cup 2022?

SELECT TOP (5) Player, [WC Goals]
FROM WorldCup2022.dbo.[2022worldcupsquads]
ORDER BY [WC Goals] DESC ; 

Thomas Muller has scored the most WC goals, 10 to be exact. Here we have additional information that 3 out of 5 of our top goalscorers are the captains of their national team as well.

Question 2: How many midfielders that play in England (“Premier League”) have scored more than one goal for their National Team?

SELECT COUNT(*) as NumMidfieldersEnglandScorers
FROM WorldCup2022.dbo.[2022worldcupsquads]
WHERE Position = 'Midfielder' AND League = 'England' AND Goals > 1;

We do have 31 midfielders that have scored more than one goal. Premier League is arguably the best league in the world where the best players compete at, which makes sense to have a relatively high number. Midfielder is a crucial position on the field that can add goal contribution to a team. It connects the defense with the offense. We can use different conditions within the WHERE clause and answer multiple questions at the same time.

Question 3: How many Leagues have at least one representative participating in the tournament?

SELECT DISTINCT League
FROM WorldCup2022.dbo.[2022worldcupsquads] ;

We had players representing 44 different professional Leagues, which shows the wide variety of talent from all around the world and why almost all the professional Leagues temporarily stopped playing in order to watch the tournament. The DISTINCT of SQL function helps us to identify only different values within our table.

Question 4: Which club had the most players in the FIFA World Cup 2022?

SELECT TOP (5) Club, COUNT(*) AS NumPlayers
FROM WorldCup2022.dbo.[2022worldcupsquads] 
GROUP BY Club
ORDER BY NumPlayers DESC; 

Bayern Munich had a total number of 17 players in the World Cup, followed by Barcelona and Manchester City with 16. An impressive number!

Question 5: Which League had the most players?

SELECT TOP(10)League, COUNT(League) AS TopLeagueCount
FROM WorldCup2022.dbo.[2022worldcupsquads]
GROUP BY League
ORDER BY TopLeagueCount DESC ; 

Similar to the question before, the League with the most players by a big margin is “England”. This shows why it’s worth exploring the Premier League in our analysis as we did before. Not surprisingly, the top 5 Leagues are all from Europe, where the highest level of football/soccer is played.

Question 6: What’s the average age of the players within each squad?

SELECT Team, 
 MIN(Age) as Minimum,
 MAX(CASE WHEN Quartile = 1 THEN Age END) As Q1,
 MAX(CASE WHEN Quartile = 2 THEN Age END) AS Median,
 MAX(CASE WHEN Quartile = 3 THEN Age END) AS Q3,
 MAX(Age) as Maximum

FROM (
 SELECT Team, Age,
 NTILE(4) OVER (PARTITION BY Team ORDER BY Age) AS Quartile
 FROM WorldCup2022.dbo.[2022worldcupsquads]) w

GROUP BY Team
ORDER BY Median; 

When we are trying to find the average age, our data could be easily skewed so one of the best ways to measure it is the median. I had to create a subquery and use the CASE command as the data set is pretty large and it gives us an easier way to break it down while maintaining logical consistency. As we can see in the table below, Ecuador, Ghana, and the USA have the lowest Median of 24 while Belgium has the highest with 29.

Question 7: What’s the percentage of players that play in their country’s domestic league?

CREATE VIEW NationalPlayers AS
SELECT Team, COUNT(*) AS NumNationalPlayers
FROM WorldCup2022.dbo.[2022worldcupsquads]
WHERE League = Team
GROUP BY Team;

CREATE VIEW AllPlayers AS
SELECT Team, COUNT(*) AS NumAllPlayers
FROM WorldCup2022.dbo.[2022worldcupsquads]
GROUP BY Team;

SELECT NationalPlayers.Team, NumNationalPlayers, NumAllPlayers, (NumNationalPlayers * 100 / NumAllPlayers) AS Percentage
FROM NationalPlayers
JOIN AllPlayers ON NationalPlayers.Team = AllPlayers.Team
ORDER BY Percentage DESC;

This is by far the hardest question of the project as I had to improvise on how to connect the players and the country. I used two different views and then join them together and created a new variable “Percentage”. As we can see on the table, there are two teams that had all of their players from their domestic League: Qatar & Saudi Arabia. On the other hand, Argentina (Winner!) and Serbia had only one of their players (3%) playing in the domestic league.

Conclusion

Showcasing your SQL skills through a project is not an easy task. After some hesitation, I decided to showcase my skills using data from the sport I love the most. Through this project, I gained a deeper understanding of the composition of the World Cup squads and the factors that may have contributed to the success prior to the tournament. I’m looking forward to more projects that showcase my SQL skills and hopefully combine them with a visualization tool such as Tableau or Power BI.

Share:

LinkedIn
Facebook
Twitter

Get in Touch!