Methodology

We used The World Happiness Report's comprehensive qualitative data approach as a baseline happiness score for our analysis and initial predictions. Each of the report's six indicators is identified for its ability to statistically represent a key indicator of happiness. We cleaned, normalized, and scaled the data sets for comparative analysis, then redefined the data from the 2020 report. Taking the median value from the 2020 report allowed us to bring in a value for 2018 and to take advantage of 5 years of existing data. We manipulated the data in Python, using Pandas to merge CSVs and Matplotlib to create early scatter plots with latitude and longitude coordinates. Matplotlib was a useful tool for visualizing results, but its styling options were not optimal for the in-depth analysis needed for our presentation. We therefore imported our cleaned and manipulated CSV files into Tableau to help tell our story.
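The merging and scaling steps might look something like the sketch below. The helper names (combine_years, min_max_scale), the column names, and the toy values are hypothetical placeholders rather than the project's actual files, and min-max scaling is assumed as the normalization because the exact method is not stated here.

```python
import pandas as pd

def combine_years(frames_by_year):
    """Stack per-year tables into one long table with a 'year' column."""
    tagged = [df.assign(year=year) for year, df in frames_by_year.items()]
    return pd.concat(tagged, ignore_index=True)

def min_max_scale(df, columns):
    """Rescale each listed column to the 0-1 range (assumes max > min)."""
    scaled = df.copy()
    for col in columns:
        lo, hi = scaled[col].min(), scaled[col].max()
        scaled[col] = (scaled[col] - lo) / (hi - lo)
    return scaled

# Toy rows standing in for two yearly CSVs (illustrative values only).
frames = {
    2019: pd.DataFrame({"country": ["A", "B"], "score": [7.0, 5.0]}),
    2020: pd.DataFrame({"country": ["A", "B"], "score": [7.5, 4.5]}),
}
combined = combine_years(frames)
scaled = min_max_scale(combined, ["score"])
```

With real files, each DataFrame would instead come from pd.read_csv on the corresponding yearly export.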

The trends identified in the quantitative analysis for each country were then compared with the provisional scores. First, we applied singular regression machine learning to The World Happiness Report data to predict future happiness. Because we only had data for 2015-2020, we limited our projections to 5 years out rather than going further. We were then able to make individual predictions over the next 5 years.
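One way to picture this step is a straight-line fit of score against year, extrapolated 5 years ahead. This sketch assumes that "singular regression" means a simple linear regression with year as the only predictor; the function name forecast_scores and the sample values are made up for illustration.

```python
import numpy as np

def forecast_scores(years, scores, horizon=5):
    """Fit score = slope * year + intercept and extrapolate `horizon` years."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(scores, dtype=float)
    origin = x.min()  # center the years to keep the fit well conditioned
    slope, intercept = np.polyfit(x - origin, y, deg=1)
    future_years = x.max() + np.arange(1, horizon + 1)
    predicted = slope * (future_years - origin) + intercept
    return future_years, predicted

# Illustrative values for one hypothetical country, not real report data.
years = [2015, 2016, 2017, 2018, 2019, 2020]
scores = [5.0, 5.1, 5.2, 5.3, 5.4, 5.5]
future_years, predicted = forecast_scores(years, scores)
```

Running the same function once per country would give one projected series for each, which matches the short horizon described above.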

Finally, we layered additional data onto The World Happiness Report data and applied train/test machine learning to test the predictive power of certain variables. The variables chosen were the Corruption Perceptions Index, military expenditure, GDP, and life expectancy. We tested whether these were good indicators of happiness over this period.
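A train/test check of this kind can be sketched as follows. The model (ordinary least squares), the metric (R squared on the held-out rows), the 25% test fraction, and the synthetic data are all assumptions made for illustration; the source does not say which model or metric was used.

```python
import numpy as np

def train_test_r2(X, y, test_fraction=0.25, seed=0):
    """Fit least squares on a random training subset; return R^2 on the rest."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = max(1, int(round(test_fraction * len(y))))
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Add an intercept column, then solve the least-squares problem.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design[train_idx], y[train_idx], rcond=None)

    # Score the fitted model only on rows it never saw during fitting.
    residual = y[test_idx] - design[test_idx] @ coef
    total = y[test_idx] - y[test_idx].mean()
    return 1.0 - (residual @ residual) / (total @ total)

# Synthetic stand-ins for the four predictors (corruption index,
# military expenditure, GDP, life expectancy); not real data.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -0.2, 1.0, 0.8]) + rng.normal(scale=0.1, size=200)
r2 = train_test_r2(X, y)
```

A held-out R squared near 1 would suggest the variables predict the score well, while a value near 0 or below would suggest they do not.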