7/21/21 Response
Building the Model
- In this exercise I imported the demographic dataset as a csv file. I removed the wealth class target from the overall dataset and put it in its own column called target, where initially all the rows labeled as class 2 were set to 0 and the rest to 1. This turned the problem from a multi-class classification into a simpler binary classification. From here, I separated the data into training, validation, and testing sets. After batching the data with a batch size of 32, I set the feature columns. I kept size, gender, and education as simple numeric variables. I made age a bucketized variable with boundaries at 20, 40, and 60. This means that the first bucket represents ages below 20, the second is 20 up to 40, the third is 40 up to 60, and the fourth is 60 and above. The model was then defined with 2 dense layers of 128 neurons each, followed by a dropout layer with a rate of 0.1, and lastly a dense layer with 1 output since it is a binary problem. After compiling and fitting the model, I evaluated it on the test set and got an accuracy of 0.9873. This was for determining whether the features described a person in wealth class 2 versus any of the others. When predicting class 3 against the others I got an accuracy of 0.8868. For class 4 against the others I got an accuracy of 0.6043. And lastly, I got an accuracy of 0.4966 when predicting wealth class 5 against the rest.
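The relabeling and the age bucketing described above can be sketched in plain Python. This is only an illustration, not the code used for the exercise: the function names and the sample labels are made up, and the bucket edges are assumed to be left-inclusive (an age of exactly 20 falls in the second bucket).

```python
from bisect import bisect_right

# Age boundaries from the write-up; everything else here is hypothetical.
AGE_BOUNDARIES = [20, 40, 60]


def binarize_target(wealth_classes, target_class):
    """Return 0 where the label equals target_class and 1 otherwise."""
    return [0 if c == target_class else 1 for c in wealth_classes]


def age_bucket(age, boundaries=AGE_BOUNDARIES):
    """Bucket index: 0 = below 20, 1 = 20 up to 40, 2 = 40 up to 60, 3 = 60+."""
    return bisect_right(boundaries, age)


def one_hot_bucket(age, boundaries=AGE_BOUNDARIES):
    """Turn the bucket index into a 4-element indicator vector."""
    vec = [0] * (len(boundaries) + 1)
    vec[age_bucket(age, boundaries)] = 1
    return vec


wealth = [2, 3, 5, 2, 4]  # made-up labels for illustration
print(binarize_target(wealth, 2))                    # [0, 1, 1, 0, 1]
print([age_bucket(a) for a in (5, 20, 39, 40, 75)])  # [0, 1, 1, 2, 3]
```

Repeating the run for classes 3, 4, and 5 only requires changing the second argument of binarize_target.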
Trends, Observations, and Potential Changes:
- The model seems to be much better at predicting lower wealth classes. This is shown by class 2 reaching an accuracy in the high 90s (0.9873) while classes 4 and 5 are both below 65% (0.6043 and 0.4966). This is potentially because there are different trends for higher income individuals. It's also possible that these variables plateau once you hit a certain wealth class, meaning that the model is unable to predict those higher classes as well. This could most likely be remedied by adding variables that are more likely to predict wealth, like number of cars, type of tv, type of phone, etc. Changes I would consider making to the model include making size a bucketized variable rather than just a numeric value. Additionally, I would look to make gender a one-hot variable. I also think it would be possible to one-hot encode education. These changes might boost accuracy, but I predict that they would not prevent the drastic drop-off as the wealth class increases.
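The proposed feature changes could look roughly like the sketch below. The category codes for gender and education and the boundaries for size are assumptions chosen for illustration; the real values would have to come from the dataset.

```python
from bisect import bisect_right

# Hypothetical codes and boundaries, not taken from the dataset.
GENDER_CODES = [1, 2]
EDUCATION_CODES = [0, 1, 2, 3]
SIZE_BOUNDARIES = [3, 6]  # buckets: below 3, 3 up to 6, 6 and above


def one_hot(value, categories):
    """Indicator vector with a 1 at the position of value (all zeros if unseen)."""
    return [1 if value == c else 0 for c in categories]


def encode_row(size, gender, education):
    """Concatenate bucketized size with one-hot gender and education."""
    size_vec = [0] * (len(SIZE_BOUNDARIES) + 1)
    size_vec[bisect_right(SIZE_BOUNDARIES, size)] = 1
    return size_vec + one_hot(gender, GENDER_CODES) + one_hot(education, EDUCATION_CODES)


print(encode_row(size=4, gender=2, education=1))
# [0, 1, 0, 0, 1, 0, 1, 0, 0]
```

Encoding the variables this way stops the model from treating the codes as ordered quantities, which matters most for gender, where the numeric code has no magnitude.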