I am using a very small dataset to teach myself predictive data analytics. I am using both Weka and Orange to try to solve the problem described below.
To start with, I am using this CSV file to train the system:
gender,weight
M,82
F,71
M,90
F,76
M,88
F,56
M,100
F,63
M,84
F,79
M,92
F,66
You will notice that all the F values are below 80 and all the M values are above 80.
I then have this data file:
weight, gender
70,,
100,,
69,,
76,,
99,,
Notice that the ‘gender’ value is missing.
I would like to come up with a system that will read the data file and place either an M or F into the gender field based on some data analysis.
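To make the goal concrete, the desired behaviour can be sketched in plain Python (not Weka or Orange). This is only an illustration under assumptions: the midpoint rule and the names `load_training`, `midpoint_threshold` and `predict_gender` are hypothetical and chosen for this example, and the rule only works because the two groups in the training data do not overlap.

```python
import csv
import io

# Training data copied from the CSV file above.
TRAIN_CSV = """gender,weight
M,82
F,71
M,90
F,76
M,88
F,56
M,100
F,63
M,84
F,79
M,92
F,66
"""

# Weights from the data file whose gender is missing.
UNLABELED_WEIGHTS = [70, 100, 69, 76, 99]


def load_training(text):
    """Parse the training CSV into (gender, weight) pairs."""
    rows = csv.DictReader(io.StringIO(text))
    return [(row["gender"], float(row["weight"])) for row in rows]


def midpoint_threshold(samples):
    """Return the weight halfway between the heaviest F and the lightest M.

    Assumption: the classes are cleanly separable, as in this dataset.
    """
    heaviest_f = max(w for g, w in samples if g == "F")
    lightest_m = min(w for g, w in samples if g == "M")
    return (heaviest_f + lightest_m) / 2


def predict_gender(weight, threshold):
    """Label a weight as M if it is above the threshold, otherwise F."""
    return "M" if weight > threshold else "F"


samples = load_training(TRAIN_CSV)
threshold = midpoint_threshold(samples)
predictions = [(w, predict_gender(w, threshold)) for w in UNLABELED_WEIGHTS]

# Print the filled-in rows in weight,gender order.
for weight, gender in predictions:
    print(f"{weight},{gender}")
```

With this training data the threshold works out to 80.5, halfway between 79 and 82. The rule is written by hand here, whereas what is wanted is a method that learns such a rule from the data.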
I looked into Linear Regression, but that involves a relationship between two numeric values (as X increases, so does Y), and gender is a category rather than a number.
I then looked into k-means clustering, but all that did was show me two clusters, with M above 80 and F below 80.
Can you suggest a technique I can use to predict the missing gender values in my dataset?