So, Age ~ Rings, and it must be predicted from a set of measurements such as Diameter, Weight, Height, Length, etc. This is a supervised learning task, because the dataset provides the Result~Features relation. A simple check shows that the number of rings ranges from 1 to 29, which is a huge range for classification. The other supervised learning option is linear regression.
EDA (exploratory data analysis) is the first step before building any model. Here is the code for loading the dataset into memory and plotting several relations, for example Rings~Diameter:
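The "simple check" on the ring counts can be sketched as below. The vector `rings` is a made-up stand-in so the snippet runs on its own; with the real data frame loaded, `abalone$Rings` would take its place.

```r
# Made-up stand-in values, NOT real abalone measurements.
rings <- c(1, 4, 7, 9, 9, 10, 11, 15, 29)

# Smallest and largest ring count
rng <- range(rings)

# Number of distinct values, i.e. how many classes a classifier would face
n_classes <- length(unique(rings))

# How often each ring count occurs
counts <- table(rings)

print(rng)
print(n_classes)
print(counts)
```

Many distinct classes, some of them with only a handful of observations, is what makes regression the more natural choice here.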
library(ggplot2)

# read the dataset from a local file
abalone <- read.csv("/Users/kostya/Downloads/abalone.data.csv", header=FALSE)

# set names for the data frame columns
colnames(abalone) <- c('Sex', 'Length', 'Diameter', 'Height', 'WholeWeight',
                       'ShuckedWeight', 'VisceraWeight', 'ShellWeight', 'Rings')

# plot a histogram of the ring counts on a density scale
hist(abalone$Rings, freq=FALSE)

# draw all sexes on one plot, with a separate linear fit for each sex
ggplot(abalone, aes(Diameter, Rings, color=Sex)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)
This plot (as well as other relations such as Rings~WholeWeight) shows quite clearly that the relation differs for each sex, so the first thought is to fit a separate regression for each sex or to use Sex as a factor.
To go on with separate regression models, we need to construct a formula for each one by investigating the individual relations. For example, here is the Rings~WholeWeight relation:
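The two options, one model with Sex as a factor versus one model per sex, can be sketched side by side. This is only an illustration on simulated data: the data frame `sim`, the slopes and the noise level are invented so that the snippet runs without the dataset file.

```r
set.seed(1)
n <- 60

# Simulated stand-in data (NOT the abalone measurements)
sim <- data.frame(
  Sex      = factor(rep(c("F", "I", "M"), each = n)),
  Diameter = runif(3 * n, 0.1, 0.6)
)

# Give each sex its own made-up slope so there is a difference to find
slope <- c(F = 10, I = 25, M = 15)
sim$Rings <- 4 + unname(slope[as.character(sim$Sex)]) * sim$Diameter +
  rnorm(3 * n, sd = 0.5)

# Option 1: a single model, Sex as a factor with its own intercept and slope
fit_factor <- lm(Rings ~ Diameter * Sex, data = sim)

# Option 2: a separate regression for each sex
fit_by_sex <- lapply(split(sim, sim$Sex),
                     function(d) lm(Rings ~ Diameter, data = d))

print(coef(fit_factor))
print(sapply(fit_by_sex, function(m) coef(m)[["Diameter"]]))
```

With a full interaction the two options give the same fitted lines. They differ in that the single model pools the residual variance across sexes, while separate models let each sex have its own formula.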
# plot each sex in its own panel
ggplot(abalone, aes(WholeWeight, Rings)) +
  geom_jitter(alpha=0.25) +
  geom_smooth(method=lm, se=FALSE) +
  facet_grid(. ~ Sex)
Obviously, for Male and Infant the relation has a logarithmic trend, so it is logical to add 'log' to the formula.
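One way to back up the visual impression is to fit both forms and compare them. The sketch below does this on simulated data that is logarithmic by construction (invented numbers, not the abalone file), so it only shows the mechanics of the comparison.

```r
set.seed(2)

# Simulated stand-in: Rings grows with log(WholeWeight); all numbers are made up
w   <- runif(200, 0.05, 2.5)
sim <- data.frame(WholeWeight = w,
                  Rings = 10 + 3 * log(w) + rnorm(200, sd = 0.7))

# Same predictor, once as is and once log-transformed
fit_lin <- lm(Rings ~ WholeWeight, data = sim)
fit_log <- lm(Rings ~ log(WholeWeight), data = sim)

# Adjusted R^2 of both fits; higher means a better fit
r2 <- c(linear = summary(fit_lin)$adj.r.squared,
        log    = summary(fit_log)$adj.r.squared)
print(r2)
```

On the real data the same two `lm()` calls would be run on a per-sex subset of `abalone`, and the term with the better fit would go into the formula.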
# infants: log of WholeWeight, ShellWeight and ShuckedWeight
summary(lm(Rings ~ Length + I(Diameter^2) + log(WholeWeight) + log(ShellWeight) +
             log(ShuckedWeight) + Height + VisceraWeight,
           data=subset(abalone, Sex == 'I')))

# males: log of WholeWeight and ShellWeight only
summary(lm(Rings ~ Length + I(Diameter^2) + log(WholeWeight) + log(ShellWeight) +
             ShuckedWeight + Height + VisceraWeight,
           data=subset(abalone, Sex == 'M')))

# females: no log terms
summary(lm(Rings ~ Length + I(Diameter^2) + WholeWeight + ShellWeight +
             ShuckedWeight + Height + VisceraWeight,
           data=subset(abalone, Sex == 'F')))
For infants, the resulting equation is:
Rings = 8.5398 - 7.6755*Length + 8.7707*Diameter^2 + 1.4837*log(WholeWeight) + 2.0745*log(ShellWeight) - 2.3415*log(ShuckedWeight) + 27.8275*Height + 5.9972*VisceraWeight
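The equation can be evaluated by hand with a small helper. The coefficients below are copied from the equation above. The function name `rings_from_equation` and the sample measurements are hypothetical, chosen only to show the arithmetic.

```r
# Evaluate the fitted equation for one set of measurements.
# Coefficients are taken from the equation in the text.
rings_from_equation <- function(Length, Diameter, WholeWeight, ShellWeight,
                                ShuckedWeight, Height, VisceraWeight) {
  8.5398 -
    7.6755  * Length +
    8.7707  * Diameter^2 +
    1.4837  * log(WholeWeight) +
    2.0745  * log(ShellWeight) -
    2.3415  * log(ShuckedWeight) +
    27.8275 * Height +
    5.9972  * VisceraWeight
}

# Hypothetical measurements, for illustration only
est <- rings_from_equation(Length = 0.30, Diameter = 0.22, WholeWeight = 0.12,
                           ShellWeight = 0.04, ShuckedWeight = 0.05,
                           Height = 0.07, VisceraWeight = 0.025)
print(est)
```

Because three of the terms are logarithms, every weight passed in has to be strictly positive.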
As mentioned in the task description, Age = Rings + 1.5.
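Turning a fitted model into an age estimate is then a one-line step on top of `predict()`. A minimal sketch, assuming a deliberately simplified one-predictor model fitted to simulated data; `predict_age`, `sim` and `new_shells` are names invented for this example.

```r
set.seed(3)

# Simulated stand-in data (made-up numbers, not the abalone file)
sim <- data.frame(Diameter = runif(100, 0.1, 0.6))
sim$Rings <- 3 + 15 * sim$Diameter + rnorm(100, sd = 1)

# Simplified model; the real formula would be one of the per-sex ones
fit <- lm(Rings ~ Diameter, data = sim)

# Age = predicted Rings + 1.5, as stated in the task description
predict_age <- function(model, newdata) {
  predict(model, newdata = newdata) + 1.5
}

# Two hypothetical shells to estimate an age for
new_shells <- data.frame(Diameter = c(0.2, 0.4))
ages <- predict_age(fit, new_shells)
print(ages)
```

The same helper works with any of the per-sex models, as long as `newdata` contains every column used in that model's formula.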