I crawled some blogs for my project and extracted a few features from each one, such as document length, number of in-links, and number of out-links. Each blog covers a specific subject, and there can be many blogs on the same subject. I need to pick at most one or two important blogs per subject. How can I assign weights to these features to select the important blogs?
I can use a machine learning algorithm, but there are millions of blogs and I don’t want to annotate them. Is there a mathematically proven method to decide the weights?
One suggestion: to decide the rank of a blog A, count how many other pages refer to A.
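As a minimal sketch of that suggestion, the snippet below counts distinct in-links per blog and keeps the top two per subject, with no annotation required. The function name `top_blogs_by_inlinks`, the `(source, target)` link format, and the `subject_of` mapping are assumptions made for illustration; they are not part of the original question, and the sample data is invented.

```python
from collections import Counter, defaultdict


def top_blogs_by_inlinks(links, subject_of, k=2):
    """Return up to k blogs per subject, ranked by distinct in-link count.

    links      -- iterable of (source, target) pairs (hypothetical format)
    subject_of -- dict mapping each blog to its subject (hypothetical)
    """
    # Count each referring blog once, and ignore links from a blog to itself.
    unique_links = {(src, dst) for src, dst in links if src != dst}
    inlinks = Counter(dst for _, dst in unique_links)

    # Group the blogs by subject.
    by_subject = defaultdict(list)
    for blog, subject in subject_of.items():
        by_subject[subject].append(blog)

    # Sort by descending in-link count; break ties by name for determinism.
    return {
        subject: sorted(blogs, key=lambda b: (-inlinks[b], b))[:k]
        for subject, blogs in by_subject.items()
    }


# Toy data, invented for this example.
links = [("b", "a"), ("c", "a"), ("a", "b"), ("d", "e"), ("d", "e"), ("e", "e")]
subject_of = {"a": "cooking", "b": "cooking", "c": "cooking",
              "d": "travel", "e": "travel"}
result = top_blogs_by_inlinks(links, subject_of)
print(result)
```

This treats every referring blog as equally important. If links from highly referenced blogs should count for more, an iterative link-analysis score could replace the raw count.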