Deciding weights for the features (similar to Google PageRank)

I crawled some blogs for my project and extracted a few features, such as document length, number of inbound links, and number of outbound links. Each blog covers a specific subject, and there can be many articles on each subject. I need to select at most one or two important blogs per subject. How can I assign weights to these features to pick out the important blogs?

I could use a machine learning algorithm, but there are millions of blogs and I don't want to annotate them. Is there a mathematically proven method for deciding the weights?
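For reference, the PageRank approach mentioned in the title computes a score from the link graph alone, so it needs no annotated data. Below is a minimal power-iteration sketch, assuming the crawl has been reduced to a dictionary mapping each blog to the blogs it links to. The function name `pagerank` and its parameters are illustrative, not from any library; 0.85 is the damping factor usually quoted for PageRank.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank (illustrative sketch).

    links: dict mapping each page to a list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Teleportation term: every page gets (1 - d) / n.
        new = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for p in pages:
            # Deduplicate out-links so repeated links are not double-counted.
            out = list(dict.fromkeys(links.get(p, [])))
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Pages with no out-links spread their rank uniformly.
                dangling += rank[p]
        for p in pages:
            new[p] += damping * dangling / n
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

In this toy graph "C" comes out on top because three pages link to it, while "D" receives no links and keeps only the baseline (1 - 0.85) / 4. PageRank uses only the link structure, so features such as document length would still have to be combined with it separately.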



One suggestion: when deciding the rank of a blog A, count how many pages link to A.
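The in-link counting suggestion above can be sketched as follows, assuming each blog's subject is already known (as the question states) and the crawl is stored as a dictionary of out-links. The names `top_blogs_by_inlinks` and `subject_of` are hypothetical. Each referring page is counted once, self-links are ignored, and ties are broken alphabetically so the result is deterministic.

```python
from collections import Counter, defaultdict


def top_blogs_by_inlinks(links, subject_of, k=2):
    """Return the k blogs with the most distinct referring pages per subject.

    links: dict mapping a blog to a list of blogs it links to.
    subject_of: dict mapping a blog to its (already known) subject.
    """
    inlinks = Counter()
    for src, targets in links.items():
        for dst in set(targets):   # count each referring page only once
            if dst != src:         # ignore self-links
                inlinks[dst] += 1

    # Group blogs by subject.
    by_subject = defaultdict(list)
    for blog, subject in subject_of.items():
        by_subject[subject].append(blog)

    # Highest in-link count first; ties broken by name.
    return {
        subject: sorted(blogs, key=lambda b: (-inlinks[b], b))[:k]
        for subject, blogs in by_subject.items()
    }


links = {"a1": ["a2", "b1"], "a2": ["b1"], "a3": ["a2"], "b1": ["a2"], "b2": ["b1"]}
subject_of = {"a1": "cooking", "a2": "cooking", "a3": "cooking",
              "b1": "travel", "b2": "travel"}
print(top_blogs_by_inlinks(links, subject_of, k=1))
```

Raw in-link counts treat every referring page equally; PageRank refines this by weighting a link according to the importance of the page it comes from.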
