How to analyze CSV file data with a While loop in Python

I’m trying to figure out how to start a loop in Python that goes through a CSV file. I believe it would be a while loop (I can’t use pandas for this assignment), but I’m not sure how to start. The file is from Kaggle and contains posts from a Reddit page. I’m trying to get the following:

- the average number of comments across all posts
- the average score across all posts
- what the highest score is and the title for that post
- what the lowest score is and the title for that post
- what the most commented post is, with its title and number of comments

This is what I have so far for importing the file:

import csv  #import csv file reddit_vm.csv

def analyze(entries):
    print(f'first entry: {entries[0]}')

with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input)]
    avgScore = analyze(entries)

And this is what I think I need to do:

pseudocode:

need a variable to control the while loop that reads the lines

average the number of comments across all posts

average score across all posts

largest variable for the highest score and print title

smallest variable for lowest score

most_comments

Answer

As we discussed in the comments, a simple way to do it is to read the CSV file line by line, then loop over the lines and store each column's values in its own list inside a dict, so that the aggregation is easier later:

with open('sample1.csv', 'r') as f:
    # read the file line by line; rstrip() removes the trailing '\n' from each line
    lines = [line.rstrip() for line in f]

columnslist = lines[0].split(',')  # the header row gives the column names
numcolumns = len(columnslist)  # the number of columns

result_dict = {}

for colm in columnslist:
    result_dict[colm] = []  # one list per column, holding that column's values separately


for line in lines[1:]:
    # split on commas (note: this breaks if a field contains a quoted comma)
    words = line.split(',')
    for i in range(numcolumns):
        result_dict[columnslist[i]].append(words[i])  # add the value to its column list

print(result_dict)

For example, suppose sample1.csv contains the following:

name,score,roll
Vag,0.9,11
Sam,0.12,12
Harris,0.98,13

The print statement would give the following dict: {'name': ['Vag', 'Sam', 'Harris'], 'score': ['0.9', '0.12', '0.98'], 'roll': ['11', '12', '13']}

As you can see, each column is now in its own list, so it’s easier to analyze.

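# The values were read as strings, so convert them to float before comparing;
# comparing the raw strings would order them alphabetically (e.g. '10' < '9').
scores = [float(s) for s in result_dict["score"]]
max_score = max(scores)
min_score = min(scores)
print(max_score, min_score)
# 0.98 0.12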
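
Because every column list keeps the rows in file order, the position of the largest score also identifies the other fields of that row (for instance the title that goes with the highest score). A minimal sketch, assuming data in the same shape as the dict printed above; the helper name row_with_extreme is made up for illustration:

```python
# Hypothetical data in the same shape as result_dict above.
result_dict = {
    'name': ['Vag', 'Sam', 'Harris'],
    'score': ['0.9', '0.12', '0.98'],
    'roll': ['11', '12', '13'],
}

def row_with_extreme(columns, key, pick=max):
    """Return (index, value) of the extreme numeric value in columns[key]."""
    values = [float(v) for v in columns[key]]  # strings -> numbers
    best = pick(values)
    return values.index(best), best  # index of the first row holding that value

i, top = row_with_extreme(result_dict, 'score', max)
print(result_dict['name'][i], top)   # Harris 0.98

j, low = row_with_extreme(result_dict, 'score', min)
print(result_dict['name'][j], low)   # Sam 0.12
```

The same index can be used with any other column list, since they all line up row by row.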

You can do much more from here, but it is quite cumbersome without pandas.
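
For the statistics asked about in the question, a single pass with a while loop is enough. What follows is a rough sketch rather than a definitive solution. It assumes the column names score, comms_num and title used in the question's code, relies on csv.DictReader so that commas inside quoted titles are handled, and runs on made-up sample rows. The function name summarize is invented here.

```python
import csv
import io

def summarize(reader):
    """Walk the rows with a while loop and collect the requested statistics."""
    rows = iter(reader)
    count = total_score = total_comments = 0
    highest = lowest = most_commented = None  # each becomes a (value, title) pair

    row = next(rows, None)           # None signals that there are no more rows
    while row is not None:
        score = int(row['score'])
        comments = int(row['comms_num'])
        title = row['title']

        count += 1
        total_score += score
        total_comments += comments
        if highest is None or score > highest[0]:
            highest = (score, title)
        if lowest is None or score < lowest[0]:
            lowest = (score, title)
        if most_commented is None or comments > most_commented[0]:
            most_commented = (comments, title)

        row = next(rows, None)       # advance the loop-control variable

    if count == 0:
        return None                  # no data rows, so no averages to report
    return {
        'avg_score': total_score / count,
        'avg_comments': total_comments / count,
        'highest': highest,
        'lowest': lowest,
        'most_commented': most_commented,
    }

# Made-up sample rows; the second title contains a comma inside quotes.
sample = io.StringIO(
    'title,score,comms_num\n'
    'First post,10,2\n'
    '"Hello, world",3,7\n'
    'Last post,25,0\n'
)
stats = summarize(csv.DictReader(sample))
print(stats)
```

With the real file, the same function could be called as summarize(csv.DictReader(f)) inside a with open("reddit_vm.csv", newline='', encoding='UTF-8') as f: block. Only running totals and the current extremes are kept, so the whole file never has to sit in memory.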