Dataclasses from multi-row/column CSV

I am trying to import a CSV file and assign the values to different dataclasses. After that, I need to run various analyses on the dataclasses, such as mean, mode, median, etc.

The issue I am having is importing them in such a way that I can use them later. I have the following code written:

from dataclasses import dataclass, field
from typing import List
import csv
from csv import DictReader

@dataclass
class Grades:
    class_course: str
    class_grades: int

@dataclass
class Student:
    name: str
    grades: List[Grades] = field(default_factory=list)

def create_student_db():
    s = Student([],[])
    courses = []
    with open("Sample Data/Sample2.csv") as read_obj:       # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        csv_reader = csv.reader(read_obj, delimiter = ",")  # get column names from a csv file
        column_names = csv_dict_reader.fieldnames
        for col in column_names:
            if col == "Student Name":
                continue
            else:
                courses.append(col)                         # Create list of column names

        for row in csv_reader:                              # For each row in the csv file
            n = row[0]                                      # The first value is the name
            s.name.append(n)                                # Append the name to the name of the student class
            i = 0                                           # For iteration through the values on the row
            while i < len(row)-1:                           # While i is less than the length of the row
                g = Grades(courses[i], row[i + 1])          # Set values in the Grades class to the Course name, and the row value + 1
                s.grades.append(g)                          # Append the values for the Grades to the Grades in Student
                i += 1

    return s

a = create_student_db()

print(a)

The input file looks like this (csv format):

Student Name,Course_1,Course_2,Course_3,Course_4
Johnny Rotten,10,20,20,40
Sid Vicious,90,50,30,10
Lars Larsson,90,10,30,60
John Jameson,90,90,90,90

And the output looks like this:

Student(name=['Johnny Rotten', 'Sid Vicious', 'Lars Larsson', 'John
Jameson'], grades=[Grades(class_course='Course_1', class_grades='10'),
Grades(class_course='Course_2', class_grades='20'),
Grades(class_course='Course_3', class_grades='20'),
Grades(class_course='Course_4', class_grades='40'),
Grades(class_course='Course_1', class_grades='90'),
Grades(class_course='Course_2', class_grades='50'),
Grades(class_course='Course_3', class_grades='30'),
Grades(class_course='Course_4', class_grades='10'),
Grades(class_course='Course_1', class_grades='90'),
Grades(class_course='Course_2', class_grades='10'),
Grades(class_course='Course_3', class_grades='30'),
Grades(class_course='Course_4', class_grades='60'),
Grades(class_course='Course_1', class_grades='90'),
Grades(class_course='Course_2', class_grades='90'),
Grades(class_course='Course_3', class_grades='90'),
Grades(class_course='Course_4', class_grades='90')])

Obviously, this is for an academic exercise, but I am having problems following the lectures.

Can anyone suggest how I can structure the dataclasses so that they are meaningful and I can pull out values like individual student means, modes, etc.?

Answer

You are almost there. What is confusing you is that you are placing the names of all the students and all of their grades inside a single Student object, which doesn’t make sense.

  • Have a list of Student objects, in which each Student has a name and a list of grades.
  • I would suggest renaming the class from Grades to Course, with attributes name and grade, and renaming the attribute grades in Student to courses. It’s easier to understand that way: a student is registered in courses and receives a grade for each one.
  • Also, remember to convert the grades to int. The csv module reads every value as a string, so without the conversion you can’t do calculations later.

from dataclasses import dataclass, field
from typing import List
import csv
from csv import DictReader

@dataclass
class Course:
    name: str
    grade: int

@dataclass
class Student:
    name: str
    courses: List[Course] = field(default_factory=list)

def create_student_db():
    students = []
    courses = []
    with open("Sample Data/Sample2.csv") as read_obj:
        # Both readers share read_obj, so they advance through the file together.
        csv_dict_reader = DictReader(read_obj)              # used only to get the column names
        csv_reader = csv.reader(read_obj, delimiter = ",")  # used to iterate over the data rows
        column_names = csv_dict_reader.fieldnames           # reads the header row, so csv_reader starts at the first data row
        for col in column_names:
            if col == "Student Name":
                continue
            else:
                courses.append(col)                         # Create list of column names

        for row in csv_reader:                              # For each data row in the csv file
            n = row[0]                                      # The first value is the name
            s = Student(name=n)                             # Create a new Student for this row
            students.append(s)                              # Add it to the list of students
            i = 0                                           # Index into the course columns
            while i < len(row)-1:                           # One iteration per grade on the row
                g = Course(courses[i], int(row[i + 1]))     # Pair the course name with its grade, converted to int
                s.courses.append(g)                         # Add the course to this student's courses
                i += 1

    return students

students = create_student_db()

print(students)

Output:

[Student(name='Johnny Rotten', courses=[Course(name='Course_1', grade=10), Course(name='Course_2', grade=20), Course(name='Course_3', grade=20), Course(name='Course_4', grade=40)]),
 Student(name='Sid Vicious', courses=[Course(name='Course_1', grade=90), Course(name='Course_2', grade=50), Course(name='Course_3', grade=30), Course(name='Course_4', grade=10)]),
 Student(name='Lars Larsson', courses=[Course(name='Course_1', grade=90), Course(name='Course_2', grade=10), Course(name='Course_3', grade=30), Course(name='Course_4', grade=60)]),
 Student(name='John Jameson', courses=[Course(name='Course_1', grade=90), Course(name='Course_2', grade=90), Course(name='Course_3', grade=90), Course(name='Course_4', grade=90)])]
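
If you prefer, the same loading logic can be written with a single csv.reader, taking the header with next() and pairing course names with grades via zip(). This is only a sketch of one possible shape: load_students and SAMPLE are names I made up for illustration, and the sample data is inlined through io.StringIO so the snippet runs without the file on disk.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class Course:
    name: str
    grade: int


@dataclass
class Student:
    name: str
    courses: List[Course] = field(default_factory=list)


def load_students(file_obj):
    """Build one Student per CSV row (hypothetical helper, not part of the original code)."""
    reader = csv.reader(file_obj)
    header = next(reader)            # first row: "Student Name" followed by the course names
    course_names = header[1:]
    students = []
    for row in reader:
        if not row:                  # skip blank lines
            continue
        name, *grades = row          # first value is the name, the rest are grades
        courses = [Course(c, int(g)) for c, g in zip(course_names, grades)]
        students.append(Student(name=name, courses=courses))
    return students


# Inline copy of the sample file from the question (assumption: same layout as Sample2.csv).
SAMPLE = """Student Name,Course_1,Course_2,Course_3,Course_4
Johnny Rotten,10,20,20,40
Sid Vicious,90,50,30,10
Lars Larsson,90,10,30,60
John Jameson,90,90,90,90
"""

students = load_students(io.StringIO(SAMPLE))
print(students[0])
```

With the real file you would pass the object returned by open("Sample Data/Sample2.csv", newline="") instead of the StringIO.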

As @martineau also suggested, you can use the functions from the statistics module.

Example:

To get the mean of a student’s grades across all of their courses (here, the first student in the list):

import statistics as st
st.mean(map(lambda c: c.grade, students[0].courses))

Output:

22.5
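
The same idea extends to the median and mode, and to statistics per course instead of per student. The sketch below is one way to do it, not the only one: grade_summary and course_mean are hypothetical helper names, the two students are hand-built from the first two rows of the sample file, and statistics.multimode assumes Python 3.8 or newer.

```python
import statistics as st
from dataclasses import dataclass, field
from typing import List


@dataclass
class Course:
    name: str
    grade: int


@dataclass
class Student:
    name: str
    courses: List[Course] = field(default_factory=list)


# Hand-built from the first two rows of the sample CSV, so the snippet runs on its own.
students = [
    Student("Johnny Rotten", [Course("Course_1", 10), Course("Course_2", 20),
                              Course("Course_3", 20), Course("Course_4", 40)]),
    Student("Sid Vicious", [Course("Course_1", 90), Course("Course_2", 50),
                            Course("Course_3", 30), Course("Course_4", 10)]),
]


def grade_summary(student):
    """Mean, median and modes of one student's grades (hypothetical helper)."""
    grades = [c.grade for c in student.courses]
    return {
        "mean": st.mean(grades),
        "median": st.median(grades),
        # multimode returns every most-common value, so ties do not raise or get hidden
        "modes": st.multimode(grades),
    }


def course_mean(students, course_name):
    """Mean grade for one course across all students (hypothetical helper)."""
    return st.mean(c.grade for s in students for c in s.courses if c.name == course_name)


summaries = {s.name: grade_summary(s) for s in students}
print(summaries["Johnny Rotten"])   # {'mean': 22.5, 'median': 20.0, 'modes': [20]}
print(course_mean(students, "Course_1"))
```

When every grade is different, as for the second student, multimode returns all of them, which is a hint that the mode is not very informative for that student.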