Python pandas KeyError

I am new to Python and have just started learning. I am building a sports prediction based on previous scores. I have two CSV files: one with all matches from the current year, and one with the standings (final tournament results and rankings, containing only unique teams, so it has just 14 rows). The problem comes from the standings CSV, which looks like this:

Squad,Rk,MP,W,D,L,GF,GA,GD,Pts,Pts/G,MP,W,D,L,GF,GA,GD,Pts,Pts/G
CFR Cluj,1,18,13,5,0,24,5,19,44,2.44,18,10,5,3,30,14,16,35,1.94

And I have this code, which raises a KeyError for the first row that I sampled from my CSV:

def home_team_ranks_higher(row):
    home_team = row["Home"]
    visitor_team = row["Away"]
    home_rank = standings.loc[home_team]["Rk"]
    visitor_rank = standings.loc[visitor_team]["Rk"]
    return home_rank < visitor_rank

dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis = 1)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-112-d3a62e1e7d32> in <module>
      6     return home_rank < visitor_rank
      7 
----> 8 dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis = 1)
      9 
     10 #dataset["HomeTeamRanksHigher"] = 0

~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7546             kwds=kwds,
   7547         )
-> 7548         return op.get_result()
   7549 
   7550     def applymap(self, func) -> "DataFrame":

~\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    178             return self.apply_raw()
    179 
--> 180         return self.apply_standard()
    181 
    182     def apply_empty_result(self):

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    269 
    270     def apply_standard(self):
--> 271         results, res_index = self.apply_series_generator()
    272 
    273         # wrap results

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    298                 for i, v in enumerate(series_gen):
    299                     # ignore SettingWithCopy here in case the user mutates
--> 300                     results[i] = self.f(v)
    301                     if isinstance(results[i], ABCSeries):
    302                         # If we have a view on v, we need to make a copy because

<ipython-input-112-d3a62e1e7d32> in home_team_ranks_higher(row)
      2     home_team = row["Home"]
      3     visitor_team = row["Away"]
----> 4     home_rank = standings.loc[home_team]["Rk"]
      5     visitor_rank = standings.loc[visitor_team]["Rk"]
      6     return home_rank < visitor_rank

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1108         # fall thru to straight lookup
   1109         self._validate_key(key, axis)
-> 1110         return self._get_label(key, axis=axis)
   1111 
   1112     def _get_slice_axis(self, slice_obj: slice, axis: int):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
   1057     def _get_label(self, label, axis: int):
   1058         # GH#5667 this will fail if the label is not present in the axis.
-> 1059         return self.obj.xs(label, axis=axis)
   1060 
   1061     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):

~\anaconda3\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3489             loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
   3490         else:
-> 3491             loc = self.index.get_loc(key)
   3492 
   3493             if isinstance(loc, np.ndarray):

~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
    356                 except ValueError as err:
    357                     raise KeyError(key) from err
--> 358             raise KeyError(key)
    359         return super().get_loc(key, method=method, tolerance=tolerance)
    360 

KeyError: 'CFR Cluj'

Note: I tried swapping the ‘Rk’ and ‘Squad’ columns, but that did not help; I just got different errors.

What I am looking for is a way to get the rank of every home team and visitor team in my match history from the final table (standings) and store it in the ‘home_rank’ / ‘visitor_rank’ variables.

PS: I tried other ideas to access the rank but none of them got me any result.

Any ideas or solutions are great! Thank you 🙂

Answer

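The KeyError means that standings.loc[home_team] looks for 'CFR Cluj' among the row labels (the index) of standings. That index is the default integer range 0, 1, 2, ... (the traceback ends in the range index's get_loc), while the team names live in the Squad column, so the label is not found. You can get a squad's rank home_rank (and similarly visitor_rank) by filtering on that column instead: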

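home_rank  = standings['Rk'][ standings['Squad']=='CFR Cluj' ].iloc[0]
#home_rank = standings['Rk'].loc[ standings['Squad']=='CFR Cluj' ].iloc[0]

Note the .iloc[0] at the end: a plain [0] would look for the row label 0, which only works when the matching team happens to sit in the first row. Step by step, this is equivalent to

boolean_indices = standings['Squad']=='CFR Cluj'
standings_ranks = standings['Rk']
home_ranks      = standings_ranks[boolean_indices]
home_rank       = home_ranks.iloc[0]  # first match by position; a single value if Squad is unique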
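
Alternatively, since every squad appears only once in standings, you could make Squad the index so that the original standings.loc[home_team] lookup works unchanged. The following is only a minimal sketch under that assumption: the two small in-memory CSVs, the team name "Example FC", and the column name HomeTeamRanksHigherVectorized are made up for illustration and stand in for your real files.

```python
import io
import pandas as pd

# Hypothetical miniature versions of the two CSV files; "Example FC" and
# the match rows are invented for illustration only.
standings_csv = io.StringIO(
    "Squad,Rk,MP,W,D,L\n"
    "CFR Cluj,1,18,13,5,0\n"
    "Example FC,2,18,11,4,3\n"
)
matches_csv = io.StringIO(
    "Home,Away\n"
    "CFR Cluj,Example FC\n"
    "Example FC,CFR Cluj\n"
)

# index_col="Squad" makes the team names the row labels, so .loc can find them.
standings = pd.read_csv(standings_csv, index_col="Squad")
dataset = pd.read_csv(matches_csv)

def home_team_ranks_higher(row):
    home_rank = standings.loc[row["Home"], "Rk"]
    visitor_rank = standings.loc[row["Away"], "Rk"]
    # A smaller Rk number means a better position in the table.
    return home_rank < visitor_rank

dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis=1)

# Same comparison without apply: map each team name to its rank.
rank_by_team = standings["Rk"]  # Series indexed by squad name
dataset["HomeTeamRanksHigherVectorized"] = (
    dataset["Home"].map(rank_by_team) < dataset["Away"].map(rank_by_team)
)
```

If standings is already loaded, standings = standings.set_index("Squad") has the same effect as index_col. A team that appears in the matches but not in standings would still raise a KeyError in the apply version, while the map version yields NaN for its rank and the comparison evaluates to False.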