Sort a pandas dataframe by 2 columns (one with integers, one with alphanumerics) with priority for integer column

I have a dataframe with two columns.

One column has integers: 1,2,3, …. 10,11,12, …

One column has alphanumerics (as strings): A1, A2, … A10, A11, … B1, B2, … B10, B11 … (always one letter + one or two numbers, letters A-H, numbers 1-12)

Goal is to sort the dataframe by the integer column ascending with first priority, and by the alphanumerical column with second priority. The alphanumerical column should be sorted like this: A1, B1, C1, D1, … A2, B2, C2, D2, …, A3, B3, C3, D3, …

(this is a problem from biology/wetlab/robotics/automation/96-well plates).

Any ideas?

Answer

You can do it this way:

  1. Split the second column with alphanumeric strings into 2 columns: one column Letter to hold the first letter and another column Number to hold a number of one or two digits.
  2. Convert Number column from string to integer.
  3. Then, sort these 2 new columns together with the first column of integers

Let’s illustrate the process with an example below:

Assume we have the dataframe df as follows:

print(df)

   Col1 Col2
0     2  B12
1    11   C2
2     2   A1
3    11   B2
4     2   B1
5    11  C12
6     2  A12
7    11   C1
8     2   A2

Step 1 & 2: Split Col2 into 2 columns Letter & Number + Convert Number column from string to integer:

df['Letter'] = df['Col2'].str[0]               # take 1st char
df['Number'] = df['Col2'].str[1:].astype(int)  # take 2nd char onwards and convert to integer 

Result:

print(df)

   Col1 Col2 Letter  Number
0     2  B12      B      12
1    11   C2      C       2
2     2   A1      A       1
3    11   B2      B       2
4     2   B1      B       1
5    11  C12      C      12
6     2  A12      A      12
7    11   C1      C       1
8     2   A2      A       2

Step 3: Sort Col1, Letter and Number with priority: Col1 —> Number —> Letter:

df = df.sort_values(by=['Col1', 'Number', 'Letter'])

Result:

print(df)

   Col1 Col2 Letter  Number
2     2   A1      A       1
4     2   B1      B       1
8     2   A2      A       2
6     2  A12      A      12
0     2  B12      B      12
7    11   C1      C       1
3    11   B2      B       2
1    11   C2      C       2
5    11  C12      C      12

After sorting, you can remove the Letter and Number columns, as follows:

df = df.drop(['Letter', 'Number'], axis=1)

If you want to do all in one step, you can also chain the instructions, as follows:

df = (df.assign(Letter=df['Col2'].str[0], 
                Number=df['Col2'].str[1:].astype(int))
        .sort_values(by=['Col1', 'Number', 'Letter'])
        .drop(['Letter', 'Number'], axis=1)
     )

Result:

print(df)

   Col1 Col2
2     2   A1
4     2   B1
8     2   A2
6     2  A12
0     2  B12
7    11   C1
3    11   B2
1    11   C2
5    11  C12