# How to undersample category 0 to 25% of the input while leaving category 1 unchanged in Python?

I have an imbalanced dataset in Python, for example 95% zeros and 5% ones.

How can I undersample to reduce the number of zeros to only 25% of the input dataset?

I ask because the undersampling code I find online always balances the dataset to 50% zeros and 50% ones, and I do not want that. I only want to reduce the number of zeros to the 25% level in the dataset.

How can I do that in Python? Do you have some example code?

To apply different rules to different values, you can use `groupby`. As you didn’t give an example dataset, I’m just using a dataframe with a column `col`, which has 19 zeros and 1 one, plus a filler column `foo`:

```
>>> df.shape
(20, 2)
>>> df['col'].value_counts() / len(df)
0    0.95
1    0.05
Name: col, dtype: float64
```
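If you want to follow along, a dataframe like this can be built as in the sketch below. The contents of `foo` (the letters `a` to `t`) are my assumption, chosen only to be consistent with the outputs shown in this answer; any second column would do.

```python
import string

import pandas as pd

# Hypothetical stand-in data: 19 zeros and a single one in `col`,
# and the letters a..t in `foo` (assumed filler column).
df = pd.DataFrame({
    "col": [0] * 19 + [1],
    "foo": list(string.ascii_lowercase[:20]),
})

print(df.shape)
print(df["col"].value_counts(normalize=True))
```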

Now `groupby.sample()` doesn’t allow setting different numbers or fractions per group, so we can simply use `groupby.apply()`, which can call `sample()` on each group’s dataframe with its own fraction:

```
>>> df.groupby('col').apply(lambda g: g.sample(frac=.25 if g.name == 0 else 1))
        col foo
col
0   6     0   g
    16    0   q
    3     0   d
    14    0   o
    15    0   p
1   19    1   t
>>> df.groupby('col').apply(lambda g: g.sample(frac=.25 if g.name == 0 else 1))
        col foo
col
0   16    0   q
    5     0   f
    13    0   n
    2     0   c
    9     0   j
1   19    1   t
```

Note that I’m using the fact that the value used to decide the group is passed inside `apply` by setting a `.name` property on the dataframe.

You can add `.droplevel('col')` at the end to remove the first index level.
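If you would rather avoid `groupby.apply()`, the same idea can be sketched with plain boolean masks and `pd.concat`. The helper name `undersample`, its parameters and the example data are my own assumptions for illustration, not part of pandas. Passing `random_state` makes the draw repeatable.

```python
import pandas as pd


def undersample(df, column, fractions, random_state=None):
    """Keep only fractions[value] of the rows for each listed value.

    Rows whose value in `column` is not listed in `fractions` are kept as-is.
    (Hypothetical helper, not a pandas function.)
    """
    # Rows for values without a rule pass through untouched.
    untouched = df[~df[column].isin(list(fractions))]
    parts = [untouched]
    for value, frac in fractions.items():
        subset = df[df[column] == value]
        parts.append(subset.sample(frac=frac, random_state=random_state))
    # Restore the original row order.
    return pd.concat(parts).sort_index()


# Assumed example data: 19 zeros and 1 one, as in the answer above.
df = pd.DataFrame({"col": [0] * 19 + [1], "foo": list("abcdefghijklmnopqrst")})

result = undersample(df, "col", {0: 0.25}, random_state=42)
print(result["col"].value_counts())
```

With this data it keeps 5 of the 19 zeros (pandas rounds 19 × 0.25 = 4.75 to 5) and the single one, and the result keeps the original flat index, so no `droplevel` is needed.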