# numpy 1D array: mask elements that repeat more than n times

Q: given an array of integers like

```[1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
```

I need to mask elements that repeat more than `N` times. The goal is to retrieve the boolean mask array.

I came up with a rather complicated solution:

```import numpy as np

bins = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5])

N = 3
splits = np.split(bins, np.where(np.diff(bins) != 0)[0]+1)
for s in splits:
if s.shape[0] <= N:
else:

```

giving e.g.

```bins[mask]
Out[90]: array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
```

Is there a nicer way to do this?

Wrap-up: Here’s a slim version of MSeifert’s benchmark plot (thanks for pointing me to `simple_benchmark`). Showing the four most performant options:

The idea proposed by Florian H, modified by Paul Panzer seems to be a great way of solving this problem as it is pretty straight forward and `numpy`-only. If you’re fine with using `numba`, MSeifert’s solution outperforms the other.

I chose to accept MSeifert’s answer as solution as it is the more general answer: It correctly handles arbitrary arrays with (non-unique) blocks of consecutive repeating elements. In case `numba` is a no-go, Divakar’s answer is also worth a look.

I want to present a solution using numba which should be fairly easy to understand. I assume that you want to “mask” consecutive repeating items:

```import numpy as np
import numba as nb

@nb.njit

current = arr[0]
count = 0
for idx, item in enumerate(arr):
if item == current:
count += 1
else:
current = item
count = 1
```

For example:

```>>> bins = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5])
array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
```

## Performance:

Using `simple_benchmark` – however I haven’t included all approaches. It’s a log-log scale:

It seems like the numba solution cannot beat the solution from Paul Panzer which seems to be faster for large arrays by a bit (and doesn’t require an additional dependency).

However both seem to outperform the other solutions, but they do return a mask instead of the “filtered” array.

```import numpy as np
import numba as nb
from simple_benchmark import BenchmarkBuilder, MultiArgument

b = BenchmarkBuilder()

bins = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5])

@nb.njit

current = arr[0]
count = 0
for idx, item in enumerate(arr):
if item == current:
count += 1
else:
current = item
count = 1

def MSeifert(arr, n):

from scipy.ndimage.morphology import binary_dilation

def Divakar_1(a, N):
k = np.ones(N,dtype=bool)
m = np.r_[True,a[:-1]!=a[1:]]
return a[binary_dilation(m,k,origin=-(N//2))]

def Divakar_2(a, N):
k = np.ones(N,dtype=bool)
return a[binary_dilation(np.ediff1d(a,to_begin=a[0])!=0,k,origin=-(N//2))]

def Divakar_3(a, N):
m = np.r_[True,a[:-1]!=a[1:],True]
idx = np.flatnonzero(m)
c = np.diff(idx)
return np.repeat(a[idx[:-1]],np.minimum(c,N))

from skimage.util import view_as_windows

def Divakar_4(a, N):
m = np.r_[True,a[:-1]!=a[1:]]
w = view_as_windows(m,N)
idx = np.flatnonzero(m)
v = idx<len(w)
w[idx[v]] = 1
if v.all()==0:
m[idx[v.argmin()]:] = 1
return a[m]

def Divakar_5(a, N):
m = np.r_[True,a[:-1]!=a[1:]]
w = view_as_windows(m,N)
last_idx = len(a)-m[::-1].argmax()-1
w[m[:-N+1]] = 1
m[last_idx:last_idx+N] = 1
return a[m]

def PaulPanzer(a,N):

import random