Extract dataframe from dataframe based on column value with extended bounds

I have the following dataframe:

NAME SIGNAL
a 0
b 0
c 0
d 0
e 1
f 1
g 1
h 0
i 0
j 0
k 0
l 0
m 0
n 1
o 1
p 1
q 1
r 0
s 0
t 0

I need to write a function that will allow me to extract another dataframe, or just modify the existing frame based on a condition:

Get all columns (in my case NAME) if SIGNAL column is 1 for the row but also extract 2 rows extra from above and 2 rows extra from bellow.

In my example, the function should return me the following table:

NAME SIGNAL
c 0
d 0
e 1
f 1
g 1
h 0
i 0
j 0
l 0
m 0
n 1
o 1
p 1
q 1
r 0
s 0

Thanks!

UPDATE:

This is the code I have so far:

# Import pandas library
import pandas as pd
 
# initialize list of lists
data = [['a', 0], ['b', 0], ['c', 1], ['d', 1], ['e', 0], ['f', 0], ['g', 0], ['h', 1], ['i', 0], ['j', 0], ['k', 0]]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['NAME', 'SIGNAL'])
 
# print dataframe.
print(df)


print("----------------")
      
for index, row in df.iterrows():
    #print(row['Name'], row['Age'])

    if((df.iloc[index]['SIGNAL'] == 1) & (df.iloc[index-1]['SIGNAL'] == 0)): #check when the signal change from 0 to 1
        print(df.iloc[index]['NAME']) #first line with signal 1 after it was 0

        #print the above 2 lines
        print(df.iloc[index-1]['NAME'])
        print(df.iloc[index-2]['NAME'])

My dataframe is like:

   NAME  SIGNAL
0     a       0
1     b       0
2     c       1
3     d       1
4     e       0
5     f       0
6     g       0
7     h       1
8     i       0
9     j       0
10    k       0

My code is returning:

c
b
a
h
g
f

The problem here is that I cannot return the value of “d” and “e” + “f” or “i” and “j” because i get the error “IndexError: single positional indexer is out-of-bounds” if i try if condition:

(df.iloc[index]['SIGNAL'] == 1) & (df.iloc[index+1]['SIGNAL'] == 0)

enter code here

Also the extended bounds will be variable, sometimes I will work with 2 extra rows from top and bottom sometimes with more.

I am looking for a solution using dataframes functions and not iteration.

thanks!

Answer

This will return the desired data frame:

df[(df.shift(periods=-2, axis="rows").SIGNAL == 1) | (df.shift(periods=-1, axis="rows").SIGNAL == 1) | (df.SIGNAL == 1) | (df.shift(periods=1, axis="rows").SIGNAL == 1) | (df.shift(periods=2, axis="rows").SIGNAL == 1)]

Output:

NAME SIGNAL
c 0
d 0
e 1
f 1
g 1
h 0
i 0
l 0
m 0
n 1
o 1
p 1
q 1
r 0
s 0

Add .NAME to the end to get your series of names

2     c
3     d
4     e
5     f
6     g
7     h
8     i
11    l
12    m
13    n
14    o
15    p
16    q
17    r
18    s
Name: NAME, dtype: object

Update: for arbitrarily large span

m=(df.shift(periods=-400, axis="rows").SIGNAL == 1)

for i in list(range(-399,401)):
    m= m | (df.shift(periods=i, axis="rows").SIGNAL == 1)
print(df[m])

Disclaimer:

This method may be inefficient for large spans