I have Twitter data in a CSV file (which I’m mining with a Python API), around 1000 lines in total. Now I want to filter the tweets down to those containing the specific Indonesian words “macet” or “kecelakaan” (in English “traffic” or “accident”) and put the matching rows into a new, separate CSV file, just like in Excel.
The sample Twitter data is in example1.csv, and the new file to be created after searching for “macet” or “kecelakaan” is example2.csv. But my code produces no result.
    import re
    import csv

    with open('example1.csv', 'r') as csvFile:
        reader = csv.reader(csvFile)
        if re.search(r'macet', reader):
            for row in reader:
                myData = list(row)
                print(row)

    newFile = open('example2.csv', 'w')
    with newFile:
        writer = csv.writer(newFile)
        writer.writerows(myData)

    print("Writing complete")
I am using Spyder with Python 3.6.
The CSV file is already in the same folder as Spyder. Here is a screen capture of my Twitter CSV data.
Update: a sample of the CSV file. OS: Windows.
There are a couple of problems with your code.
In your reading loop you are passing a csv.reader object to re.search, but re.search doesn’t know how to search that object. You need to pass it text or byte strings.
myData = list(row) converts row into a new list and saves it to myData, but row is already a list, so no conversion is necessary. That line also replaces the previous contents of myData on every pass through the loop, whereas you actually want to keep all the matching rows. However, there’s no need to save the rows at all: you can just write them to the new file as you go.
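Both problems can be reproduced without touching any files. The following is a minimal sketch that uses an in-memory CSV with invented sample rows (the tweet text and user names are made up for illustration). It shows that re.search rejects a csv.reader object, and that assigning inside the loop keeps only the last row, while appending keeps them all.

```python
import csv
import io
import re

# Invented sample data standing in for example1.csv
sample = (
    "1,user_a,jalan macet parah\n"
    "2,user_b,cuaca cerah\n"
    "3,user_c,ada kecelakaan di tol\n"
)

reader = csv.reader(io.StringIO(sample))

# re.search needs a str (or bytes) to search, not a reader object
try:
    re.search(r'macet', reader)
    reader_accepted = True
except TypeError:
    reader_accepted = False

last_only = None
all_rows = []
for row in reader:
    last_only = row        # overwritten on every pass
    all_rows.append(row)   # accumulates every row

print(reader_accepted)
print(last_only)
print(len(all_rows))
```

Each row that csv.reader yields is already a list of strings, which is why the search has to be applied to one field of the row rather than to the reader or the row as a whole.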
Anyway, here’s a repaired version of your code. From the screenshot it looks like you only want to search the text in column 2 of the input data (which corresponds to column C in your spreadsheet). I’ve created a regex that searches for the whole words “macet” and “kecelakaan”; the \b matches at word boundaries, so we don’t get a match if “macet” or “kecelakaan” is part of a larger word.
    import re
    import csv

    # Make a case-insensitive regex to match the words "macet" or "kecelakaan"
    pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)

    with open('example1.csv', 'r', newline='') as csvFile, open('example2.csv', 'w', newline='') as newFile:
        reader = csv.reader(csvFile)
        writer = csv.writer(newFile)

        for row in reader:
            # Skip empty rows
            if not row:
                continue
            # Search only the tweet text in column 2 (the third field)
            if pattern.search(row[2]):
                print(row)
                writer.writerow(row)

    print("Writing complete")
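The word-boundary behaviour of that pattern can be checked in isolation. This is a small sketch with invented example strings, not data from the actual CSV file.

```python
import re

# Same pattern as in the repaired script
pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)

# Invented example strings
tests = [
    "Jalan Sudirman MACET total",   # whole word, different case
    "ada kecelakaan, hati-hati",    # whole word followed by a comma
    "kemacetan di mana-mana",       # "macet" inside a longer word
    "cuaca cerah hari ini",         # neither word
]

results = [bool(pattern.search(t)) for t in tests]
print(results)  # [True, True, False, False]
```

Note that punctuation counts as a word boundary, so a hashtag such as “#macet” would still match.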
I’ve just made a couple of improvements to the repaired code. It now uses the newline='' arg to open the CSV files, and it skips any empty lines in the input CSV. And the regex now ignores case when looking for matching words.
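If you want to try the filtering on a small file before running it on the real data, the same logic can be wrapped in a function. This is only a sketch: the name filter_rows, its column parameter, and the sample rows are hypothetical, and the check for rows that are too short is an extra safeguard that is not in the script above.

```python
import csv
import os
import re
import tempfile

PATTERN = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)

def filter_rows(src_path, dst_path, column=2):
    """Copy rows whose `column` field matches PATTERN; return the number written."""
    written = 0
    # newline='' lets the csv module handle line endings itself
    with open(src_path, 'r', newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Skip blank lines and rows too short to have the text column
            if len(row) <= column:
                continue
            if PATTERN.search(row[column]):
                writer.writerow(row)
                written += 1
    return written

# Demo with invented data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'example1.csv')
    dst = os.path.join(tmp, 'example2.csv')
    with open(src, 'w', newline='') as f:
        f.write('1,user_a,"jalan macet, parah"\r\n'
                '\r\n'
                '2,user_b,cuaca cerah\r\n'
                '3,user_c,Kecelakaan di tol\r\n')
    count = filter_rows(src, dst)
    with open(dst, 'r', newline='') as f:
        output = f.read()

print(count)
print(repr(output))
```

Opening the output file with newline='' matters most on Windows, where leaving it out makes the csv module’s \r\n line endings get translated a second time and produces a blank line after every row.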