I think this should be easy, but I have not been able to solve it yet. I have two files as shown below, and I want to merge them so that the lines starting with > in file1 become the headers of the lines in file2.
file1:

    >seq12
    ACGCTCGCA
    >seq34
    GCATCGCGT
    >seq56
    GCGATGCGC

file2:

    ATCGCGCATGATCTCAG
    AGCGCGCATGCGCATCG
    AGCAAATTTAGCAACTC
So the desired output should be:

    >seq12
    ATCGCGCATGATCTCAG
    >seq34
    AGCGCGCATGCGCATCG
    >seq56
    AGCAAATTTAGCAACTC
I have tried this code so far, but in the output all the lines coming from file2 are the same:
    from Bio import SeqIO

    with open(file1) as fw:
        with open(file2,'r') as rv:
            for line in rv:
                items = line
                for record in SeqIO.parse(fw, 'fasta'):
                    print('>' + record.id)
                    print(line)
If you cannot store your files in memory, you need a solution that reads line by line from each file, and writes accordingly to the output file. The following program does that. The comments try to clarify, though I believe it is clear from the code.
    with open("file1.txt") as first, open("file2.txt") as second, open("output.txt", "w+") as output:
        while True:
            line_first = first.readline()    # line from file1 (header)
            line_second = second.readline()  # line from file2 (body)

            if not (line_first and line_second):
                # if any file has ended
                break

            # write to output file
            output.writelines([line_first, line_second])

            # jump one line from file1
            first.readline()
Note that this will only work if file1.txt has the specific format you presented (odd lines are headers, even lines are sequence lines that get discarded).
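If file1.txt does not strictly alternate (for example, if a sequence wraps over several lines), one option is to pick headers by their > prefix instead of by position. The following is only a sketch under that assumption; the function name merge_by_header_prefix and its parameters are made up for illustration and are not part of the code above. It still reads both files lazily, so nothing has to fit in memory.

```python
def merge_by_header_prefix(header_path, body_path, output_path="output.txt", prefix=">"):
    """Pair each line of header_path that starts with `prefix`
    with the next line of body_path (hypothetical helper)."""
    with open(header_path) as headers, open(body_path) as bodies, \
            open(output_path, "w") as output:
        # Generator: keeps only header lines, reading file1 lazily
        header_lines = (line for line in headers if line.startswith(prefix))

        # zip stops as soon as either file runs out of lines
        for header, body in zip(header_lines, bodies):
            # Normalise line endings so a missing final newline
            # does not glue two records together
            output.write(header.rstrip("\n") + "\n")
            output.write(body.rstrip("\n") + "\n")
```

Calling merge_by_header_prefix("file1.txt", "file2.txt") would then write the paired lines to output.txt.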
In order to allow a bit more customization, you can wrap it up in a function as:
    def merge_files(header_file_path, body_file_path, output_file="output.txt", every_n_lines=2):
        with open(header_file_path) as first, open(body_file_path) as second, open(output_file, "w+") as output:
            while True:
                line_first = first.readline()    # line from header
                line_second = second.readline()  # line from body

                if not (line_first and line_second):
                    # if any file has ended
                    break

                # write to output file
                output.writelines([line_first, line_second])

                # jump n lines from header
                for _ in range(every_n_lines - 1):
                    first.readline()
Calling merge_files("file1.txt", "file2.txt") should then do the trick.
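For what it is worth, the same "take every n-th line of the header file" stepping can also be written with itertools.islice and zip from the standard library. This is just an alternative sketch, not a replacement for merge_files; the name merge_files_islice is hypothetical, and it assumes every line in both files ends with a newline.

```python
from itertools import islice


def merge_files_islice(header_file_path, body_file_path,
                       output_file="output.txt", every_n_lines=2):
    """Hypothetical variant of merge_files using islice for the stepping."""
    with open(header_file_path) as first, open(body_file_path) as second, \
            open(output_file, "w") as output:
        # Lazily yields lines 0, n, 2n, ... of the header file
        headers = islice(first, 0, None, every_n_lines)

        # zip stops when either the headers or the body lines run out
        for line_first, line_second in zip(headers, second):
            output.writelines([line_first, line_second])
```

Both files are still consumed one line at a time, so this should behave the same way on large inputs.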