Can anyone recommend a fast way to sort the contents of a text file, based on the first X amount of characters of each line? For example if i have in the text file the following text
Adrian Graham some more text here John Adams some more text here
Then another record needs to be inserted for eg.
Bob Something some more text here
I need to keep the file sorted but this is a rather big file and i’d rather not load it entirely into memory at once. By big i mean about 500 000 lines, so perhaps not terribly huge.
I’ve had a search around and found http://www.codeodor.com/index.cfm/2007/5/14/Re-Sorting-really-BIG-files—the-Java-source-code/1208 and i wanted to know if anyone could suggest any other ways? For the sake of having second opinions?
My initial idea before i read the above linked article was:
Read the file
Split it into several files, for eg A to Z
If a line begins with “a” then it is written to the file called A.txt
Each of the files then have their contents sorted (no clear idea how just yet apart from alphabetical order)
Then when it comes to reading data, i know that if i want to find a line which starts with A then i open A.txt When inserting a new line the same thing applies and i just append to the end of the file. Later after the insert when there is time i can invoke my sorting program to reorder the files that have had stuff appended to them.
I realise that there are a few flaws in this like for eg. there won’t be an even number of lines that start with a particular letter so some files may be bigger than others etc.
Which again is why i need a second opinion for suggestions on how to approach this? The current program is in java but any programming language could be used for an example that would achieve this…I’ll port what i need to.
(If anyone’s wondering i’m not deliberately trying to give myself a headache by storing info this way, i inherited a painful little program which stores data to files instead of using some kind of database) Thanks in advance
You may also want to simply call the DOS “sort” command to sort the file. It is quick and will require next to no programming on your part.
In a DOS box, type
help sort|more for the sort syntax and options.