I have a large string (an RSS article, to be more precise) and I want to get the word between a specific startIndex and endIndex. String provides the substring method, but it only accepts ints as its parameters. My start and end indexes are of type long.
What is the best way to get the word from a String using start and end indexes of type long?
My first solution was to start trimming the String and get it down so I can use ints. Didn’t like where it was going. Then I looked at Apache Commons Lang but didn’t find anything. Any good solutions?
Just to provide a little more information.
I am using a tool called General Architecture for Text Engineering (GATE), which scans a String and returns a list of Annotations. An annotation holds the type of a word (Person, Location, etc.) and the start and end indexes of that word.
For the RSS, I use ROME, which reads an RSS feed and contains the body of the article in a String.
There is no point doing this on a String because a String can hold at most 2^31 - 1 characters. Internally the string's characters are held in a char array, and all of the API methods use int as the type for lengths, positions and offsets.
- The same restriction applies to StringBuffer and StringBuilder; i.e. an int length.
- A StringReader is backed by a String, so that won't help.
- Both CharBuffer and ByteBuffer have the same restriction; i.e. an int length.
- A bare array of a primitive type is limited to an int length.
In short, you are going to have to implement your own “long string” type that internally holds its characters in (for example) an array of arrays of characters.
(I tried a Google search but I couldn’t spot an existing implementation of long strings that looked credible. I guess there’s not a lot of call for monstrously large strings in Java …)
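For what it's worth, here is a minimal sketch of what such a "long string" type could look like. The class name LongString, its methods and the chunkSize parameter are all made up for illustration; it stores characters in a list of fixed-size char arrays (a variant of the array-of-arrays idea) and is not tuned for performance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "long string": characters live in fixed-size chunks,
// so positions can be addressed with long indexes.
final class LongString {
    private final int chunkSize;
    private final List<char[]> chunks = new ArrayList<>();
    private long length = 0;

    LongString(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    // Appends text one char at a time, allocating a new chunk when the last one is full.
    LongString append(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            int offsetInChunk = (int) (length % chunkSize);
            if (offsetInChunk == 0) {
                chunks.add(new char[chunkSize]);
            }
            chunks.get(chunks.size() - 1)[offsetInChunk] = text.charAt(i);
            length++;
        }
        return this;
    }

    long length() {
        return length;
    }

    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
    }

    // The result is an ordinary String, so (end - start) must still fit in an int.
    String substring(long start, long end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        StringBuilder sb = new StringBuilder(Math.toIntExact(end - start));
        for (long i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}
```

Note that the total length can exceed the int range, but any single extracted word still has to be small enough to fit in a normal String.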
By the way, if you anticipate that the strings are never going to be this large, you should just convert your long offsets to int. A cast would work, but you might want to check the range and throw an exception if you ever get an offset >= 2^31.
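As a rough sketch of the checked conversion: Math.toIntExact (Java 8 and later) throws an ArithmeticException when the long does not fit in an int, which avoids the silent wrap-around of a plain cast. The helper class LongOffsets and its substring method are hypothetical names for illustration only.

```java
// Hypothetical helper: narrows long offsets safely, then delegates to String.substring.
final class LongOffsets {
    private LongOffsets() {}

    static String substring(String text, long start, long end) {
        // Throws ArithmeticException if either offset is outside the int range.
        int s = Math.toIntExact(start);
        int e = Math.toIntExact(end);
        // Throws StringIndexOutOfBoundsException if the offsets do not fit the text.
        return text.substring(s, e);
    }
}
```

For example, calling LongOffsets.substring(articleBody, startOffset, endOffset) with the long offsets from an annotation would return the annotated word, assuming the offsets refer to positions in that same String.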