How would I find words of specific length, ignoring punctuation

I have a program that reads an array into a list, counts all words, finds specific words, and finds words of specific length. Specifically in the bigLongWords method, how can I find words of x length or more, but disregard punctuation? For example if I had the string “The Rolling Stones!” and bigLongWords(7), I want it to find just “Rolling”, but it’s also including “stones!” in that.

public class JavaCharacterisLetterExample1 {  
public static void main(String[] args) {  
// Create three char primitives ch1, ch2 and ch3.  
  char ch1, ch2, ch3;  
  // Assign the values to ch1, ch2 and ch3.  
  ch1 = 'A';  
  ch2 = '9';  
  ch3 = 'e';  
  // Create three boolean primitives b1, b2 and b3;  
  boolean b1, b2, b3;  
  // Check whether ch1, ch2 and ch3 are letters or not and assign the results to b1, b2 and b3.  
  b1 = Character.isLetter(ch1);  
  b2 = Character.isLetter(ch2);  
  b3 = Character.isLetter(ch3);  

  String str1 = "The character "+ch1 + " is a letter: " + b1;  
  String str2 = "The character "+ch2 + " is a letter: " + b2;  
  String str3 = "The character "+ch3 + " is a letter: " + b3;  

  // Print the values of b1, b2 and b3.  
  System.out.println( str1 );  
  System.out.println( str2 );  
   System.out.println( str3 );  

}

Answer

You can do it in one of two ways:

  1. While reading from array, strip punctuations from words.
  2. While responding to a query like “find all words with length 7 or greater”, you can strip punctuations from words and return the filtered list.

The advantage of option 1 is that the work is done before read time, and read is super fast. The disadvantage of option 1 is that you’re doing it for all words. If there’s never a query for words of length 3 or more, you still incur the time for preprocessing those.

The advantage of option 2 is that you filter dynamically, so you pay the price only when there’s a query for that word length. The disadvantage of option 2 is exactly the advantage of option 2. If you get the same query 10 times, you do the same work 10 times.

You may go with a hybrid approach where you filter dynamically, but then update your source data asynchronously, so next time, you’re not doing the same filtering again.