I’ve recently stumbled upon coding task, and I’ve struggled to get it right. It goes like this:
Given a non-empty string s
and a list word_list
containing a list of non-empty words, determine if s
can be segmented into a space-separated sequence of one or more dictionary words. You may assume the word_list
does not contain duplicates, but each word can be used more than once.
For example, given:
s = 'whataniceday' word_list = ['a', 'what', 'an', 'nice', 'day']
Return True
, because 'whataniceday'
can be segmented as 'what a nice day'
.
I came up with a pretty naive solution, that works for this particular example, but it is not hard to make it fail, for example by adding a word to word_list
that other word in the list starts with (i.e. ['a', 'wha', 'what', 'an', 'nice', 'day']
). There are plenty of other things that can mess up my solution, but anyway here goes:
s = "whataniceday" word_list = ["h", "a", "what", "an", "nice", "day"] def can_be_segmented(s, word_list): tested_str = s buildup_str = '' for letter in tested_str: buildup_str += letter if buildup_str not in word_list: continue tested_str = tested_str[len(buildup_str):] buildup_str = '' return bool(tested_str == '' and buildup_str == '') print(can_be_segmented(s, word_list))
Do you guys have an idea of how to fix it? Or maybe there is a better approach to this problem?
Answer
This is my solution, using a generator expression for brevity, and recursion
s = "whataniceday" word_list = ["h", "ani", "a", "what", "an", "nice", "day"] def can_be_segmented(s, word_list): return s == "" or any( s.startswith(word) and can_be_segmented(s[len(word):], word_list) for word in word_list) assert can_be_segmented(s, word_list) assert not can_be_segmented("whataniannicday", word_list)
This code states that the string can be segmented if we can find a word so that the string begins by this word, and the rest of the string can itself be segmented.