I’ve recently stumbled upon coding task, and I’ve struggled to get it right. It goes like this:
Given a non-empty string
s and a list
word_list containing a list of non-empty words, determine if
s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the
word_list does not contain duplicates, but each word can be used more than once.
For example, given:
s = 'whataniceday' word_list = ['a', 'what', 'an', 'nice', 'day']
'whataniceday' can be segmented as
'what a nice day'.
I came up with a pretty naive solution, that works for this particular example, but it is not hard to make it fail, for example by adding a word to
word_list that other word in the list starts with (i.e.
['a', 'wha', 'what', 'an', 'nice', 'day']). There are plenty of other things that can mess up my solution, but anyway here goes:
s = "whataniceday" word_list = ["h", "a", "what", "an", "nice", "day"] def can_be_segmented(s, word_list): tested_str = s buildup_str = '' for letter in tested_str: buildup_str += letter if buildup_str not in word_list: continue tested_str = tested_str[len(buildup_str):] buildup_str = '' return bool(tested_str == '' and buildup_str == '') print(can_be_segmented(s, word_list))
Do you guys have an idea of how to fix it? Or maybe there is a better approach to this problem?
This is my solution, using a generator expression for brevity, and recursion
s = "whataniceday" word_list = ["h", "ani", "a", "what", "an", "nice", "day"] def can_be_segmented(s, word_list): return s == "" or any( s.startswith(word) and can_be_segmented(s[len(word):], word_list) for word in word_list) assert can_be_segmented(s, word_list) assert not can_be_segmented("whataniannicday", word_list)
This code states that the string can be segmented if we can find a word so that the string begins by this word, and the rest of the string can itself be segmented.