Check if string can be splitted into sentence using words in provided list

I’ve recently stumbled upon coding task, and I’ve struggled to get it right. It goes like this:

Given a non-empty string s and a list word_list containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the word_list does not contain duplicates, but each word can be used more than once.

For example, given:

s = 'whataniceday'
word_list = ['a', 'what', 'an', 'nice', 'day']

Return True, because 'whataniceday' can be segmented as 'what a nice day'.

I came up with a pretty naive solution, that works for this particular example, but it is not hard to make it fail, for example by adding a word to word_list that other word in the list starts with (i.e. ['a', 'wha', 'what', 'an', 'nice', 'day']). There are plenty of other things that can mess up my solution, but anyway here goes:

s = "whataniceday"
word_list = ["h", "a", "what", "an", "nice", "day"]

def can_be_segmented(s, word_list):
    tested_str = s
    buildup_str = ''

    for letter in tested_str:        
        buildup_str += letter

        if buildup_str not in word_list:

        tested_str = tested_str[len(buildup_str):]
        buildup_str = ''

    return bool(tested_str == '' and buildup_str == '')

print(can_be_segmented(s, word_list))

Do you guys have an idea of how to fix it? Or maybe there is a better approach to this problem?


This is my solution, using a generator expression for brevity, and recursion

s = "whataniceday"
word_list = ["h", "ani", "a", "what", "an", "nice", "day"]

def can_be_segmented(s, word_list):
    return s == "" or any(
        s.startswith(word) and can_be_segmented(s[len(word):], word_list)
        for word in word_list)

assert can_be_segmented(s, word_list)
assert not can_be_segmented("whataniannicday", word_list)

This code states that the string can be segmented if we can find a word so that the string begins by this word, and the rest of the string can itself be segmented.

Leave a Reply

Your email address will not be published. Required fields are marked *