Word break #23
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| # DP using an array | ||
| class Solution: | ||
| def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
| is_word_break = [False for _ in range(len(s))] | ||
|
is_word_break starts with the base form of a verb, so it reads like a function name. Something like breakable or tokenizable would be better. Personally, I add a plural s to the names of variables that hold multiple values, but that may be a matter of taste.
Author: Thank you. It does indeed look like a function name. |
||
| # is_word_break[i] means s[:i+1] can be broken into dictionary words. | ||
| for i, c in enumerate(s): | ||
| string_from_beginning = s[:i+1] | ||
| for word in wordDict: | ||
| if string_from_beginning == word: | ||
| is_word_break[i] = True | ||
| elif i+1-len(word) >= 0 and s[i+1-len(word):i+1] == word and is_word_break[i-len(word)]: | ||
|
I recommend putting a space on both sides of binary operators. Inside a slice, however, I think it is better to leave no spaces around the binary operators in the index expressions. https://peps.python.org/pep-0008/#other-recommendations
https://google.github.io/styleguide/pyguide.html#s3.6-whitespace
|
||
| is_word_break[i] = True | ||
| return is_word_break[len(s)-1] | ||
|
|
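Taking the naming and whitespace suggestions above together, the bottom-up version might look roughly like the sketch below. The names `word_break`, `word_dict`, and `breakable` are illustrative, and the extra sentinel slot for the empty prefix is an assumption of this sketch, not something the PR does.

```python
from typing import List


def word_break(s: str, word_dict: List[str]) -> bool:
    # breakable[end] is True when the prefix s[:end] can be split into
    # dictionary words. breakable[0] stands for the empty prefix, which
    # removes the need for a separate "prefix equals word" branch.
    breakable = [False] * (len(s) + 1)
    breakable[0] = True
    for end in range(1, len(s) + 1):
        for word in word_dict:
            start = end - len(word)
            if start < 0 or not breakable[start]:
                continue
            # Spaces around binary operators, none inside the slice.
            if s[start:end] == word:
                breakable[end] = True
                break
    return breakable[len(s)]
```

With the sentinel, the index arithmetic never reaches -1, so the accidental wrap-around that `is_word_break[i-len(word)]` could otherwise hit does not need a guarding branch.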
||
| # Recursive DP | ||
| class Solution: | ||
| def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
| is_word_break = [False for _ in range(len(s))] | ||
| seen = [False for _ in range(len(s))] | ||
| def recursiveWordBreak(index, seen, is_word_break): | ||
|
Function names usually start with the base form of a verb, so perhaps break_word_recursively()? Also, the fact that it is recursive is clear from reading the source, so you could drop that part and simply call it break_word(). |
||
| if index >= len(s): | ||
| return | ||
| if seen[index]: | ||
| return | ||
| string_from_beginning = s[:index+1] | ||
| for word in wordDict: | ||
| if string_from_beginning == word: | ||
| seen[index] = True | ||
| is_word_break[index] = True | ||
| return recursiveWordBreak(index+1, seen, is_word_break) | ||
| elif index+1-len(word) >= 0 and word == s[index+1-len(word):index+1] and is_word_break[index-len(word)]: | ||
| seen[index] = True | ||
| is_word_break[index] = True | ||
| return recursiveWordBreak(index+1, seen, is_word_break) | ||
| seen[index] = True | ||
| return recursiveWordBreak(index+1, seen, is_word_break) | ||
| recursiveWordBreak(0, seen, is_word_break) | ||
| return is_word_break[len(s)-1] | ||
|
|
||
| # Code that timed out. It is essentially recursion without memoization, so the search tree presumably exploded. | ||
| class Solution: | ||
| def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
| char_to_string = defaultdict(list) | ||
| for string in wordDict: | ||
| char_to_string[string[0]].append(string) | ||
|
I don't think this speedup has much effect. Using startswith with a start argument avoids taking a slice, and that should be enough. |
||
| stack = [0] | ||
|
You could simply add something like seen = set(). Ah, I see you use a bool array further down. I find a set more straightforward and prefer it, but that is a matter of taste. |
||
| while stack: | ||
| cursor = stack.pop() | ||
| if cursor >= len(s): | ||
| return True | ||
|
|
||
| beginning_char = s[cursor] | ||
| for string in char_to_string[beginning_char]: | ||
| if s[cursor:cursor+len(string)] in char_to_string[beginning_char]: | ||
| stack.append(cursor+len(string)) | ||
| return False | ||
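The two review suggestions for the timed-out code (use `startswith` with a start offset, and track visited positions in a set) can be sketched as follows. This is only one possible reading of those comments; the function and variable names are illustrative, and it assumes every dictionary word is non-empty.

```python
from typing import List


def word_break(s: str, word_dict: List[str]) -> bool:
    # Start positions that have already been pushed. Without this set the
    # same position is expanded repeatedly and the search blows up.
    seen = {0}
    stack = [0]
    while stack:
        cursor = stack.pop()
        if cursor == len(s):
            return True
        for word in word_dict:
            # startswith with a start offset compares in place, no slice copy.
            if not s.startswith(word, cursor):
                continue
            next_cursor = cursor + len(word)
            if next_cursor in seen:
                continue
            seen.add(next_cursor)
            stack.append(next_cursor)
    return False
```

Each position is pushed at most once, so the work is bounded by the number of positions times the number of dictionary words (times the word length for the comparison).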
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| """ | ||
| Reference | ||
| shining-ai: https://github.com/shining-ai/leetcode/pull/39/files | ||
| While writing this, I also wondered about the time complexity of s[index+1-len(word):index+1]. I am considering rewriting it with a rolling hash. Was top-down perhaps easier to write for this problem? | ||
| hayashi-ay: https://github.com/hayashi-ay/leetcode/pull/61/files | ||
| Exzrg: https://github.com/Exzrgs/LeetCode/pull/10/files I learned that Python also has a startswith method. | ||
|
Author: That's true... |
||
|
|
||
| In the end, the fastest approach is probably to compute and store rolling hashes of wordDict and s. | ||
|
No, if someone wrote this with a rolling hash, I would find it hard to trust and would want to ask them not to put it in production code. re2 would probably be fine.
https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm |
||
| """ | ||
|
|
||
| # Recursion memoized with @cache. Reference: shining-ai | ||
| class Solution: | ||
| def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
| @cache | ||
| def is_segmented(start): | ||
| if start == len(s): | ||
| return True | ||
| for word in wordDict: | ||
| if s[start:start+len(word)] != word: | ||
| continue | ||
| elif is_segmented(start+len(word)): | ||
| return True | ||
| return False | ||
| return is_segmented(0) | ||
|
|
||
| # Memoization without @cache | ||
| class Solution: | ||
| def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
| seen = [False for _ in range(len(s))] | ||
| is_tokenized = [False for _ in range(len(s))] | ||
| def check_tokenizable(start, seen, is_tokenized): | ||
| if start == len(s): | ||
| return | ||
| if seen[start]: | ||
| return | ||
| seen[start] = True | ||
| for word in wordDict: | ||
| if s.startswith(word, start): | ||
| is_tokenized[start+len(word)-1] = True | ||
| check_tokenizable(start+len(word), seen, is_tokenized) | ||
| return | ||
| check_tokenizable(0, seen, is_tokenized) | ||
| return is_tokenized[-1] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| class Solution: | ||
| def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
| seen = [False for _ in range(len(s))] | ||
| is_tokanizable = [False for _ in range(len(s))] | ||
|
This is not essential, but
Author: I thought "tokenizable" existed as an adjective, but apparently it does not. I will change it to can_tokenize.
Rather than that, tok
Author: Oh, you're right. I made a typo in the name.
|
||
| def check_tokanizable(start, seen, is_tokanizable): | ||
| if start >= len(s): | ||
| return | ||
| if seen[start]: | ||
| return | ||
| seen[start] = True | ||
| for word in wordDict: | ||
| if s.startswith(word, start): | ||
| is_tokanizable[start+len(word)-1] = True | ||
|
Adding a check here for whether the entry is already True looks like it would avoid duplicate checks without using seen.
I was wrong about that. A position that has already been examined can have a result of False. |
||
| check_tokanizable(start+len(word), seen, is_tokanizable) | ||
| return | ||
| check_tokanizable(0, seen, is_tokanizable) | ||
| return is_tokanizable[-1] | ||
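The correction in the review thread above is the key point: an array that only ever records True cannot tell "not examined yet" apart from "examined and failed". One way to keep both pieces of information in a single structure is a memo that is empty for unexamined positions and stores the final answer otherwise. The sketch below is an illustration under that assumption, with hypothetical names, and it assumes all dictionary words are non-empty.

```python
from typing import Dict, List


def word_break(s: str, word_dict: List[str]) -> bool:
    # memo[start] is absent while s[start:] is unexamined. Once examined,
    # it holds the final answer for that suffix, which may well be False.
    memo: Dict[int, bool] = {}

    def can_tokenize_from(start: int) -> bool:
        if start == len(s):
            return True
        if start in memo:
            return memo[start]
        result = any(
            s.startswith(word, start) and can_tokenize_from(start + len(word))
            for word in word_dict
        )
        memo[start] = result
        return result

    return can_tokenize_from(0)
```

This is essentially what `@cache` does in the Step 2 version, written out by hand so that the False results are visibly stored.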
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| class Solution: | ||
| def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
| seen = [False] * len(s) | ||
| is_tokenizable = [False] * len(s) | ||
| def check_tokenizable(start): | ||
|
Personally, I found the Step 2 version that uses @cache easier to follow. |
||
| if start >= len(s): | ||
| return | ||
| if seen[start]: | ||
| return | ||
| seen[start] = True | ||
| for word in wordDict: | ||
| if s.startswith(word, start): | ||
| is_tokenizable[start+len(word)-1] = True | ||
| check_tokenizable(start+len(word)) | ||
| return | ||
| check_tokenizable(0) | ||
| return is_tokenizable[-1] | ||
I think [False] * len(s) would be easier to write. (This may be a matter of taste.)

It is also faster at run time, so from now on I will write it that way except when the elements are mutable. https://qiita.com/Krypf/items/5efb681d06e1ebd2abab
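The caveat about mutable elements can be seen in a small example. List multiplication copies references, which is harmless for immutable values such as False but makes every slot share one object when the element is a list. The variable names below are illustrative.

```python
n = 3

# Immutable elements: assigning to one slot rebinds only that slot.
flags = [False] * n
flags[0] = True
print(flags)  # [True, False, False]

# Mutable elements: all n slots refer to the same inner list.
shared_rows = [[]] * n
shared_rows[0].append("x")
print(shared_rows)  # [['x'], ['x'], ['x']]

# A comprehension creates a fresh inner list for every slot.
independent_rows = [[] for _ in range(n)]
independent_rows[0].append("x")
print(independent_rows)  # [['x'], [], []]
```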