Word break #23

SuperHotDogCat · 2024-05-28T14:27:05Z

問題

Exzrgs · 2024-05-29T03:53:42Z

arai60/word_break/phase1.py

+# 配列を使ったdp
+class Solution:
+    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
+        is_word_break = [False for _ in range(len(s))] 


[False] * len(s)のほうが書くの楽なのではないかなと思いました。(好みの問題かもしれないです)

実行時間の方もそちらの方が早いので, mutableなものを要素に持つもの以外はこんどからそちらで書きますhttps://qiita.com/Krypf/items/5efb681d06e1ebd2abab

Exzrgs · 2024-05-29T04:00:38Z

arai60/word_break/phase3.py

+class Solution:
+    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
+        seen = [False for _ in range(len(s))]
+        is_tokanizable = [False for _ in range(len(s))]


本質的ではないですが、tokenizeをtypoしていると思います。
個人的には、can_tokenizeのほうが短くて良いのかなと思いました

tokenizable[形]があるかと思いましたがどうやらなかったようです。can_tokenizeにいたします

というよりは、tokanizableになってしまっているという指摘でした。
tokenizableがないのは驚きですね...。とはいえ個人的には、造語として使っちゃってもよいのではないかと思ってます(プログラミングでは英語の文法を無視した命名がされることがあり、Goの標準ライブラリとかでもちらほら見かけます)

あ, 本当だ。名前typoしてる

Exzrgs · 2024-05-29T04:04:19Z

arai60/word_break/phase3.py

+        seen = [False for _ in range(len(s))]
+        is_tokanizable = [False for _ in range(len(s))]


seenやis_tokanizableを関数の引数に持ってくるのであれば、宣言は関数を使う直前のほうが良いのではないかと思いました。
もしくは、関数の引数にせずにアクセスするというのも手だと思います。

Exzrgs · 2024-05-29T04:21:50Z

arai60/word_break/phase3.py

+            seen[start] = True
+            for word in wordDict:
+                if s.startswith(word, start):
+                    is_tokanizable[start+len(word)-1] = True


この部分ですでにTrueかどうかの判定を入れることで、seenを使わずに重複してチェックしないことを実現できそうです。
if s.startswith(word, start) and not is_tokanizable[start+len(word)-1]:みたいな感じですね

これ嘘でした。調べた結果がFalseの場合がありますね。
初期値をNoneにして最後にif not True: Falseとかの処理を入れたら実現できますね。
とはいえこっちのほうが良いのか？というのはわからないです...

SuperHotDogCat · 2024-05-29T13:12:57Z

seen = [False] * len(s)のようにseenを初期化, check_tokenizableの引数を一つに

oda · 2024-05-29T17:06:57Z

arai60/word_break/phase1.py

+    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
+        char_to_string = defaultdict(list)
+        for string in wordDict:
+            char_to_string[string[0]].append(string)


この高速化はあんまり効果がない気がします。

startswith 使って start を指定すればスライスを取らなくなるので十分でしょう。
https://docs.python.org/3/library/stdtypes.html#str.startswith

oda · 2024-05-29T17:10:53Z

arai60/word_break/phase1.py

+        char_to_string = defaultdict(list)
+        for string in wordDict:
+            char_to_string[string[0]].append(string)
+        stack = [0]


seen = set()
として、

if cursor in seen: continue seen.add(cursor)

とでもすればいいでしょう。

ああ、下の方で bool 配列使ってますね。私は set のほうが素直で好きですが、趣味の範囲でしょう。

oda · 2024-05-29T17:31:00Z

arai60/word_break/phase2.py

+hayashi-ayさん: https://github.com/hayashi-ay/leetcode/pull/61/files
+Exzrgさん: https://github.com/Exzrgs/LeetCode/pull/10/files startwithという方法もpythonにあることを学んだ。
+
+結局最速はwordDict, sのローリングハッシュを計算してstoreしておくことになりそう。


いや、これローリングハッシュで書いてきたら、信用できない感じがするので、プロダクションコードに入れないでくれという気持ちになりそうです。

re2 だったらいいんじゃないでしょうか。

https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
Aho–Corasick algorithm
はありますが、文字列アルゴリズムは大変なものが多いです。
fhiyo さんが最近書いていました。
fhiyo/leetcode@ed3ace0

TORUS0818 · 2024-12-13T07:45:41Z

arai60/word_break/phase2.py

+shining-aiさん: https://github.com/shining-ai/leetcode/pull/39/files 
+自分も書いていてs[index+1-len(word):index+1]は時間計算量的にどうなのか気になった。ローリングハッシュで書き換えることを検討する。今回の問題はtop-downの方が書きやすかった...?
+hayashi-ayさん: https://github.com/hayashi-ay/leetcode/pull/61/files
+Exzrgさん: https://github.com/Exzrgs/LeetCode/pull/10/files startwithという方法もpythonにあることを学んだ。


startswithですね。
細かいんですが、割とtypoするので。
https://docs.python.org/ja/3/library/stdtypes.html#str.startswith

確かにそうですね...

nittoco · 2024-12-14T14:55:48Z

arai60/word_break/phase4.py

+    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
+        seen = [False] * len(s)
+        is_tokenizable = [False] * len(s)
+        def check_tokenizable(start):


個人的には、@cache使ったStep2のほうがわかりやすいかなと思いました。
check_tokenizable(start)という関数名だと、startというindexを入れたら、そのindexから始める文字列がtokenizableかどうかをbool値で返す関数を想像してしまいそうです。

nodchip · 2024-12-15T02:26:22Z

arai60/word_break/phase1.py

+            for word in wordDict:
+                if string_from_beginning == word:
+                    is_word_break[i] = True
+                elif i+1-len(word) >= 0 and s[i+1-len(word):i+1] == word and is_word_break[i-len(word)]:


二項演算子の両側にスペースを空けることをお勧めいたします。ただし、スライスの中の式の二項演算子の両側は、両側にスペースを空けないほうが良いと思います。

https://peps.python.org/pep-0008/#other-recommendations

Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and, or, not).

However, in a slice the colon acts like a binary operator, and should have equal amounts on either side (treating it as the operator with the lowest priority). In an extended slice, both colons must have the same amount of spacing applied. Exception: when a slice parameter is omitted, the space is omitted:

https://google.github.io/styleguide/pyguide.html#s3.6-whitespace

Surround binary operators with a single space on either side for assignment (=), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), and Booleans (and, or, not). Use your better judgment for the insertion of spaces around arithmetic operators (+, -, *, /, //, %, **, @).

nodchip · 2024-12-15T02:36:40Z

arai60/word_break/phase1.py

+    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
+        is_word_break = [False for _ in range(len(s))]
+        seen = [False for _ in range(len(s))]
+        def recursiveWordBreak(index, seen, is_word_break):


関数名は原則動詞の原形から始まることが多いと思います。 break_word_recursively() でしょうか。また、 recursively はソースコードを読めば分かるため、省略して、 break_word() 良いと思います。

nodchip · 2024-12-15T02:37:47Z

arai60/word_break/phase1.py

+# 配列を使ったdp
+class Solution:
+    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
+        is_word_break = [False for _ in range(len(s))] 


is_word_break は動詞の原形から始まっているため、関数名に見えます。 breakable または tokenizable あたりが良いと思います。個人的には、複数の値を持つ変数の名前には複数形の s を付けるのですが、これは好みの問題かもしれません。

ありがとうございます。確かに関数名に見えますね

SuperHotDogCat added 7 commits May 7, 2024 19:46

phase1

aa3217c

phase1

e9a963c

phase1

d5663d6

phase2

1ba9ee6

phase3

c6ed2f8

remove unneccesary files

1c6a7c4

ファイル末尾は改行で終える(自戒)

975d2d8

Exzrgs reviewed May 29, 2024

View reviewed changes

Modified

c60e2e5

oda reviewed May 29, 2024

View reviewed changes

TORUS0818 reviewed Dec 13, 2024

View reviewed changes

nittoco reviewed Dec 14, 2024

View reviewed changes

nodchip reviewed Dec 15, 2024

View reviewed changes

		seen = [False for _ in range(len(s))]
		is_tokanizable = [False for _ in range(len(s))]

Word break #23

Are you sure you want to change the base?

Word break #23

Uh oh!

Conversation

SuperHotDogCat commented May 28, 2024

問題

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SuperHotDogCat commented May 29, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants