How to delete in Python: deleting them after writing the Python code

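As I said in a comment, I don't think the size of "bigfile" should slow down the rate at which the count increases. When you iterate over a file like that, Python reads it sequentially, one line at a time.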

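Which optimizations are available at this point depends on how large matchedLines is, and on how the strings in matchedLines relate to the text you are searching.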

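If matchedLines is large, you can save time by doing the "find" only once per line:

    for line in completedataset:
        # Take the text before the first comma once, outside the inner loop.
        text = line[:line.find(',')]
        for t in matchedLines:
            if t in text:
                line = line.strip().split(',')
                smallerdataset.write(','.join(line[:3]) + '\n')
                break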

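In my tests, the "find" took about 300 nanoseconds, so if matchedLines has a few million items, repeating it once per item adds up to about an extra second.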
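A figure like this can be checked on your own machine with the standard `timeit` module. The following is only a rough sketch: `sample_line` is a made-up row, not data from the question, and the lambda call adds some overhead of its own, so treat the result as an upper bound.

```python
import timeit

# Hypothetical sample row; real lines in completedataset may look different.
sample_line = "some target text,field2,field3,field4\n"

runs = 100_000
total_seconds = timeit.timeit(lambda: sample_line.find(','), number=runs)

# Average cost of one call, in nanoseconds (includes the lambda call overhead).
ns_per_find = total_seconds / runs * 1e9
print(f"find: about {ns_per_find:.0f} ns per call")
```

Multiplying the per-call figure by the number of items in matchedLines gives an estimate of what the repeated "find" costs for each line.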

If you are looking for exact matches rather than substring matches, you can speed things up by using a set:

^{pr2}$
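A minimal sketch of the set-based idea, under some assumptions: the three `io.StringIO` and list values below are hypothetical stand-ins for the real matchedLines, completedataset, and smallerdataset, and the entries in matchedLines are assumed to have no trailing newline. `matched_set` is a name introduced here for illustration.

```python
import io

# Hypothetical stand-ins for the objects used in the answer.
matchedLines = ["alice", "bob"]
completedataset = io.StringIO("alice,1,2,3\ncarol,4,5,6\nbob,7,8,9\n")
smallerdataset = io.StringIO()

# Build the set once; each membership test is then O(1) on average,
# instead of one comparison per item in matchedLines.
matched_set = set(matchedLines)

for line in completedataset:
    text = line[:line.find(',')]
    if text in matched_set:  # exact match only, not a substring match
        fields = line.strip().split(',')
        smallerdataset.write(','.join(fields[:3]) + '\n')

print(smallerdataset.getvalue())
```

With a set, the inner loop over matchedLines disappears entirely, so the cost per line no longer grows with the number of items to match.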

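If the target texts that don't match look completely different from the ones that do (for example, most targets are random strings and matchedLines is a set of names), and the strings in matchedLines are all longer than some minimum length, you can try to be really clever by checking substrings. Suppose every string in matchedLines is at least 5 characters long...

    def subkeys(s):
        # e.g. if len(s) is 7, return s[0:5], s[1:6], s[2:7].
        return [s[i:i+5] for i in range(len(s) + 1 - 5)]

    # Every 5-character substring of every string we are looking for.
    existing_subkeys = set()
    for line in matchedLines:
        existing_subkeys.update(subkeys(line))

    for line in completedataset:
        target = line[:line.find(',')]
        might_match = False
        for subkey in subkeys(target):
            if subkey in existing_subkeys:
                might_match = True
                break
        if might_match:
            # Then we have to do it the old slow way.
            for matchedLine in matchedLines:
                if matchedLine in target:
                    # Do the split and write, as in the first loop.
                    fields = line.strip().split(',')
                    smallerdataset.write(','.join(fields[:3]) + '\n')
                    break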

But it's easy to outsmart yourself doing this sort of thing; whether it helps depends on what your data looks like.
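One way to guard against that is to check the substring pre-filter against the plain slow search on a sample of your data before relying on it. This is a rough sketch only: the helper names `brute_force_match` and `prefiltered_match`, the parameter `k`, and the sample strings are all made up for illustration.

```python
def subkeys(s, k=5):
    # All substrings of length k; empty list when s is shorter than k.
    return [s[i:i + k] for i in range(len(s) + 1 - k)]

def brute_force_match(target, matched_lines):
    # The old slow way: test every string against the target.
    return any(m in target for m in matched_lines)

def prefiltered_match(target, matched_lines, existing_subkeys):
    # Cheap rejection first; fall back to the slow search only if needed.
    if not any(sk in existing_subkeys for sk in subkeys(target)):
        return False
    return brute_force_match(target, matched_lines)

# Hypothetical data; every string to match is at least 5 characters long.
matched_lines = ["alice smith", "robert"]
existing_subkeys = set()
for m in matched_lines:
    existing_subkeys.update(subkeys(m))

targets = ["xx alice smith yy", "qzjxkvw", "roberta", "rober", "bob"]
results = [prefiltered_match(t, matched_lines, existing_subkeys) for t in targets]

# The pre-filter must never change the answer, only how fast it is reached.
assert results == [brute_force_match(t, matched_lines) for t in targets]
print(results)
```

Note that "rober" passes the pre-filter but still fails the full search. The more often that happens in your data, the less the pre-filter saves.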