Time-saving Python search functions


Motivation

Some simple tasks can take longer than necessary. That happened to me with custom Python search tasks, namely extracting strings from a given text file. There are plenty of solutions and implementations for such problems. In this post I want to share ready-to-use Python functions that accomplish specific search-related tasks. If you think they can be improved, you are welcome to post a comment!

Libraries

The only library you need is “re”, which is used by Cropper and in the re.split example.

Code snippets

Cropper

Task: with a single function call, you can crop the input string sequentially by giving one or more start and end delimiters. There are two cases:

  • If you pass both a start AND an end delimiter, it returns all occurrences of the string between them. Note that once a match is found, its start and end delimiters are not considered again. For example, if you search for all strings between “|” and “|” in “ | A | B | C | D ”, the result will be [' A ', ' C '].
  • If you pass just one delimiter (either start or end), it splits the text at the first occurrence of that delimiter and keeps everything after it (start delimiter) or before it (end delimiter).

Example:

This code will crop our string three times in this sequence:

  1. Take all strings between “From1” and “To1”, join them into a single string using “;;” as the delimiter (you can change it if you dislike it…) and pass it to the next iteration.
  2. Take the previously generated string, extract everything after “From2” and pass it to the next iteration.
  3. Take the previously generated string, extract everything before “To3” and return it.
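The two cases and the three-step sequence above can be sketched as follows. This is a minimal re-implementation under stated assumptions, not the original Cropper code: the steps are passed as a hypothetical list of (start, end) tuples with None for a missing delimiter, “;;” is the default joiner, and a missing single delimiter yields an empty string (the original behaviour in that case is not described).

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical step format: (start, end); use None for a missing delimiter.
Step = Tuple[Optional[str], Optional[str]]

def Cropper(text: str, steps: Sequence[Step], joiner: str = ";;") -> str:
    """Apply each crop step in order, feeding each result into the next step."""
    for start, end in steps:
        if start is not None and end is not None:
            # Non-greedy match between the delimiters; findall resumes after each
            # end delimiter, so it is never reused as the next start delimiter.
            pattern = re.escape(start) + "(.*?)" + re.escape(end)
            text = joiner.join(re.findall(pattern, text, flags=re.DOTALL))
        elif start is not None:
            # Keep everything after the first occurrence of start.
            _, sep, tail = text.partition(start)
            text = tail if sep else ""  # assumption: "" when start is absent
        elif end is not None:
            # Keep everything before the first occurrence of end.
            head, sep, _ = text.partition(end)
            text = head if sep else ""  # assumption: "" when end is absent
        else:
            raise ValueError("each step needs a start and/or an end delimiter")
    return text

sample = "x From1 a From2 b To3 c To1 y From1 d To1"
result = Cropper(sample, [("From1", "To1"), ("From2", None), (None, "To3")])
print(repr(result))  # ' b '
```

Step 1 produces " a From2 b To3 c ;; d ", step 2 keeps " b To3 c ;; d ", and step 3 keeps " b ". Escaping the delimiters with re.escape() lets characters such as “|” be used as plain text.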

re.split

Task: you need to split a string using multiple delimiters.

Example:

Output:

In the pattern, the character “|” means “or”; if you want to use it as a delimiter itself, escape it with a backslash (i.e. \|).
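A small illustration of splitting on several delimiters with re.split(), including an escaped pipe. The sample text and delimiters are made up for this sketch; the second half shows one way to build the pattern from a list with re.escape(), which escapes “|” automatically.

```python
import re

text = "alpha, beta;gamma|delta"

# "|" separates the alternatives; the last one, \|, is a literal pipe.
parts = re.split(r", |;|\|", text)
print(parts)  # ['alpha', 'beta', 'gamma', 'delta']

# Building the same kind of pattern from a list: re.escape() escapes "|"
# and any other regex metacharacters inside each delimiter.
delimiters = [", ", ";", "|"]
pattern = "|".join(re.escape(d) for d in delimiters)
parts_from_list = re.split(pattern, text)
print(parts_from_list)  # ['alpha', 'beta', 'gamma', 'delta']
```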

FilterOut

Task: you have a list (“myList”) of strings and want to create a new list of the elements containing (exclude = False) or NOT containing (exclude = True) a given substring (“lookup”).
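A one-line sketch of the behaviour described above, reusing the parameter names from the text (“myList”, “lookup”, “exclude”). The argument order and the default exclude = False are assumptions, and this is not the original FilterOut code.

```python
from typing import List

def FilterOut(myList: List[str], lookup: str, exclude: bool = False) -> List[str]:
    """Return the items that contain `lookup` (exclude=False) or lack it (exclude=True)."""
    # (lookup in item) is True for matches; comparing it with `exclude`
    # flips the test when exclude=True.
    return [item for item in myList if (lookup in item) != exclude]

fruits = ["apple pie", "banana", "apple tart"]
print(FilterOut(fruits, "apple"))                # ['apple pie', 'apple tart']
print(FilterOut(fruits, "apple", exclude=True))  # ['banana']
```

A plain substring test (“in”) is enough here, so no regular expression is needed.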

ExtractLinesOnce and ExtractLinesMulti

Task: you want to iterate through a text file, sequentially search for different terms (given as a list of strings, “lookups”) and return the lines where they are found. In other words, when the first term is found, its line is stored and the search moves on to the next term. This is useful if you can define some intermediate “steps” before reaching the final line(s) you want to get. The difference between “ExtractLinesOnce” and “ExtractLinesMulti” is that the former stops after reaching the last search term of the sequence, while the latter continues and repeats the search sequence until the end of the file or until the specified number of passes through the sequence (“howMany”) is reached.

You can also skip a number of lines (using the “skip” parameter).
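A minimal sketch of the sequential search, not the original functions. Assumptions: the functions accept any iterable of lines (an open file object works), a single line can satisfy at most one term, and howMany = None means “until the end of the file”. The “skip” parameter is left out because its exact behaviour is not described. ExtractLinesOnce is expressed as ExtractLinesMulti with a single pass.

```python
from typing import Iterable, List, Optional

def ExtractLinesMulti(lines: Iterable[str], lookups: List[str],
                      howMany: Optional[int] = None) -> List[str]:
    """Collect lines matching lookups[0], then lookups[1], ... in order,
    restarting the sequence after the last term until the lines run out
    or the sequence has been completed `howMany` times."""
    found: List[str] = []
    if not lookups:
        return found
    step = 0    # index of the term currently being searched for
    rounds = 0  # number of completed passes through the sequence
    for line in lines:
        # Assumption: a line can satisfy at most one term of the sequence.
        if lookups[step] in line:
            found.append(line.rstrip("\n"))
            step += 1
            if step == len(lookups):
                step = 0
                rounds += 1
                if howMany is not None and rounds >= howMany:
                    break
    return found

def ExtractLinesOnce(lines: Iterable[str], lookups: List[str]) -> List[str]:
    """Stop as soon as the last term of the sequence has been found."""
    return ExtractLinesMulti(lines, lookups, howMany=1)

log = ["header", "START run 1", "noise", "RESULT 10",
       "START run 2", "RESULT 20", "START run 3", "RESULT 30"]
print(ExtractLinesOnce(log, ["START", "RESULT"]))
# ['START run 1', 'RESULT 10']
print(ExtractLinesMulti(log, ["START", "RESULT"], howMany=2))
# ['START run 1', 'RESULT 10', 'START run 2', 'RESULT 20']
```

Because iteration stops at the break, a file is only read as far as needed. For a real file you would pass the file object itself, e.g. `with open(path) as f: ExtractLinesOnce(f, lookups)`, where `path` is a placeholder.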

Sidenote

I didn’t include any backward (bottom-up) search because the solutions I tried using the reversed() function lowered performance. It worked much better to search through the whole file and take the last results using negative indices (e.g. result[-10:] takes the last ten results).
