
Conversation

@abeitz abeitz commented Feb 9, 2021

Pull request to merge attune integration into the master branch

Member

@meson800 meson800 left a comment

General comments:
This is fairly clean code! 🙌

I've noted some stylistic things in comments, prepended with PEP8; Python has an official style guide called PEP8. There are also tools that automatically check style as well as common errors, such as pylint (pip install pylint, then run pylint attune_log_parser.py to see the problems it finds).
The rest of the comments are specific point suggestions.

Onto the big picture stuff!

1: I would abstract what you have written so far, and write a function that calls your regexes on single log lines, as they get delivered. The function signature should be something like:
def handle_log_line(last_state, line)
It should take some variable (probably a dictionary) that represents the last known Attune state (e.g. who is logged in, what protocol the Attune is currently running, if any, the last known error status, etc.), and a new line that it has just read. Based on that line, it should create a new state variable by copying last_state and updating the relevant entries, and return this updated state.
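
Here's a rough sketch of what I mean. The state keys, and the idea of passing the message string directly, are just assumptions for illustration; adapt it to however you end up parsing each line:

import re

event_start = re.compile(r'^Event:Starting_(?P<event>.+)$')
event_done = re.compile(r'^Event:(?P<event>[^_]+)_Done')

def handle_log_line(last_state, line):
    """Return a new state dictionary updated from a single log line."""
    new_state = dict(last_state)  # copy, so the caller's state isn't mutated

    result = event_start.match(line)
    if result is not None:
        new_state['current_event'] = result.group('event')
        new_state['status'] = 'Active'

    result = event_done.match(line)
    if result is not None:
        new_state['current_event'] = None
        new_state['status'] = 'Idle'

    return new_state

# Usage: start from an initial state and fold each new line through it
state = {'user': None, 'current_event': None, 'status': 'Idle'}
state = handle_log_line(state, 'Event:Starting_Wash')
print(state)  # {'user': None, 'current_event': 'Wash', 'status': 'Active'}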

After that we can move on to:
2: We need to be able to load log files as they get created, and we also need to load lines as they get generated. The first isn't something to worry about now, but the second is the next logical step.

Currently, your code opens the file and reads the entire thing, determining state changes as it goes. A better way to do it on the actual Attune is to read lines as they are written to the log file. Once you've refactored into a handle_log_line function that takes a single line and does a state update, we just need to figure out how to get updates as new lines get written.

How we will do this is with (slightly) lower-level, file-based control where we specifically control the "bookmark". When you open a file, you can imagine there being a bookmark that starts at the beginning of the file. When you call the various read functions (readline, read, etc.), you advance this little bookmark forward. The read functions give you back the strings they read, until they reach the end of file (EOF), at which point they return an empty string.

We use that to our advantage by reading lines one at a time until we reach the end of the file. If we hit EOF, we seek back to the bookmark we saved just before the read, wait around for a second, and check again. If the Attune software has written a new line in the meantime, then magic! We'll read a new line to pass off to handle_log_line. If not, we'll just hit EOF again.

Here's a code snippet:

import time

with open('test.log') as logfile:
    while True: # Infinite loop for now
        bookmark = logfile.tell()
        read_line = logfile.readline().strip()
        # The .strip() on the previous line removes the newline (\n)
        # that is at the end of the string
        if not read_line: # i.e. we hit EOF (readline() returned an empty string)
            time.sleep(1) # Wait a second...
            logfile.seek(bookmark) # then back up to the bookmark
        else:
            # We got a line!
            # state = handle_log_line(state, read_line)
            print(read_line)

Finally, a short note. If you are making a script that is meant to be run, it is good practice to put all of the code that should run on load inside the following if statement:

if __name__ == '__main__':
    print('Hello world!')

This ensures that if someone does import attune_log_parser (which is valid! You can make a Python module just by putting code in a .py file), the code inside the check doesn't run on import. Placing it in the name/main check means that your code only executes when someone runs your script directly (e.g. python attune_log_parser.py or ./attune_log_parser.py on Linux).

log_file = 'System_2020-11-10_18-52-23.log'
#Define regexes

User_dictionary = {
Member

PEP8: use snake_case, e.g. no uppercase U. This is true for everything except type definitions (e.g. if this were a class).
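
For instance, just illustrating the rename (the entries here are placeholders):

user_dictionary = {  # snake_case instead of User_dictionary
    'some_long_uid': 'cjohnsto',
    'some_other_uid': 'abeitz',
}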

event_start = re.compile('^Event:Starting_(?P<event>.+)$')
user_login = re.compile('login')
bubble_error = re.compile('^Data:New Bubble Detected')
big_bubble = re.compile('BUBBLE SIZE GREATER THAN THRESHOLD!!!')
Member

Unrelated to your code, I really love that this is the actual error message. Definitely feels like a frustrated hardware engineer at some point. The triple exclamation point really sells it.

Comment on lines +15 to +17
event_done = re.compile('^Event:(?P<event>[^_]+)_Done')
function_in_process = re.compile('^Instrument function (?P<state>.+) executing line (?P<line>\d+)$')
event_start = re.compile('^Event:Starting_(?P<event>.+)$')
Member

I don't think it affects any of your regexes here, but prefer 'raw strings', e.g. by prefixing the string with an r (r'^Event blah blah' instead of '^Event blah blah'). Making it a raw string ensures that your regex doesn't get changed by Python's string parsing before it reaches the regex engine.

If you're using Regex101 for your regex debugging, it actually hints at this; if you have it in Python regex mode, the pattern string at the top will have a little r denoting a raw string.

Normal strings expand escape sequences by default, e.g. they expand \n into a newline, \t into a tab character, and so on. Similarly, format strings in Python treat curly braces specially. These kinds of characters often show up in regexes, so making the string raw disables the escape expansion and the regex stays exactly as you wrote it.
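
For example (a standalone illustration, not from your code):

import re

# In a normal string, '\t' becomes a single tab character;
# in a raw string, the backslash and the 't' are kept as-is.
print(len('\t'))    # 1
print(len(r'\t'))   # 2

# With a raw string, \d reaches the regex engine exactly as written
line_regex = re.compile(r'executing line (?P<line>\d+)$')
match = line_regex.search('Instrument function Wash executing line 42')
if match is not None:
    print(match.group('line'))  # 42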

Comment on lines +26 to +35
csv_log = csv.DictReader(log, delimiter=',')
csv_log.fieldnames = [
'TimeStamp',
"LogType",
"User",
"Category",
"Message",
"unclearNumberString" ]

for line in csv_log:
Member

We're going to change how this parsing works later, but this is good code!

Comment on lines +29 to +33
"LogType",
"User",
"Category",
"Message",
"unclearNumberString" ]
Member

PEP8: prefer single-quote strings ('hello world') over double-quote strings ("hello world"). Really, just make sure you are being consistent instead of switching between the two.



result=event_start.match(line["Message"])
if result != None:
Member

For a variety of reasons, prefer if result is not None instead of if result != None and if result is None instead of if result == None.

See https://stackoverflow.com/a/2209781 for details.
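
Applied to the snippet above, that would look something like this (a sketch with a stand-in message instead of line["Message"]):

import re

event_start = re.compile(r'^Event:Starting_(?P<event>.+)$')
message = 'Event:Starting_Wash'  # stand-in for line['Message']

result = event_start.match(message)
if result is not None:  # identity check, rather than 'result != None'
    print(result.group('event'))  # Wash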

result=event_start.match(line["Message"])
if result != None:
function = result.group("event")
State = "Active"
Member

PEP8: lowercase S in state.

Also, this is a very Matlab-y way to initialize variables. It's fine that you are creating State inside the for loop, and it does work in Python, but it's not generally good practice. What happens if all of the if statements fail while parsing the entire file? Then the final print line will fail because State is undefined! It's good practice in a case like this to explicitly declare state = None (or some other sentinel) before the loop, then check that it is not None at the end to handle this case.
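
A minimal sketch of the sentinel pattern, using stand-in messages rather than your CSV reader:

import re

event_start = re.compile(r'^Event:Starting_(?P<event>.+)$')
messages = ['Data:New Bubble Detected', 'Event:Starting_Wash']  # stand-in log messages

state = None  # sentinel: declared up front, so it always exists

for message in messages:
    result = event_start.match(message)
    if result is not None:
        state = 'Active'

# Safe even if nothing matched, because state was initialized above
if state is not None:
    print('Final state:', state)
else:
    print('No matching events found')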

Comment on lines +12 to +13
'test' : 'test1'
}
Member

On how to load this user dictionary, the normal recommendation is to load 'secret' or external data either through environment variables or external files. I would tend to prefer an external file for something like this, specifically a JSON file.

JSON data is very similar to Python syntax, with some specific differences (like only allowing double-quoted strings). After formatting your user dictionary as a separate JSON file (you can use websites like https://www.jsonformatter.io/ to check it),

{
   "some_long_uid": "cjohnsto",
   "some_other_uid": "abeitz"
}

then you can load this dictionary with:

import json

with open('test.json') as json_file:
    user_dictionary = json.load(json_file)

This also clearly separates the code from data. We can also just never commit the json file to source control, so it never appears publicly. This approach is currently being used on the server-side of labbot.

Base automatically changed from master to main March 1, 2021 20:26