Metadata-Version: 2.1
Name: regex-log-parser
Version: 1.0.0
Summary: Python library to process heterogeneous log files by defining simple regex patterns and functions to handle them. Includes classes to parse data into a PostgreSQL database.
Author-email: Harrison Barlow <harrison.barlow@curtin.edu>
License: Mozilla Public License Version 2.0
        ==================================
        
        1. Definitions
        --------------
        
        1.1. "Contributor"
            means each individual or legal entity that creates, contributes to
            the creation of, or owns Covered Software.
        
        1.2. "Contributor Version"
            means the combination of the Contributions of others (if any) used
            by a Contributor and that particular Contributor's Contribution.
        
        1.3. "Contribution"
            means Covered Software of a particular Contributor.
        
        1.4. "Covered Software"
            means Source Code Form to which the initial Contributor has attached
            the notice in Exhibit A, the Executable Form of such Source Code
            Form, and Modifications of such Source Code Form, in each case
            including portions thereof.
        
        1.5. "Incompatible With Secondary Licenses"
            means
        
            (a) that the initial Contributor has attached the notice described
                in Exhibit B to the Covered Software; or
        
            (b) that the Covered Software was made available under the terms of
                version 1.1 or earlier of the License, but not also under the
                terms of a Secondary License.
        
        1.6. "Executable Form"
            means any form of the work other than Source Code Form.
        
        1.7. "Larger Work"
            means a work that combines Covered Software with other material, in
            a separate file or files, that is not Covered Software.
        
        1.8. "License"
            means this document.
        
        1.9. "Licensable"
            means having the right to grant, to the maximum extent possible,
            whether at the time of the initial grant or subsequently, any and
            all of the rights conveyed by this License.
        
        1.10. "Modifications"
            means any of the following:
        
            (a) any file in Source Code Form that results from an addition to,
                deletion from, or modification of the contents of Covered
                Software; or
        
            (b) any new file in Source Code Form that contains any Covered
                Software.
        
        1.11. "Patent Claims" of a Contributor
            means any patent claim(s), including without limitation, method,
            process, and apparatus claims, in any patent Licensable by such
            Contributor that would be infringed, but for the grant of the
            License, by the making, using, selling, offering for sale, having
            made, import, or transfer of either its Contributions or its
            Contributor Version.
        
        1.12. "Secondary License"
            means either the GNU General Public License, Version 2.0, the GNU
            Lesser General Public License, Version 2.1, the GNU Affero General
            Public License, Version 3.0, or any later versions of those
            licenses.
        
        1.13. "Source Code Form"
            means the form of the work preferred for making modifications.
        
        1.14. "You" (or "Your")
            means an individual or a legal entity exercising rights under this
            License. For legal entities, "You" includes any entity that
            controls, is controlled by, or is under common control with You. For
            purposes of this definition, "control" means (a) the power, direct
            or indirect, to cause the direction or management of such entity,
            whether by contract or otherwise, or (b) ownership of more than
            fifty percent (50%) of the outstanding shares or beneficial
            ownership of such entity.
        
        2. License Grants and Conditions
        --------------------------------
        
        2.1. Grants
        
        Each Contributor hereby grants You a world-wide, royalty-free,
        non-exclusive license:
        
        (a) under intellectual property rights (other than patent or trademark)
            Licensable by such Contributor to use, reproduce, make available,
            modify, display, perform, distribute, and otherwise exploit its
            Contributions, either on an unmodified basis, with Modifications, or
            as part of a Larger Work; and
        
        (b) under Patent Claims of such Contributor to make, use, sell, offer
            for sale, have made, import, and otherwise transfer either its
            Contributions or its Contributor Version.
        
        2.2. Effective Date
        
        The licenses granted in Section 2.1 with respect to any Contribution
        become effective for each Contribution on the date the Contributor first
        distributes such Contribution.
        
        2.3. Limitations on Grant Scope
        
        The licenses granted in this Section 2 are the only rights granted under
        this License. No additional rights or licenses will be implied from the
        distribution or licensing of Covered Software under this License.
        Notwithstanding Section 2.1(b) above, no patent license is granted by a
        Contributor:
        
        (a) for any code that a Contributor has removed from Covered Software;
            or
        
        (b) for infringements caused by: (i) Your and any other third party's
            modifications of Covered Software, or (ii) the combination of its
            Contributions with other software (except as part of its Contributor
            Version); or
        
        (c) under Patent Claims infringed by Covered Software in the absence of
            its Contributions.
        
        This License does not grant any rights in the trademarks, service marks,
        or logos of any Contributor (except as may be necessary to comply with
        the notice requirements in Section 3.4).
        
        2.4. Subsequent Licenses
        
        No Contributor makes additional grants as a result of Your choice to
        distribute the Covered Software under a subsequent version of this
        License (see Section 10.2) or under the terms of a Secondary License (if
        permitted under the terms of Section 3.3).
        
        2.5. Representation
        
        Each Contributor represents that the Contributor believes its
        Contributions are its original creation(s) or it has sufficient rights
        to grant the rights to its Contributions conveyed by this License.
        
        2.6. Fair Use
        
        This License is not intended to limit any rights You have under
        applicable copyright doctrines of fair use, fair dealing, or other
        equivalents.
        
        2.7. Conditions
        
        Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
        in Section 2.1.
        
        3. Responsibilities
        -------------------
        
        3.1. Distribution of Source Form
        
        All distribution of Covered Software in Source Code Form, including any
        Modifications that You create or to which You contribute, must be under
        the terms of this License. You must inform recipients that the Source
        Code Form of the Covered Software is governed by the terms of this
        License, and how they can obtain a copy of this License. You may not
        attempt to alter or restrict the recipients' rights in the Source Code
        Form.
        
        3.2. Distribution of Executable Form
        
        If You distribute Covered Software in Executable Form then:
        
        (a) such Covered Software must also be made available in Source Code
            Form, as described in Section 3.1, and You must inform recipients of
            the Executable Form how they can obtain a copy of such Source Code
            Form by reasonable means in a timely manner, at a charge no more
            than the cost of distribution to the recipient; and
        
        (b) You may distribute such Executable Form under the terms of this
            License, or sublicense it under different terms, provided that the
            license for the Executable Form does not attempt to limit or alter
            the recipients' rights in the Source Code Form under this License.
        
        3.3. Distribution of a Larger Work
        
        You may create and distribute a Larger Work under terms of Your choice,
        provided that You also comply with the requirements of this License for
        the Covered Software. If the Larger Work is a combination of Covered
        Software with a work governed by one or more Secondary Licenses, and the
        Covered Software is not Incompatible With Secondary Licenses, this
        License permits You to additionally distribute such Covered Software
        under the terms of such Secondary License(s), so that the recipient of
        the Larger Work may, at their option, further distribute the Covered
        Software under the terms of either this License or such Secondary
        License(s).
        
        3.4. Notices
        
        You may not remove or alter the substance of any license notices
        (including copyright notices, patent notices, disclaimers of warranty,
        or limitations of liability) contained within the Source Code Form of
        the Covered Software, except that You may alter any license notices to
        the extent required to remedy known factual inaccuracies.
        
        3.5. Application of Additional Terms
        
        You may choose to offer, and to charge a fee for, warranty, support,
        indemnity or liability obligations to one or more recipients of Covered
        Software. However, You may do so only on Your own behalf, and not on
        behalf of any Contributor. You must make it absolutely clear that any
        such warranty, support, indemnity, or liability obligation is offered by
        You alone, and You hereby agree to indemnify every Contributor for any
        liability incurred by such Contributor as a result of warranty, support,
        indemnity or liability terms You offer. You may include additional
        disclaimers of warranty and limitations of liability specific to any
        jurisdiction.
        
        4. Inability to Comply Due to Statute or Regulation
        ---------------------------------------------------
        
        If it is impossible for You to comply with any of the terms of this
        License with respect to some or all of the Covered Software due to
        statute, judicial order, or regulation then You must: (a) comply with
        the terms of this License to the maximum extent possible; and (b)
        describe the limitations and the code they affect. Such description must
        be placed in a text file included with all distributions of the Covered
        Software under this License. Except to the extent prohibited by statute
        or regulation, such description must be sufficiently detailed for a
        recipient of ordinary skill to be able to understand it.
        
        5. Termination
        --------------
        
        5.1. The rights granted under this License will terminate automatically
        if You fail to comply with any of its terms. However, if You become
        compliant, then the rights granted under this License from a particular
        Contributor are reinstated (a) provisionally, unless and until such
        Contributor explicitly and finally terminates Your grants, and (b) on an
        ongoing basis, if such Contributor fails to notify You of the
        non-compliance by some reasonable means prior to 60 days after You have
        come back into compliance. Moreover, Your grants from a particular
        Contributor are reinstated on an ongoing basis if such Contributor
        notifies You of the non-compliance by some reasonable means, this is the
        first time You have received notice of non-compliance with this License
        from such Contributor, and You become compliant prior to 30 days after
        Your receipt of the notice.
        
        5.2. If You initiate litigation against any entity by asserting a patent
        infringement claim (excluding declaratory judgment actions,
        counter-claims, and cross-claims) alleging that a Contributor Version
        directly or indirectly infringes any patent, then the rights granted to
        You by any and all Contributors for the Covered Software under Section
        2.1 of this License shall terminate.
        
        5.3. In the event of termination under Sections 5.1 or 5.2 above, all
        end user license agreements (excluding distributors and resellers) which
        have been validly granted by You or Your distributors under this License
        prior to termination shall survive termination.
        
        ************************************************************************
        *                                                                      *
        *  6. Disclaimer of Warranty                                           *
        *  -------------------------                                           *
        *                                                                      *
        *  Covered Software is provided under this License on an "as is"       *
        *  basis, without warranty of any kind, either expressed, implied, or  *
        *  statutory, including, without limitation, warranties that the       *
        *  Covered Software is free of defects, merchantable, fit for a        *
        *  particular purpose or non-infringing. The entire risk as to the     *
        *  quality and performance of the Covered Software is with You.        *
        *  Should any Covered Software prove defective in any respect, You     *
        *  (not any Contributor) assume the cost of any necessary servicing,   *
        *  repair, or correction. This disclaimer of warranty constitutes an   *
        *  essential part of this License. No use of any Covered Software is   *
        *  authorized under this License except under this disclaimer.         *
        *                                                                      *
        ************************************************************************
        
        ************************************************************************
        *                                                                      *
        *  7. Limitation of Liability                                          *
        *  --------------------------                                          *
        *                                                                      *
        *  Under no circumstances and under no legal theory, whether tort      *
        *  (including negligence), contract, or otherwise, shall any           *
        *  Contributor, or anyone who distributes Covered Software as          *
        *  permitted above, be liable to You for any direct, indirect,         *
        *  special, incidental, or consequential damages of any character      *
        *  including, without limitation, damages for lost profits, loss of    *
        *  goodwill, work stoppage, computer failure or malfunction, or any    *
        *  and all other commercial damages or losses, even if such party      *
        *  shall have been informed of the possibility of such damages. This   *
        *  limitation of liability shall not apply to liability for death or   *
        *  personal injury resulting from such party's negligence to the       *
        *  extent applicable law prohibits such limitation. Some               *
        *  jurisdictions do not allow the exclusion or limitation of           *
        *  incidental or consequential damages, so this exclusion and          *
        *  limitation may not apply to You.                                    *
        *                                                                      *
        ************************************************************************
        
        8. Litigation
        -------------
        
        Any litigation relating to this License may be brought only in the
        courts of a jurisdiction where the defendant maintains its principal
        place of business and such litigation shall be governed by laws of that
        jurisdiction, without reference to its conflict-of-law provisions.
        Nothing in this Section shall prevent a party's ability to bring
        cross-claims or counter-claims.
        
        9. Miscellaneous
        ----------------
        
        This License represents the complete agreement concerning the subject
        matter hereof. If any provision of this License is held to be
        unenforceable, such provision shall be reformed only to the extent
        necessary to make it enforceable. Any law or regulation which provides
        that the language of a contract shall be construed against the drafter
        shall not be used to construe this License against a Contributor.
        
        10. Versions of the License
        ---------------------------
        
        10.1. New Versions
        
        Mozilla Foundation is the license steward. Except as provided in Section
        10.3, no one other than the license steward has the right to modify or
        publish new versions of this License. Each version will be given a
        distinguishing version number.
        
        10.2. Effect of New Versions
        
        You may distribute the Covered Software under the terms of the version
        of the License under which You originally received the Covered Software,
        or under the terms of any subsequent version published by the license
        steward.
        
        10.3. Modified Versions
        
        If you create software not governed by this License, and you want to
        create a new license for such software, you may create and use a
        modified version of this License if you rename the license and remove
        any references to the name of the license steward (except to note that
        such modified license differs from this License).
        
        10.4. Distributing Source Code Form that is Incompatible With Secondary
        Licenses
        
        If You choose to distribute Source Code Form that is Incompatible With
        Secondary Licenses under the terms of this version of the License, the
        notice described in Exhibit B of this License must be attached.
        
        Exhibit A - Source Code Form License Notice
        -------------------------------------------
        
          This Source Code Form is subject to the terms of the Mozilla Public
          License, v. 2.0. If a copy of the MPL was not distributed with this
          file, You can obtain one at http://mozilla.org/MPL/2.0/.
        
        If it is not possible or desirable to put the notice in a particular
        file, then You may include the notice in a location (such as a LICENSE
        file in a relevant directory) where a recipient would be likely to look
        for such a notice.
        
        You may add additional accurate notices of copyright ownership.
        
        Exhibit B - "Incompatible With Secondary Licenses" Notice
        ---------------------------------------------------------
        
          This Source Code Form is "Incompatible With Secondary Licenses", as
          defined by the Mozilla Public License, v. 2.0.
Project-URL: Homepage, https://github.com/MWATelescope/log_processing/
Keywords: regex,log,parsing
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Provides-Extra: dev
Provides-Extra: postgres
License-File: LICENSE

[![codecov](https://codecov.io/gh/MWATelescope/log_processing/branch/package/graph/badge.svg?token=TNJ489MDDR)](https://codecov.io/gh/MWATelescope/log_processing)

# Regex Log Parser
Regex Log Parser is a simple and easy to use Python library for log parsing/processing. It allows the user to define a dictionary of regex rules and handler functions which determine how logs should be processed. See the examples below for more information.

This was originally developed at [MWA Telescope](https://mwatelescope.org) to mine a large volume of log data for useful insights. Following the success of the project, we have open-sourced and published it in the hope that it may be useful to others.

We built this project to extract data from log files and load it into a PostgreSQL database so that it may be queried. However, it has been developed for extensibility and may be used to ingest into other data stores such as MySQL, SQLite, MongoDB, and more. 

Only Postgres is currently supported. If you would like to see more data stores supported, see the contributing section below.

## Basic Idea

Imagine that you have a directory containing a number of log files. These files may be generated by different systems (e.g. a web server) and by different versions of those systems.
```bash
/logs
  web1_1.log
  web1_2.log
  web2_1.log
  web2_2.log
```

The log files contain historical information about activity on the system like so:
```bash
[2021-11-25 04:29:55,015] INFO, 192.168.0.1 "POST /login HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36"
[2021-11-25 04:29:56,542] INFO, 192.168.0.1 "GET /logout HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36"
[2021-11-25 04:30:05,731] INFO, 192.168.0.1 "GET / HTTP/1.1" 200 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36"
```

You want to parse these files to answer questions about the number of logins that have taken place or some other kind of event.

Rather than writing code to read each file line by line and doing string splitting, Regex Log Parser allows you to define a set of rules that will be used to parse files within a directory, as well as rules to process lines within matched files.

```python
rules = {
    'web1_*': {
        # Escape the literal square brackets; \S+ stops the address group at the first space.
        r'\[(.*?)\] INFO, (\S+) .*/login.*': 'my_handler',
        '.*': 'skip'
    }
}
```

In the example rules above, we are defining a dictionary where the key is some regex that will be matched against path/filenames within your directory, and the value is a dictionary which defines how to process the file.

By using regex capture groups, we can pull out the information we want from each line (datetime and IP address in the example above) and pass them to some handler function (here called my_handler), which can store the information in a database or do something else with it.
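To see what the capture groups yield, here is a minimal sketch that uses only Python's standard `re` module on one of the sample lines above, outside of the library. The pattern escapes the literal square brackets and uses `\S+` for the address; the variable names are illustrative assumptions, and the user agent is shortened for brevity.

```python
import re

# One of the sample log lines from above (user agent shortened for brevity).
line = '[2021-11-25 04:29:55,015] INFO, 192.168.0.1 "POST /login HTTP/1.1" 200 "Mozilla/5.0"'

# Literal square brackets are escaped; \S+ keeps the address group from spanning spaces.
login_pattern = re.compile(r'\[(.*?)\] INFO, (\S+) .*/login.*')

match = login_pattern.search(line)

# Group 1 is the timestamp inside the brackets, group 2 is the client address.
timestamp, ip_address = match.group(1), match.group(2)
print(timestamp)
print(ip_address)
```

A handler function receives the same kind of match object, so `match.group(1)` and `match.group(2)` are what it would read.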

### Important Note
This library forces you to handle every line in a file: each line must be matched by at least one rule. If a line matches no rule, an exception is raised.

This was done deliberately to ensure that users are handling all cases. Once you're confident that you are handling the lines that you care about, add a catch-all rule to skip everything else:

```python
rules = {
    'web1_*': {
        '.*': 'skip'
    }
}
```

## Installation
### Prerequisites
- Python >= 3.10

Install the package
```bash
pip install regex_log_parser
```

If you wish to use the included functionality for uploading data into a postgres database, install the extra dependencies like so:
```bash
pip install regex_log_parser[postgres]
```

## Usage
Create a file and import the `LogProcessor` class. Create an instance of it, then call its `run` method, passing in a directory containing the logs that you would like to process.

Two things are required to set up the processor: a rules dictionary and a handler object.

```python
from regex_log_parser import LogProcessor, HandlerBase

log_processor = LogProcessor(
    rules=rules,
    handler=handler,
    dry_run=False
)

log_processor.run('/path/to/my/logs')
```

### Rules
Rules is a standard Python dictionary of the format:

```python
rules = {
    "file_regex": {
        "line_regex": "handler_function",
    }
}
```

Where:
- `file_regex` is some regex to match the name of a file,
- `line_regex` is some regex to match a line within the file,
- `handler_function` is the name of a function in your handler object which will be used to process the line.
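The following is a rough sketch of how a rules dictionary of this shape could be applied. It is not the library's implementation: the `dispatch` function and `RecordingHandler` class are hypothetical, and the use of `re.search`, first-match-wins ordering, and treating `'skip'` as a no-op are assumptions made for illustration.

```python
import re


def dispatch(rules, handler, file_name, lines):
    """Apply the first matching line rule to each line of a matching file."""
    for file_regex, line_rules in rules.items():
        if not re.search(file_regex, file_name):
            continue
        for line in lines:
            for line_regex, function_name in line_rules.items():
                match = re.search(line_regex, line)
                if match is None:
                    continue
                if function_name != 'skip':
                    # Look the handler method up by name and call it.
                    getattr(handler, function_name)(file_name, line, match)
                break
            else:
                # No rule matched this line, so refuse to continue silently.
                raise ValueError(f'No rule matches line in {file_name}: {line!r}')


class RecordingHandler:
    """Hypothetical stand-in handler that records the groups it is given."""

    def __init__(self):
        self.seen = []

    def my_handler(self, file_path, line, match):
        self.seen.append(match.groups())


rules = {
    r'web1_.*': {
        r'\[(.*?)\] INFO, (\S+) .*/login.*': 'my_handler',
        r'.*': 'skip',
    }
}

handler = RecordingHandler()
dispatch(rules, handler, 'web1_1.log', [
    '[2021-11-25 04:29:55,015] INFO, 192.168.0.1 "POST /login HTTP/1.1" 200',
    '[2021-11-25 04:30:05,731] INFO, 192.168.0.1 "GET / HTTP/1.1" 200',
])
print(handler.seen)
```

In this sketch, rule order matters: the specific login rule is listed before the catch-all, otherwise `'.*'` would claim every line first.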

### Handlers
The handler object should be an instance of a subclass of the `HandlerBase` class in `handlers.py`. Alternatively, if you wish to parse your logs and upload them into a PostgreSQL database, you can subclass the `PostgresHandler` class.

The handler class can implement `startup` and `shutdown` methods, which are run at the start and end of the processing run respectively. These can be used to perform database setup or cleanup.

Handler functions have the signature:

```python
def handler(self, file_path, line, match):
```

Where:
- `file_path` is the path to the file of the current line
- `line` is the line in the log file to be handled
- `match` is the regex match object for the line; use `match.group(n)` to read its capture groups
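As an illustration of that signature, the sketch below defines a handler method that counts logins per address in memory. `LoginCounter` and `count_login` are hypothetical names; to keep the example self-contained the class does not inherit from `HandlerBase`, which a real handler would, and the method is called by hand rather than by a `LogProcessor`.

```python
import re
from collections import Counter


class LoginCounter:
    """Hypothetical handler; in real use it would subclass HandlerBase."""

    def __init__(self):
        self.logins_per_ip = Counter()

    def count_login(self, file_path, line, match):
        # match is the re.Match produced by the rule that selected this line;
        # group(2) is the second capture group of that rule (the address).
        self.logins_per_ip[match.group(2)] += 1


handler = LoginCounter()
line = '[2021-11-25 04:29:55,015] INFO, 192.168.0.1 "POST /login HTTP/1.1" 200'
match = re.search(r'\[(.*?)\] INFO, (\S+) .*/login.*', line)

# Call the handler method by hand with the three documented arguments.
handler.count_login('/logs/web1_1.log', line, match)
print(handler.logins_per_ip)
```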

When using the `PostgresHandler`, you can call `self.queue_op(sql, params)` in your handler functions to queue a database operation. By default, SQL operations are run in batches of 1000; you can customise this by passing the `BATCH_SIZE` parameter to the `PostgresHandler` constructor. If you want to run a database operation immediately, call `self.queue_op(sql, params, run_now=True)`.

### Full example
```python
from regex_log_parser import LogProcessor
from handler import PostgresHandler

class MyHandler(PostgresHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def startup(self):
        """Optionally run some setup"""
        pass

    def shutdown(self):
        """Optionally run some cleanup"""
        pass

    def my_handler(self, file_path, line, match):
        field_1 = match.group(1)
        field_2 = match.group(2)

        sql = """
            INSERT INTO my_table (field_1, field_2)
            VALUES
                (%s, %s);
        """
        params = (field_1, field_2)

        self.queue_op(sql, params, run_now=False)

rules = {
    r'example\.log': {
        '(.*),(.*)': 'my_handler',
        '.*': 'skip'
    }
}

handler = MyHandler(
    dsn='user:pass@localhost:5432/test',
    setup_script='path/to/db_setup'
)

log_processor = LogProcessor(
    rules=rules,
    handler=handler,
    dry_run=False
)

log_processor.run('/path/to/my/logs')
```

## The Handler class
The library only stipulates that the handler object passed to the `LogProcessor` object is an instance of `HandlerBase`.

You should subclass from `HandlerBase` and add your own methods to handle the lines found by your rules.

Override the `startup` and `shutdown` methods in your handler class to run a function at the start and end of parsing, respectively.

### The PostgresHandler class
If you wish to use the included `PostgresHandler` to upload data into a PostgreSQL database, subclass from that instead of `HandlerBase`.

The PostgresHandler object has the following constructor:
```python
class PostgresHandler(HandlerBase):
    def __init__(self, dsn: Optional[str] = None, connection: Optional[Connection] = None, setup_script: Optional[str] = None, BATCH_SIZE: int = 1000):
```

- `dsn`: optionally provide a DSN string, which will be used to connect to an existing PostgreSQL database; or
- `connection`: optionally provide an existing psycopg3 connection. Useful for unit tests.
- `setup_script`: optionally provide the path to a SQL file that performs database setup/cleanup between runs.
- `BATCH_SIZE`: execute database operations in batches of `BATCH_SIZE`. Defaults to 1000.

In your handler functions, define a SQL string and a params tuple, and pass them to the `queue_op` function. These should be set up according to the psycopg3 format; see the example above. If you wish to execute a database operation immediately, pass `run_now=True` to `queue_op`. Otherwise, it is added to a queue and executed in sequence when the size of the queue reaches `BATCH_SIZE`.
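The batching behaviour described here can be pictured with a small stand-alone sketch. `BatchQueue`, `execute_batch`, and `flush` are hypothetical names, not part of `PostgresHandler`, and details such as whether `run_now` also flushes already-queued operations are assumptions; the sketch records batches in a list instead of talking to a database.

```python
from typing import Any, Callable, List, Sequence, Tuple

Operation = Tuple[str, Sequence[Any]]


class BatchQueue:
    """Hypothetical illustration of size-triggered batching; not PostgresHandler itself."""

    def __init__(self, execute_batch: Callable[[List[Operation]], None], batch_size: int = 1000):
        self.execute_batch = execute_batch
        self.batch_size = batch_size
        self.pending: List[Operation] = []

    def queue_op(self, sql: str, params: Sequence[Any], run_now: bool = False) -> None:
        if run_now:
            # Bypass the queue and execute this single operation straight away.
            self.execute_batch([(sql, params)])
            return
        self.pending.append((sql, params))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Execute everything queued so far, in insertion order, then start a new queue.
        if self.pending:
            self.execute_batch(self.pending)
            self.pending = []


# Record each executed batch instead of sending it to a database.
executed: List[List[Operation]] = []
queue = BatchQueue(executed.append, batch_size=2)

sql = 'INSERT INTO my_table (field_1) VALUES (%s);'
queue.queue_op(sql, ('a',))
queue.queue_op(sql, ('b',))  # queue reaches batch_size, so both run together
queue.queue_op(sql, ('c',), run_now=True)  # runs on its own, immediately
print([len(batch) for batch in executed])
```

A design like this would also need a final flush at the end of a run so that a partially filled queue is not lost; a `shutdown` method is a natural place for that.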

## Contributing
As mentioned above, the only data store that is currently supported is Postgres. If you would like to add support for another data store such as MySQL or MongoDB, then please open a pull request.
