Introducing kinesis-logs-reader

Learn more about kinesis-logs-reader, which was introduced last week at the Seattle AWS User Group.

This blog post introduces kinesis-logs-reader, an open-source Python library and command-line tool for working with large volumes of Amazon VPC Flow Logs (and other CloudWatch Logs data) using Kinesis. We gave a live demo of the new tool to the Seattle AWS Architects & Engineers group at the SURF Incubator last week.

Check it out at our GitHub repository. Suggestions and pull requests are welcome!

Why a New Tool?

A while back Observable published flowlogs-reader, which is superficially similar. It provides a library and a convenient means for doing ad-hoc analysis of flow log data. For example, you can use it to retrieve flows for a given time window in one command:

>$ flowlogs_reader \
    --start-time="2016-05-21 02:00:00" \
    --end-time="2016-05-21 02:10:00"
2 12345678901 eni-fedbca01 123 123 17 1 76 1463796042 1463796092 ACCEPT OK
2 12345678901 eni-fedbca01 123 123 17 1 76 1463796042 1463796092 ACCEPT OK
2 12345678901 eni-fedbca01 33856 9100 6 2 120 1463796042 1463796092 REJECT OK
2 12345678901 eni-edbca012 - - - - - - - 1463796064 1463796638 - NODATA
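
Each line of that output is a space-separated version 2 flow log record. As a rough illustration, separate from whatever parsing flowlogs-reader does internally, a record can be split into named fields with the standard library alone. The `parse_flow_record` helper and the `ParsedFlow` type below are hypothetical names; the field names follow the column headers that kinesis_logs_reader prints later in this post.

```python
from collections import namedtuple

# Field order of a version 2 VPC Flow Log record
FIELDS = (
    'version', 'account_id', 'interface_id', 'srcaddr', 'dstaddr',
    'srcport', 'dstport', 'protocol', 'packets', 'bytes',
    'start', 'end', 'action', 'log_status',
)
ParsedFlow = namedtuple('ParsedFlow', FIELDS)  # hypothetical helper type


def parse_flow_record(line):
    """Split one flow log line into named fields; '-' becomes None."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(
            'expected {} fields, got {}'.format(len(FIELDS), len(values))
        )
    return ParsedFlow(*(None if value == '-' else value for value in values))


# The NODATA record from the output above
record = parse_flow_record(
    '2 12345678901 eni-edbca012 - - - - - - - 1463796064 1463796638 - NODATA'
)
print(record.interface_id, record.log_status, record.srcaddr)
```

All values are kept as strings here; converting the ports, counters, and timestamps to integers would be a natural next step.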

flowlogs-reader uses the CloudWatch Logs API to pull data. That's suitable for many applications, but CloudWatch Logs isn't designed for pulling large volumes of data. For that you'll need to use Amazon Kinesis, which can deliver data at a much higher throughput.

Moving an application or analysis task to Kinesis has a cost, though. You'll need to consider whether the Kinesis stream has multiple "shards," keep track of an iterator for each shard, and unwrap the Base64-encoded, gzip-compressed, and JSON-serialized data to get at the plain text of the logs.
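
To give a sense of that bookkeeping, here is a minimal sketch of the steps just described. It is not how kinesis-logs-reader is implemented. The function names are made up for illustration, `client` is assumed to behave like `boto3.client('kinesis')` (which already removes the Base64 layer from record data), and shard-list pagination, throttling, and retries are all ignored.

```python
import base64
import gzip
import json


def decode_payload(data):
    """Turn one gzip-compressed CloudWatch Logs payload into log lines."""
    payload = json.loads(gzip.decompress(data).decode('utf-8'))
    # Skip control messages; only data messages carry log events
    if payload.get('messageType') != 'DATA_MESSAGE':
        return []
    return [event['message'] for event in payload['logEvents']]


def unwrap_record(data_b64):
    """Same as decode_payload, for payloads that are still Base64 text."""
    return decode_payload(base64.b64decode(data_b64))


def iter_stream_messages(client, stream_name):
    """Yield log lines from every shard of a stream, one shard at a time.

    `client` is assumed to behave like boto3.client('kinesis').
    """
    description = client.describe_stream(StreamName=stream_name)
    for shard in description['StreamDescription']['Shards']:
        # Each shard needs its own iterator; start from the oldest record
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard['ShardId'],
            ShardIteratorType='TRIM_HORIZON',
        )['ShardIterator']
        while iterator:
            response = client.get_records(ShardIterator=iterator)
            for record in response['Records']:
                yield from decode_payload(record['Data'])
            # Stop once this shard has been read up to the present
            if response.get('MillisBehindLatest', 0) == 0:
                break
            iterator = response.get('NextShardIterator')
```

With boto3 installed and credentials configured, something like `iter_stream_messages(boto3.client('kinesis'), 'flowlog_stream')` would walk the stream from its oldest retained record.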


kinesis-logs-reader takes all that into account. It wraps the Kinesis API to abstract away its complexity, and restores your ability to get data with a single command:

>$ kinesis_logs_reader --start-time="2016-05-21 02:00:00" "flowlog_stream"
account_id    action    bytes    dstaddr    dstport    end    interface_id    log_status    packets    protocol    srcaddr    srcport    start    version
12345678901    ACCEPT    76    123    1463796032    eni-7e1e6334    OK    1    17    123    1463795982    2
12345678901    ACCEPT    76    123    1463796032    eni-7e1e6334    OK    1    17    123    1463795982    2
12345678901    ACCEPT    228    123    1463796016    eni-25bed87f    OK    3    17    123    1463795857    2
12345678901    ACCEPT    312    0    1463796016    eni-25bed87f    OK    3    1    0    1463795857    2

The library is similarly easy to use:

import kinesis_logs_reader

# Read the latest entries
reader = kinesis_logs_reader.KinesisLogsReader('flowlog_stream')
for item in reader:
    # Each item is one log entry from the stream
    print(item)

kinesis-logs-reader uses the newly available time-based indexing for Kinesis streams, so you don't have to page through all of the logs in your stream to get at the ones in a particular window.
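
For the curious, here is roughly what a time-based lookup looks like at the Kinesis API level. This is a sketch, not kinesis-logs-reader's own code: it assumes the time-based indexing refers to the `AT_TIMESTAMP` shard iterator type, that `client` behaves like `boto3.client('kinesis')`, and that times without a time zone are meant as UTC. The `iterator_at` name is made up for illustration.

```python
from datetime import datetime, timezone


def iterator_at(client, stream_name, shard_id, start_time):
    """Get a shard iterator positioned at start_time.

    Naive datetimes are treated as UTC (an assumption of this sketch).
    """
    if start_time.tzinfo is None:
        start_time = start_time.replace(tzinfo=timezone.utc)
    response = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType='AT_TIMESTAMP',
        Timestamp=start_time,
    )
    return response['ShardIterator']


# The same start time used in the command-line example above
start = datetime.strptime('2016-05-21 02:00:00', '%Y-%m-%d %H:%M:%S')
```

Reading then proceeds from the returned iterator as usual, so only records at or after the requested time are fetched.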

Where to Go from Here

To make use of kinesis-logs-reader you'll need to have a Kinesis stream with a subscription to a CloudWatch Logs group. The best way to set that up is to follow Amazon's guide.

For continual real-time processing of data, consider running an AWS Lambda function instead of an application that uses kinesis-logs-reader. A previous blog post described that method.

To check out Observable's other open-source projects, see our GitHub page.

Also, for future-proof network security monitoring that uses flow log data as a primary input, take a look at the Observable Networks service.
