This blog post introduces kinesis-logs-reader, an open-source Python library and command-line tool for working with large volumes of Amazon VPC Flow Logs (and other CloudWatch Logs data) using Kinesis. We gave a live demo of the new tool to the group of Seattle AWS Architects & Engineers at the SURF Incubator last week.
Check it out at our GitHub repository. Suggestions and pull requests are welcome!
Why a New Tool?
A while back Observable published flowlogs-reader, which is superficially similar. It provides a library and a convenient means for doing ad-hoc analysis of flow log data. For example, you can use it to retrieve flows for a given time window in one command:
    $ flowlogs_reader \
        --start-time="2016-05-21 02:00:00" \
        --end-time="2016-05-21 02:10:00" \
        "flowlog_group"
    2 12345678901 eni-fedbca01 192.0.2.71 198.51.100.1 123 123 17 1 76 1463796042 1463796092 ACCEPT OK
    2 12345678901 eni-fedbca01 198.51.100.1 192.0.2.71 123 123 17 1 76 1463796042 1463796092 ACCEPT OK
    2 12345678901 eni-fedbca01 203.0.113.84 192.0.2.71 33856 9100 6 2 120 1463796042 1463796092 REJECT OK
    2 12345678901 eni-edbca012 - - - - - - - 1463796064 1463796638 - NODATA
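If you want to work with records like these in your own code, it helps to give each field a name. The following is a minimal sketch, assuming the default version 2 flow log field order documented by AWS; `FlowRecord` and `parse_flow_record` are names made up for this illustration and are not part of flowlogs-reader.

```python
from collections import namedtuple

# Field order of a default (version 2) VPC Flow Log record.
FIELDS = (
    'version', 'account_id', 'interface_id', 'srcaddr', 'dstaddr',
    'srcport', 'dstport', 'protocol', 'packets', 'bytes',
    'start', 'end', 'action', 'log_status',
)
# Fields converted to integers when present.
INT_FIELDS = {
    'version', 'srcport', 'dstport', 'protocol',
    'packets', 'bytes', 'start', 'end',
}

# Hypothetical container type for this sketch.
FlowRecord = namedtuple('FlowRecord', FIELDS)


def parse_flow_record(line):
    """Split one record into named fields; '-' placeholders become None."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(
            'expected {} fields, got {}'.format(len(FIELDS), len(values))
        )
    parsed = []
    for name, value in zip(FIELDS, values):
        if value == '-':
            parsed.append(None)
        elif name in INT_FIELDS:
            parsed.append(int(value))
        else:
            parsed.append(value)
    return FlowRecord(*parsed)


# Two of the records from the output above.
rejected = parse_flow_record(
    '2 12345678901 eni-fedbca01 203.0.113.84 192.0.2.71 '
    '33856 9100 6 2 120 1463796042 1463796092 REJECT OK'
)
no_data = parse_flow_record(
    '2 12345678901 eni-edbca012 - - - - - - - 1463796064 1463796638 - NODATA'
)
print(rejected.srcaddr, rejected.dstport, rejected.action)
print(no_data.srcaddr, no_data.log_status)
```

Mapping the `-` placeholders to `None` keeps NODATA records from breaking numeric comparisons later on.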
flowlogs-reader uses the CloudWatch Logs API to pull data. That's suitable for many applications, but CloudWatch Logs isn't designed for pulling large volumes of data. For that you'll need to use Amazon Kinesis, which can deliver data at a much higher rate.
Moving an application or analysis task to Kinesis has a cost, though. You'll need to consider whether the Kinesis stream has multiple "shards," keep track of an iterator for each shard, and unwrap the Base64-encoded, gzip-compressed, and JSON-serialized data to get at the plain text of the logs.
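To make the unwrapping step concrete, here is a minimal Python 3 sketch of it. It assumes the payload layout that CloudWatch Logs uses for subscription deliveries (a JSON document with a `messageType` and a `logEvents` list); `unwrap_payload` is a name invented for this example and is not the library's API. Note that SDKs such as boto3 generally perform the Base64 step for you, in which case only the gzip and JSON layers remain.

```python
import base64
import gzip
import json


def unwrap_payload(data):
    """Return the log lines from one Base64-encoded subscription payload."""
    compressed = base64.b64decode(data)
    document = json.loads(gzip.decompress(compressed).decode('utf-8'))
    # Control messages only test delivery and carry no log data.
    if document.get('messageType') != 'DATA_MESSAGE':
        return []
    return [event['message'] for event in document['logEvents']]


# Build a synthetic payload the same way it arrives: JSON, gzip, Base64.
sample_document = {
    'messageType': 'DATA_MESSAGE',
    'logGroup': 'flowlog_group',
    'logStream': 'eni-fedbca01-all',
    'logEvents': [
        {
            'id': '1',
            'timestamp': 1463796042000,
            'message': (
                '2 12345678901 eni-fedbca01 192.0.2.71 198.51.100.1 '
                '123 123 17 1 76 1463796042 1463796092 ACCEPT OK'
            ),
        },
    ],
}
sample_payload = base64.b64encode(
    gzip.compress(json.dumps(sample_document).encode('utf-8'))
)

for line in unwrap_payload(sample_payload):
    print(line)
```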
kinesis-logs-reader takes all that into account. It wraps the Kinesis API to abstract away its complexity, and restores your ability to get data with a single command:
    $ kinesis_logs_reader --start-time="2016-05-21 02:00:00" "flowlog_stream"
    account_id action bytes dstaddr dstport end interface_id log_status packets protocol srcaddr srcport start version
    12345678901 ACCEPT 76 203.0.113.84 123 1463796032 eni-7e1e6334 OK 1 17 192.0.2.71 123 1463795982 2
    12345678901 ACCEPT 76 192.0.2.71 123 1463796032 eni-7e1e6334 OK 1 17 203.0.113.84 123 1463795982 2
    12345678901 ACCEPT 228 22.214.171.124 123 1463796016 eni-25bed87f OK 3 17 192.0.2.72 123 1463795857 2
    12345678901 ACCEPT 312 192.0.2.72 0 1463796016 eni-25bed87f OK 3 1 198.51.100.1 0 1463795857 2
The library is similarly easy to use:

    import kinesis_logs_reader

    # Read the latest entries
    reader = kinesis_logs_reader.KinesisLogsReader('flowlog_stream')
    for item in reader:
        # process_function stands in for your own handling code
        process_function(item)
kinesis-logs-reader uses the newly available time-based indexing for Kinesis streams, so you don't have to page through all of the logs in your stream to get at the ones in a particular window.
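For a sense of what a time-based read looks like against the Kinesis API, here is a rough sketch that requests an `AT_TIMESTAMP` iterator for each shard and reads until it has caught up. It is not the library's implementation: `iter_records` and `FakeKinesisClient` are hypothetical names, the fake client exists only so the example runs without AWS credentials, and the sketch skips shard-list pagination, throttling, and retries.

```python
import datetime


def iter_records(client, stream_name, start_time):
    """Yield raw records from every shard, starting at start_time."""
    description = client.describe_stream(StreamName=stream_name)
    for shard in description['StreamDescription']['Shards']:
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard['ShardId'],
            ShardIteratorType='AT_TIMESTAMP',
            Timestamp=start_time,
        )['ShardIterator']
        while iterator:
            response = client.get_records(ShardIterator=iterator, Limit=1000)
            for record in response['Records']:
                yield record
            # Stop once this shard reports that we have reached its tip.
            if response.get('MillisBehindLatest', 0) == 0:
                break
            iterator = response.get('NextShardIterator')


class FakeKinesisClient:
    """Hypothetical stand-in for a Kinesis client; serves canned pages."""

    def __init__(self, pages_by_shard):
        self.pages_by_shard = pages_by_shard

    def describe_stream(self, StreamName):
        shards = [{'ShardId': name} for name in sorted(self.pages_by_shard)]
        return {'StreamDescription': {'Shards': shards}}

    def get_shard_iterator(self, StreamName, ShardId, ShardIteratorType,
                           Timestamp):
        # Encode the shard and page number in the iterator string.
        return {'ShardIterator': '{}:0'.format(ShardId)}

    def get_records(self, ShardIterator, Limit):
        shard_id, _, page = ShardIterator.rpartition(':')
        page = int(page)
        pages = self.pages_by_shard[shard_id]
        remaining = len(pages) - page - 1
        return {
            'Records': pages[page],
            'NextShardIterator': '{}:{}'.format(shard_id, page + 1),
            'MillisBehindLatest': remaining * 1000,
        }


client = FakeKinesisClient({
    'shardId-000000000000': [[{'Data': b'a'}, {'Data': b'b'}], [{'Data': b'c'}]],
    'shardId-000000000001': [[{'Data': b'd'}]],
})
start = datetime.datetime(2016, 5, 21, 2, 0, 0)
records = list(iter_records(client, 'flowlog_stream', start))
print([record['Data'] for record in records])
```

Passing the client in as an argument means the same function should also accept a real boto3 Kinesis client, whose `describe_stream`, `get_shard_iterator`, and `get_records` calls take these keyword arguments.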
Where to Go from Here
To make use of kinesis-logs-reader you'll need to have a Kinesis stream with a subscription to a CloudWatch Logs group. The best way to set that up is to follow Amazon's guide.
For continual real-time processing of data you should consider running an AWS Lambda function instead of an application using kinesis-logs-reader. A previous blog post described that method.
To check out Observable's other open-source projects, see our GitHub page.
Also, for future-proof network security monitoring that uses flow log data as a primary input, look at the Observable Networks service.