FlowLogParser is a Java application designed to process and analyze network flow logs using concurrent processing. It parses flow log files, categorizes traffic based on ports and protocols, and generates statistical reports.
- Default Protocol Support:
- Limited to 8 predefined protocols in default PROTOCOL_MAP:
- Any protocol, port combination not in the map is labeled as "Untagged"
- Custom protocol mappings can be provided via optional protocol_map_file
-
Flow Log Version:
- Assumes VPC Flow Logs version 2 format
- No header row in flow log file
- Each line must contain minimum 8 fields
- Lines with fewer than 8 fields are skipped
- Fields are space-separated
-
Field Requirements:
- protocol (field 8): Must be a numeric protocol identifier
- Invalid or malformed fields result in line being skipped
- No validation of other fields as they're not used in analysis
- Each valid line increments both tag and port-protocol counters
- Duplicate lines are counted separately
- "Untagged" is used when no matching tag is found in lookup table
# Clone the repository
git clone https://github.com/nachivrn/FlowLogParser.git
cd FlowLogParser
javac src/FlowLogParser.java -d .
javac -cp .:lib/junit-platform-console-standalone-1.8.2.jar src/FlowLogParserTest.java -d .
# Run the code
# Usage: java FlowLogParser <flow_log_file> <lookup_table_file> <output_file> [protocol_map_file]
java FlowLogParser ./data/flowlogfile.txt ./data/lookuptable.csv ./data/output.txt
# Check output
cat ./data/output.txt
# Run the Unit & Functional tests
java -jar lib/junit-platform-console-standalone-1.8.2.jar --class-path . --select-class FlowLogParserTest
-
Basic Functionality Tests
- Validates lookup table loading
- Tests single line processing
- Validates concurrent processing
-
Edge Cases
- Tests duplicate entries
- Test malformed input
- Test empty files
-
Error Handling Tests
- Invalid protocol numbers
- Missing fields
- Malformed input lines
- Tested with 100,000+ flow log records approximately 10MB and lookup table with 10,000 entries
- Measured processing time for large datasets
- Confirmed proper thread utilization