Abstract
State-of-the-art evaluation of an Intrusion Detection System (IDS) relies on benchmark datasets that capture both regular system behavior and the behavior of potential attackers. These datasets are collected once, independently of the IDS under analysis. This paper questions this practice by introducing a methodology to elicit particularly challenging samples for benchmarking a given IDS. Specifically, we propose (1) six fitness functions, tailored to safety-critical cyber-physical systems, that quantify the suitability of individual samples; (2) a scenario-based methodology for attacks on networks that systematically derives optimal samples to complement existing datasets; and (3) a corresponding extension of the standard IDS evaluation methodology. We applied our methodology to two network-based IDSs defending an advanced driver assistance system. Our results indicate that different IDSs differ strongly in their edge-case classifications and that the original evaluation datasets do not include such challenging behavior. In the worst case, this leads to a critical undetected attack, as we document for one IDS. Our findings highlight the need to tailor benchmark datasets to the individual IDS in a final evaluation step. In particular, manual investigation of selected samples from edge-case classifications by domain experts is vital for assessing an IDS.
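The paper's concrete fitness functions and optimization procedure are not reproduced here. As a rough illustration of the underlying idea, the following Python sketch uses one hypothetical fitness function (proximity of the anomaly score to the detection threshold, so that top-ranked samples are edge cases for the IDS) and plain random search over a scenario space. The IDS interface, the scenario parameterization, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the paper's method): random search over an attack
# scenario space, ranked by a fitness function that favors edge cases
# for one specific IDS. All interfaces below are illustrative assumptions.
import random
from typing import Callable, Dict, List, Tuple


def boundary_fitness(score: float, threshold: float = 0.5) -> float:
    """One possible fitness: samples whose anomaly score lies close to the
    IDS decision threshold are the most 'challenging' edge cases."""
    return 1.0 - abs(score - threshold)


def elicit_edge_cases(
    generate_scenario: Callable[[random.Random], Dict],  # samples one attack scenario
    ids_score: Callable[[Dict], float],                  # IDS anomaly score in [0, 1]
    n_iterations: int = 1000,
    n_keep: int = 10,
    seed: int = 42,
) -> List[Tuple[float, Dict]]:
    """Randomly search the scenario space and keep the samples the fitness
    function rates as most challenging for this specific IDS. The selected
    samples would then extend the benchmark dataset and be handed to
    domain experts for manual inspection."""
    rng = random.Random(seed)
    scored: List[Tuple[float, Dict]] = []
    for _ in range(n_iterations):
        scenario = generate_scenario(rng)
        scored.append((boundary_fitness(ids_score(scenario)), scenario))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_keep]


if __name__ == "__main__":
    # Toy demonstration with a one-parameter scenario space and a dummy IDS.
    def toy_scenario(rng: random.Random) -> Dict:
        return {"injection_rate": rng.uniform(0.0, 1.0)}

    def dummy_ids(scenario: Dict) -> float:
        return scenario["injection_rate"]  # stand-in anomaly score

    for fitness, scenario in elicit_edge_cases(toy_scenario, dummy_ids, n_keep=3):
        print(f"fitness={fitness:.3f}  scenario={scenario}")
```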
| Original language | English |
| --- | --- |
| Pages (from-to) | 1-15 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Dependable and Secure Computing |
| DOIs | |
| State | Accepted/In press - 2023 |
Keywords
- Advanced Driver Assistance System
- Behavioral sciences
- Benchmark Dataset
- Benchmark testing
- Controller Area Network
- Hardware in the Loop Simulation
- Intrusion Detection Evaluation Problem
- Intrusion detection
- Manuals
- Measurement
- Methodology
- Scenario-Based Optimization
- Security
- Standards
- openpilot