
CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift

Published: 19 June 2025

Abstract

With the rapid advancement of cloud-native computing, securing cloud environments has become a critical task. Log-based Anomaly Detection (LAD) is the most representative technique used across systems for attack detection and security assurance, and numerous LAD methods and associated datasets have been proposed. However, even the datasets specifically prepared for cloud systems cover only limited cloud behaviors and lack information from a whole-system perspective. Another critical issue is normality shift: the test distribution can differ from the training distribution, which severely degrades LAD performance. Unfortunately, existing work considers only simple shift types such as chronological changes, while cloud-specific shift types, e.g., differences in deployed cloud architectures, are ignored. A dataset that captures diverse cloud system behaviors and various types of normality shift is therefore essential.
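To make the notion of normality shift concrete, the following is a minimal, hedged sketch (not the paper's method): it compares the event-frequency distribution of training-time logs against post-shift logs using Jensen-Shannon divergence. The event names and the divergence threshold are illustrative assumptions.

```python
from collections import Counter
import math

def event_distribution(log_events):
    """Normalize raw event counts into a probability distribution."""
    counts = Counter(log_events)
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a.get(k, 0.0) > 0.0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Illustrative data: training-time (normal) syscall logs vs. logs after, say,
# a version upgrade that changes the normal behavior profile.
train_logs = ["open", "read", "read", "close"] * 50
shifted_logs = ["open", "mmap", "read", "close", "futex"] * 50

drift = jensen_shannon(event_distribution(train_logs),
                       event_distribution(shifted_logs))
# A nonzero drift on purely normal data indicates normality shift:
# the detector's learned "normal" no longer matches what it now observes.
```

A monitoring pipeline could alert when `drift` exceeds a calibrated threshold, signaling that the LAD model's training distribution is stale even though no attack occurred.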
To fill this gap, we construct CAShift, a dataset for evaluating LAD in the cloud. CAShift considers the different roles of software in cloud systems, supports three real-world normality shift types (application shift, version shift, and cloud architecture shift), and features 20 attack scenarios spanning various cloud system components. Based on CAShift, we conduct a comprehensive empirical study of how existing LAD methods perform under normality shift. To explore the feasibility of shift adaptation, we further investigate three continuous learning approaches, the most common means of mitigating distribution shift. Our results demonstrate that 1) all LAD methods suffer under normality shift, with performance drops of up to 34%, and 2) existing continuous learning methods are promising for shift adaptation, but the ratio of data used for model retraining and the choice of algorithm strongly affect the outcome, yielding F1-score improvements of up to 27%. Based on these findings, we offer implications for future research on designing more robust LAD models and methods for LAD shift adaptation.
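The retraining-ratio effect described above can be illustrated with a hedged toy sketch (an assumption for exposition, not the paper's models): a trivial detector that flags unseen log events is adapted by retraining on a small fraction of post-shift normal data, and its false-positive rate on the shifted distribution drops. `FrequencyDetector`, the 10% ratio, and the synthetic event alphabets are all hypothetical.

```python
import random

class FrequencyDetector:
    """Toy LAD model: flags any log event unseen during training as anomalous."""
    def __init__(self):
        self.known = set()

    def fit(self, normal_events):
        self.known.update(normal_events)

    def predict(self, events):
        return [e not in self.known for e in events]

def false_positive_rate(detector, normal_events):
    """Fraction of genuinely normal events the detector wrongly flags."""
    flags = detector.predict(normal_events)
    return sum(flags) / len(flags)

random.seed(0)
old_normal = [random.choice("abcd") for _ in range(400)]   # pre-shift behavior
new_normal = [random.choice("cdef") for _ in range(400)]   # post-shift behavior

det = FrequencyDetector()
det.fit(old_normal)
fpr_before = false_positive_rate(det, new_normal)  # shift inflates false alarms

# Continuous-learning adaptation: retrain on a small ratio (here 10%)
# of labeled post-shift normal data.
ratio = 0.10
det.fit(new_normal[: int(len(new_normal) * ratio)])
fpr_after = false_positive_rate(det, new_normal)
```

Varying `ratio` trades labeling cost against adaptation quality, which mirrors the paper's observation that the retraining data ratio strongly affects how well continuous learning recovers detection performance.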


