How to create a cybersecurity analytics platform with AWS analytics and machine learning

Cybersecurity threats target every level of the global public sector and continue to evolve faster than organizations can respond. In 2020, the average cost of a data breach was $3.86 million USD and the average time to identify a breach was 207 days, according to a study by IBM Security.

In the 2021 Executive Order on Improving the Nation’s Cybersecurity, the Biden administration calls for the centralizing and streamlining of access to cybersecurity data to “drive analytics for identifying and managing cybersecurity risks.” Cybersecurity analytics is a systematic methodology designed to collect, ingest, process, aggregate, and analyze security events. This methodology empowers organizations to proactively perform security investigations, powered by advanced analytics and machine learning (ML), that help mitigate cyber issues more effectively and efficiently at scale.

Traditional security analytics can struggle to capture, store, and analyze all the data generated by modern digital systems, which creates a risk of incomplete analysis. A modern threat intelligence platform includes cybersecurity analytics that harness the power of ML and behavioral analytics to screen systems and networks, identify and highlight anomalies, and empower organizations to address risks efficiently and effectively. Cybersecurity analytics can help public sector organizations understand the tremendous volume of data flowing in and out of networks and quickly detect and respond to potential threats. By providing real-time intelligence, historical records of past threats, and predictive recommendations, cybersecurity analytics can help protect an organization from a potentially costly data breach or event.

In this blog post, learn about the core components of a cybersecurity analytics framework and how organizations can use Amazon Web Services (AWS) to design a cybersecurity analytics platform with analytics and ML services.

The core components of a cybersecurity analytics platform

As public sector organizations aim to develop a zero trust architecture, they can build a security platform that unifies data, analytics, and ML. A robust cybersecurity analytics platform includes the following capabilities:

Behavioral analytics: Abnormal behavior of end-users or applications often indicates a security breach or issue. Behavioral analytics study patterns of user, application, and device behavior to identify anomalies. For example, an end-user who logs in from an external country to access systems not required for work, or an application that begins sending unusual queries and commands, could indicate a breach.

Security investigation: Data analytics, applied to data gathered from network flows, sensors, cloud systems, and security events, gives cyber defense teams what they need to deploy an intrusion detection system (IDS), which can enable organizations to detect and respond to intrusions in real time.

Predictive security analytics: Predictive analytics gathers historical data from previous cyber events, from which event patterns and vulnerabilities can be identified. Event patterns can be used to develop a tailored cyber threat defense response. Security analytics teams can use a combination of analytic techniques such as data ingestion, data processing, artificial intelligence, ML, natural language processing (NLP), and statistics to detect variations from the norm and predict imminent issues.

Automated monitoring: Automated monitoring and threat detection alerts play a crucial role in modern cyber defense operations. Automated cyber defense facilitates continuous monitoring of the environment, providing real-time actionable insights, cybersecurity threat intelligence feeds, and alerts for incoming threats.
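
As a rough illustration of the behavioral analytics capability described above, the following sketch builds a per-user baseline of countries seen in past logins and flags any new login from a country outside that baseline. The event fields (`user`, `country`) and the function names are hypothetical, chosen only for this example; a real platform would weigh many more signals than location.

```python
from collections import defaultdict


def build_baseline(history):
    """Map each user to the set of countries seen in past logins."""
    baseline = defaultdict(set)
    for event in history:
        baseline[event["user"]].add(event["country"])
    return baseline


def flag_unusual_logins(baseline, new_events):
    """Return events whose country is not in the user's baseline.

    A user with no history has an empty baseline, so every login
    from that user is flagged.
    """
    return [
        event
        for event in new_events
        if event["country"] not in baseline.get(event["user"], set())
    ]


# Hypothetical sample data.
history = [
    {"user": "alice", "country": "US"},
    {"user": "alice", "country": "CA"},
    {"user": "bob", "country": "US"},
]
new_events = [
    {"user": "alice", "country": "US"},
    {"user": "bob", "country": "BR"},
]

alerts = flag_unusual_logins(build_baseline(history), new_events)
print(alerts)
```

Here only the second new event is flagged, because `bob` has no prior login from that country.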

The five phases of building an effective cybersecurity analytics platform

As a component of a zero trust architecture, public sector organizations may consider incorporating a unified data and analytics platform that harnesses the power of advanced analytics and ML while being mindful of compliance with regulations and auditing requirements. There are five phases in an effective unified cybersecurity analytics platform: 1) data collection, 2) data storage, 3) data processing, 4) data analysis, and 5) visualization and dissemination. Throughout these phases, automation, data governance, and security solutions can be designed to meet auditing and compliance requirements.

Additionally, public sector organizations may either establish a security perimeter using AWS native security services, or use existing third-party cybersecurity applications that operate either on-premises or in other cloud vendor services.

1. Data collection

To gain full visibility into their data and protect it effectively, organizations need robust and comprehensive datasets. These datasets rarely reside in one place and are often siloed across multiple systems and applications. Having the ability to collect the needed data from wherever it resides, be it on premises, in the cloud, or in a hybrid environment, is the crucial first step in building a cybersecurity analytics platform.

With AWS, AWS Transfer Family and Amazon Kinesis can provide the needed capabilities to accomplish this first phase.

For file-based sources, organizations can use the AWS Transfer Family service, which lets them move file transfer workloads that use the Secure Shell (SSH) File Transfer Protocol (SFTP) to AWS without modifying applications or managing SFTP servers. For streaming sources, Amazon Kinesis makes it simple to collect, process, and analyze real-time data so organizations obtain timely insights and can react quickly to new information. Amazon Kinesis ingests real-time data such as video, audio, application logs, website clickstreams, and internet of things (IoT) telemetry data for ML, analytics, and other applications.
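
For the streaming path, a minimal sketch of sending one security event to a Kinesis data stream with the AWS SDK for Python (boto3) might look like the following. The stream name, the event fields, and the choice of source IP as partition key are assumptions made for illustration. The request is built in a separate helper so it can be inspected without AWS credentials.

```python
import json


def build_put_record_args(stream_name, event):
    """Build keyword arguments for the Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": event["source_ip"],
    }


def send_event(stream_name, event):
    """Send one event. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("kinesis")
    return client.put_record(**build_put_record_args(stream_name, event))


# Hypothetical stream name and event shape.
args = build_put_record_args(
    "security-events",
    {"source_ip": "203.0.113.7", "action": "login_failed"},
)
print(args["PartitionKey"])
```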

2. Data storage

Once the required data has been collected, the storage of the data is key. Storage continues the data pipeline and also retains historical data that helps train ML models. When organizations continuously provide more data to train their ML models, prediction accuracy can improve, which helps combat the latest tactics, techniques, and procedures of malicious actors.

Organizations can use Amazon Simple Storage Service (Amazon S3) for storing this logging data. Amazon S3 is a highly scalable, highly resilient object storage service. AWS provides a rich set of multi-layered capabilities to secure log data that is stored in Amazon S3, including encrypting objects, preventing deletion, and using lifecycle policies to transition data to lower-cost storage over time. Access to data in Amazon S3 can also be restricted through AWS Identity and Access Management (IAM) policies, AWS Organizations service control policies (SCPs), Amazon S3 bucket policies, Amazon S3 Access Points, and AWS PrivateLink interfaces.
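
As one example of the lifecycle policies mentioned above, the sketch below builds a lifecycle configuration that moves log objects to lower-cost storage classes as they age, and shows how it could be applied with boto3. The prefix, rule ID, and day thresholds are illustrative assumptions, not recommendations.

```python
def build_lifecycle_configuration(prefix, warm_after_days, cold_after_days):
    """Build an S3 lifecycle configuration for objects under `prefix`."""
    if warm_after_days >= cold_after_days:
        raise ValueError("warm_after_days must be less than cold_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-security-logs",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = build_lifecycle_configuration("logs/", 30, 365)
print(config["Rules"][0]["Transitions"])
```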

Additionally, security data lakes can be enriched by adding governance with AWS Lake Formation, which can create and store metadata in a data catalog; configure and control table-, row-, and column-level access permissions; and audit data access and permissions.
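
To make column-level permissions more concrete, the following sketch builds a request that grants `SELECT` on a subset of columns and passes it to the Lake Formation `grant_permissions` call in boto3. The role ARN, database, table, and column names are all hypothetical placeholders.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build a request granting SELECT on specific columns of one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(grant):
    """Apply the grant. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    boto3.client("lakeformation").grant_permissions(**grant)


# Hypothetical principal and catalog names.
grant = build_column_grant(
    "arn:aws:iam::111122223333:role/SecurityAnalyst",
    "security_lake",
    "auth_events",
    ["event_time", "source_ip", "action"],
)
print(grant["Resource"]["TableWithColumns"]["ColumnNames"])
```

In this sketch the analyst role could query the three listed columns while other columns in the table, such as ones holding personal data, stay hidden.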

3. Data processing

A key component of a zero trust architecture is incorporating automation and orchestration to reduce the time and manual effort needed to process the immense amounts of security and logging data that are ingested. By reducing the time and effort needed to conduct data processing, security teams have more time to devote to analysis, investigation, and response.

To accelerate time to value in this phase, security teams can use a service designed to make data integration quicker and simpler, such as AWS Glue. AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, ML, and application development. It offers both visual and code-based interfaces, so security teams can prepare data for analysis in minutes instead of months.
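
The kind of preparation step a data integration job performs can be sketched in plain Python: records from two sources with different field names and time formats are mapped onto one common schema. This is not the AWS Glue API; the source formats and field names below are invented for illustration.

```python
from datetime import datetime, timezone


def normalize_firewall(record):
    """Firewall records (assumed shape): epoch seconds, `src`, `act`."""
    moment = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return {
        "timestamp": moment.isoformat(),
        "source_ip": record["src"],
        "action": record["act"].lower(),
        "origin": "firewall",
    }


def normalize_app(record):
    """Application records (assumed shape): ISO 8601 time, `client_ip`, `event`."""
    moment = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
    return {
        "timestamp": moment.isoformat(),
        "source_ip": record["client_ip"],
        "action": record["event"].lower(),
        "origin": "app",
    }


# Hypothetical input records that describe the same instant.
fw = normalize_firewall({"ts": 1704067200, "src": "203.0.113.7", "act": "DENY"})
app = normalize_app(
    {"time": "2024-01-01T02:00:00+02:00", "client_ip": "203.0.113.7", "event": "Login_Failed"}
)
print(fw["timestamp"], app["timestamp"])
```

Once both sources share a schema and a UTC timestamp, later phases can join and aggregate them without source-specific logic.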

To handle the tremendous amount of security data that a cybersecurity analytics platform collects, organizations need a big data processing capability that can rapidly scale to meet the needs of today and tomorrow. Amazon EMR makes it simple to process petabyte-scale data using the latest open source big data frameworks such as Spark, Hive, Presto, HBase, Flink, and Hudi. With Amazon EMR, organizations can create clusters and provision one, hundreds, or thousands of compute instances to process data at any scale with ease.

Security teams need both historical and current information to conduct investigations and determine an appropriate course of action. Using a service such as Amazon Kinesis Data Analytics, security teams can quickly stream and transform data in real time with Apache Flink. Kinesis Data Analytics reduces the complexity of building, managing, and integrating Apache Flink applications with other AWS services.
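
A common real-time transformation is a windowed aggregation. The sketch below shows the idea in plain Python rather than with the Apache Flink API: events are grouped into fixed, non-overlapping (tumbling) windows and counted per source IP, and sources that reach a threshold within a window are reported. The field names, window size, and threshold are assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, source_ip) pair."""
    counts = Counter()
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["ts"] - (event["ts"] % window_seconds)
        counts[(window_start, event["source_ip"])] += 1
    return counts


def noisy_sources(counts, threshold):
    """Return (window_start, source_ip) pairs with at least `threshold` events."""
    return sorted(key for key, n in counts.items() if n >= threshold)


# Hypothetical failed-login events with epoch-second timestamps.
events = [
    {"ts": 0, "source_ip": "203.0.113.7"},
    {"ts": 10, "source_ip": "203.0.113.7"},
    {"ts": 59, "source_ip": "203.0.113.7"},
    {"ts": 60, "source_ip": "203.0.113.7"},
    {"ts": 5, "source_ip": "198.51.100.2"},
]

counts = tumbling_window_counts(events, window_seconds=60)
print(noisy_sources(counts, threshold=3))
```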

4. Data analysis and prediction

Once security teams establish the beginnings of a cybersecurity data pipeline, organizations should continue to use automation and other capabilities that reduce workload and shorten time to value. One area where organizations can reduce workload and time is in code development. For this area, security teams can use a service like AWS Lambda, which runs code without provisioning or managing servers. Lambda can run code for virtually any type of application or backend service. Code written for AWS Lambda can be set up to trigger automatically from other AWS services, or organizations can call it directly from any web or mobile app.
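
As a small example of code triggered by another AWS service, the following sketch is a Lambda handler for Amazon S3 object-created notifications that lists the newly arrived log objects. The bucket and key in the sample event are hypothetical, and a real handler would go on to parse or route each object.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Lambda entry point for S3 object-created notifications."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"s3://{bucket}/{key}")
    return {"processed": objects}


# Trimmed-down sample event for local testing (hypothetical names).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "security-logs-example"},
                "object": {"key": "logs/2024/app+log.json"},
            }
        }
    ]
}

result = handler(sample_event, None)
print(result)
```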

To generate real-time insights and predictive analytics across operational databases, data lakes, warehouses, and third-party datasets, organizations need a secure and scalable data warehouse. For organizations looking for a fully managed data warehouse solution, Amazon Redshift can provide insights with fast and secure analytics at scale. While conducting monitoring and investigation activities, security teams can conduct deep dive analyses on security and application log data. Amazon OpenSearch Service supports interactive log analytics, real-time application monitoring, and website search, with visualization capabilities powered by OpenSearch Dashboards. Amazon Athena is an interactive and serverless query service that uses standard SQL to analyze data in Amazon S3. Athena uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet, and Avro. Athena is ideal for quick, ad-hoc querying and integrates with Amazon QuickSight for simple visualization, but it can also handle complex analyses, including large joins, window functions, and arrays.
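
An ad-hoc Athena investigation could be sketched as follows: a helper builds a SQL query that ranks source IPs by failed logins, and a second function submits it with the boto3 `start_query_execution` call. The table name, column names, database, and results location are assumptions; the table is presumed to already exist in the data catalog.

```python
def build_failed_login_query(table, min_failures):
    """Build SQL that ranks source IPs by number of failed logins."""
    # Guard against injecting arbitrary SQL through the table name.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name")
    return (
        "SELECT source_ip, COUNT(*) AS failures "
        f"FROM {table} "
        "WHERE action = 'login_failed' "
        "GROUP BY source_ip "
        f"HAVING COUNT(*) >= {int(min_failures)} "
        "ORDER BY failures DESC"
    )


def run_query(sql, database, output_location):
    """Submit the query. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    response = boto3.client("athena").start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical table name.
sql = build_failed_login_query("auth_events", 5)
print(sql)
```

Athena runs queries asynchronously, so a caller would poll for completion with the returned execution ID before reading results.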

Once organizations establish a cybersecurity analytics platform, they can augment it by adding prediction abilities with ML. However, to be successful and fully harness the power of ML, these insights need to be available and accessible to users of any level of skill while providing the breadth of capabilities needed. For this, organizations can use services like Amazon SageMaker to innovate with ML through a choice of tools: integrated development environments for data scientists and no-code visual interfaces for business analysts. Data analysts and database developers who have SQL knowledge can take advantage of Amazon Redshift ML, which makes it simple to create, train, and apply ML models using the power of Amazon SageMaker in the background.
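
Before reaching for a managed ML service, the idea of detecting variation from the norm can be shown with a simple statistical stand-in. The sketch below flags days whose failed-login count lies far from the mean, measured in standard deviations (a z-score). It is not a SageMaker or Redshift ML example, and the data and threshold are invented for illustration.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.5):
    """Return indexes of values more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily failed-login counts; the last day is a spike.
daily_failed_logins = [12, 9, 11, 10, 13, 8, 10, 11, 9, 95]

outliers = zscore_outliers(daily_failed_logins)
print(outliers)
```

A trained model can capture seasonality and interactions between features that a single threshold cannot, which is where the ML services described above become useful.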

5. Visualization and dissemination

As the final step in building a cybersecurity analytics platform, organizations can use data visualization to develop easy-to-understand dashboards that can drive fast and effective action. These dashboards can be shared within an organization or with external stakeholders to conduct joint investigations and response operations.

To accelerate time to action, visualizations should be simple and quick to create while continuing to integrate analytics and ML. Analytics and visualization tools such as Amazon QuickSight fill this need by empowering employees within an organization to build visualizations, perform ad-hoc analyses, and quickly get insights from their data at any time, on any device. With easy-to-understand analytic products, public sector organizations can quickly coordinate and act to respond to and prevent cybersecurity threats.

Learn more about cybersecurity analytics on AWS

A robust cybersecurity analytics platform can help public sector organizations detect issues, decide a course of action, and deter threats before they become a problem. Unifying data, analytics, and artificial intelligence (AI) and ML can empower organizations to proactively perform security investigations and anticipate and mitigate issues more effectively and efficiently at scale across the cyber threat landscape.

Do you have more questions about how you can build a cybersecurity analytics platform on AWS? Contact your AWS account manager, or send an inquiry to the AWS Public Sector Sales Team to learn more.
