Unraveling Cybersecurity Mysteries: Exploring SIEM/SOC Scenarios and Threat Detection Use Cases with Insights from SANS, OWASP, White House, MITRE and Notable Incidents

Ertugrul Akbas
22 min read · Jan 20, 2024
  1. EPS Crosscheck: Check your current EPS values with SANS EPS calculation table

If your EPS values do not match the SANS EPS calculation table above, you are likely missing logs. Missing logs are listed as a risk in the OWASP Top 10.
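The crosscheck in item 1 can be sketched as a small calculation. This is only an illustration, not the SANS method itself: `expected_eps` stands in for the value you would read from the SANS table for your own device counts, and the 20% `tolerance` is an assumed margin.

```python
def observed_eps(event_count: int, window_seconds: float) -> float:
    """Average events per second over a measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return event_count / window_seconds


def eps_crosscheck(event_count, window_seconds, expected_eps, tolerance=0.2):
    """Return (observed EPS, True if it falls short of the expected value).

    expected_eps is a hypothetical input taken from your own sizing table;
    tolerance is an assumed margin, not a value from SANS or OWASP.
    """
    eps = observed_eps(event_count, window_seconds)
    return eps, eps < expected_eps * (1 - tolerance)


# One day of logs: 86,400,000 events against an expected 1500 EPS.
print(eps_crosscheck(86_400_000, 86_400, expected_eps=1500))
```

A shortfall flagged this way is a prompt to look for silent log sources, not proof that logs are missing.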

Inadequate logging & Missing Logs

Inadequate logging due to constraints within SIEM system requirements or resource limitations stands out as a prominent risk identified by OWASP. This limitation significantly impacts incident response capabilities and forensic investigations, leaving organizations vulnerable to potential threats and extensive recovery periods.

Below are the example scenarios from OWASP:

2. Access keys of an administrative API were leaked on a public repository. The repository owner was notified by email about the potential leak, but took more than 48 hours to act upon the incident, and access keys exposure may have allowed access to sensitive data. Due to insufficient logging, the company is not able to assess what data was accessed by malicious actors.

3. A video-sharing platform was hit by a “large-scale” credential stuffing attack. Despite failed logins being logged, no alerts were triggered during the timespan of the attack. As a reaction to user complaints, API logs were analyzed and the attack was detected. The company had to make a public announcement asking users to reset their passwords, and report the incident to regulatory authorities.

4. A children’s health plan provider’s website operator couldn’t detect a breach due to a lack of monitoring and logging. An external party informed the health plan provider that an attacker had accessed and modified thousands of sensitive health records of more than 3.5 million children. A post-incident review found that the website developers had not addressed significant vulnerabilities. As there was no logging or monitoring of the system, the data breach could have been in progress since 2013, a period of more than seven years.

5. A major Indian airline had a data breach involving more than ten years’ worth of personal data of millions of passengers, including passport and credit card data. The data breach occurred at a third-party cloud hosting provider, who notified the airline of the breach after some time.

6. A major European airline suffered a GDPR reportable breach. The breach was reportedly caused by payment application security vulnerabilities exploited by attackers, who harvested more than 400,000 customer payment records. The airline was fined 20 million pounds as a result by the privacy regulator.

7. An open source project forum software run by a small team was hacked using a flaw in its software. The attackers managed to wipe out the internal source code repository containing the next version, and all of the forum contents. Although source could be recovered, the lack of monitoring, logging or alerting led to a far worse breach. The forum software project is no longer active as a result of this issue.

8. An attacker scans for users who share a common password and can take over every account that uses it. For all other users, the scan leaves only one failed login behind. After some days, this may be repeated with a different password.

9. A major US retailer reportedly had an internal malware analysis sandbox analyzing attachments. The sandbox software had detected potentially unwanted software, but no one responded to this detection. The sandbox had been producing warnings for some time before the breach was detected due to fraudulent card transactions by an external bank.

10. Application logs are turned off. When a breach attempt occurs, security teams are unable to determine who accessed the app and what they tried to do.

11. A business-critical application stops functioning following a change. Since multiple changes have occurred, each resulting in an application update, it is challenging to find which developer introduced the particular change that caused the issue. Developers have to review each application version manually to locate the problematic version. Since each application “save” translates to an update, the number of updates would make a manual process prohibitively expensive and time-consuming. On some platforms, developers can only review the application’s current version, so they won’t be able to find or revert to a stable version.

12. A developer builds an automated process to load data into a financial system to complete order processing. The process handles around one thousand transactions per week with a 97% success rate. The 3% that fail are processed manually off of a daily Failed Transaction email. The Failed Transaction email fails to get sent for a few days before its absence is noticed. Due to poor logging, there is no easy way to find the failed records. Instead, 100% of all transactions have to be investigated to find the failures, resulting in a significantly larger task.

13. An application to process credit card payments at a conference is created. As part of its creation, a detailed log file is created to track the transactions and stored on a shared network drive. The logging includes records of the credit card details. A user browsing the network drive discovers this file and is able to obtain all of the credit card data.
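Scenario 13 is the opposite failure: logging too much. One possible mitigation is to mask card numbers before a line is written to the log. The sketch below is an assumption-laden illustration: the function names and the `[REDACTED-PAN]` placeholder are hypothetical, and the pattern only looks for 13 to 19 digit sequences that pass the Luhn check.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(line: str) -> str:
    """Replace Luhn-valid card-like numbers in a log line with a placeholder."""
    def repl(match):
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED-PAN]" if luhn_ok(digits) else match.group()
    return CARD_RE.sub(repl, line)


print(redact_cards("payment ok card=4111 1111 1111 1111 amount=25.00"))
```

Masking at write time means a browsable log file never contains the card data in the first place; access control on the shared drive is still needed.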

Keep Event Log Data ‘Hot’ When Possible

14. When retaining logs, it’s important to consider where you’ll be storing them. The average time to initially detect a breach is nearly 300 days, according to IBM. Keeping security event logs in hot storage can therefore aid organizations in the investigation once a breach is discovered. Hot storage can be accessed quickly, while cold ‘cheap and deep’ storage is archived and rarely accessed. Hot storage is not only important from a cybersecurity perspective; it is also a requirement of the White House OMB memorandum M-21-31.

15. In July 2021, a staff member at a university reported to management that “some files are missing in the Shared Drive”. Using their SIEM, the IT team discovered within 2 hours that the staff member had deleted the files from their own computer with their own user account back in January 2021, six months earlier. They generated the necessary reports and presented them to management. If they hadn’t kept this data live for at least 6 months, they might not have been able to find it even after days of searching the archives!

16. Another scenario from a university: their SIEM solution handles an average of 3000 EPS, and they needed the access list for 2 specific IPs over the last year. One month after the request, the SIEM still had not produced the report. The way to avoid this situation is to keep logs live for at least the last year.

17. A list of devices accessing the X target IP in the last 12 months will be needed in many situations. Just looking at the famous attacks detected in the last few years is enough to understand why.

18. In the last year, which databases did SA user (root) log into and what queries did they run?

19. All actions performed by service account X in the last 6 months (including Firewall)

20. List of machines not generating traffic in the last month

21. List of those who accessed tables X or Y (tables containing personal data) within the last 12 months or files A or B on the file server (files containing personal data), and then the list of all URLs and IPs accessed by users in this list in the last 12 months, including port information, and finally, the list of other users who accessed these IPs or URLs in the last 12 months.

22. A site used for an attack that only emerged/decrypted much later, for example, deftsecurity[.]com (as in the SolarWinds incident). The list of those who accessed it from your system in the last 6, 12, 18 months.

23. List of unused firewall rules in the last 6 months

24. For example, the question of whether anyone from your company has accessed the avsvmcloud.com site associated with the SolarWinds hack in the last 6, 12, 18 months is critical. Today it’s SolarWinds, tomorrow it could be something else.

25. List of those who made DNS queries to SolarWinds servers in the last 12 months

26. List of users who have repeatedly requested exceptions in AV or endpoint security in the last 6 months; repeated requests from the same users are suspicious.

27. List of users who have repeatedly had problems receiving updates in the last 6 months; repeated problems for the same users are suspicious.

28. List of activities for a suspected user in the last 12 months

29. How many times has any user downloaded a .mov file in the last 6 months?

30. List of users who haven’t accessed the file server in the last 12 months

31. List of servers closed in the last 6 months

32. How many hits were there against the monthly threat intelligence list in the last 6 months?

33. List of users who hit the monthly threat intelligence list the most in the last 6 months

34. While accessing the same target domain in the last 6 months, which users were blocked by the firewall?

35. List of devices accessing the target IP avsvmcloud[.]com, for example, in the last 12 months, with source port and username.

36. Starting today, going back 180 days and listing unsuccessful sessions made to the firewall on that day can be a natural investigation report if you feel the need to search for suspicious firewall activity.

37. Listing all unsuccessful sessions to the firewall made in the last 180 days is also a very natural research report.

38. Starting from today, can you retrieve the list of failed sessions on the firewall within 3–5 minutes, going back 180 days?

39. Can you retrieve the list of all failed firewall sessions in the last 180 days within a few minutes?

40. Can you retrieve the list of devices that accessed the target IP address “X” within the last 12 months?

41. Can you retrieve a report of your users’ accesses within the last 12 months that were not blocked by the firewall but were on the threat intelligence list, including the original log from the firewall?

42. Which databases did SA log in to and what queries did they execute in the last year?

43. Can you retrieve the list of all activities performed by service account “X” in the last 6 months, including the firewall?

44. Can you retrieve the list of machines that have not generated any traffic in the last month?

45. List of users who accessed tables X or Y (containing personal data) or files A or B on the file server (containing personal data), followed by the list of all URLs and IP addresses accessed by these users in the last 12 months, including port information. Finally, provide a list of other users who accessed these listed IPs or URLs in the last 12 months.

46. Example of a site that was discovered/revealed to be used for an attack long after: deftsecurity[.]com (similar to the SolarWinds incident). Provide a list of users from your system who accessed this site in the last 6 months.

47. Number of occurrences of hitting the monthly threat intelligence list in the last 6 months.

48. List of users who hit the monthly threat intelligence list the most in the last 6 months.

49. Users who were blocked by the firewall while accessing the same target domain simultaneously.

50. List of devices, along with source port and username, that accessed the sample IP address avsvmcloud[.]com in the last 12 months.
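Most of the retention questions above reduce to the same operation: filter a long history by an indicator and a lookback window. A minimal sketch, assuming events are already parsed into dictionaries with hypothetical keys `ts`, `dest`, `user`, and `src_ip` (real SIEM field names will differ):

```python
from datetime import datetime, timedelta


def ioc_hits(events, indicator, now, lookback_days):
    """Return sorted (user, src_ip) pairs that reached `indicator` in the window."""
    cutoff = now - timedelta(days=lookback_days)
    return sorted({
        (e["user"], e["src_ip"])
        for e in events
        if e["dest"] == indicator and e["ts"] >= cutoff
    })


events = [
    {"ts": datetime(2023, 6, 1), "dest": "avsvmcloud.com", "user": "alice", "src_ip": "10.0.0.5"},
    {"ts": datetime(2022, 12, 1), "dest": "avsvmcloud.com", "user": "bob", "src_ip": "10.0.0.6"},
    {"ts": datetime(2023, 7, 1), "dest": "example.org", "user": "carol", "src_ip": "10.0.0.7"},
]
print(ioc_hits(events, "avsvmcloud.com", now=datetime(2024, 1, 20), lookback_days=365))
```

The query itself is trivial; what decides whether it returns in minutes or weeks is whether the 12 to 18 months of data it scans are kept live.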

Threat Detection

51. Detect if the hourly login fail/login success authentication rate exceeds 3%.
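Rule 51 can be sketched as an hourly bucket count. This is an illustration under assumptions: events arrive as `(timestamp, outcome)` tuples with hypothetical outcome labels `"fail"` and `"success"`, and the rate is read as failures divided by successes.

```python
from collections import Counter
from datetime import datetime


def hourly_fail_ratio_alerts(events, threshold=0.03):
    """Return (hour, ratio) for each hour whose fail/success ratio exceeds threshold."""
    fails, successes = Counter(), Counter()
    for ts, outcome in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if outcome == "fail":
            fails[hour] += 1
        elif outcome == "success":
            successes[hour] += 1
    alerts = []
    for hour in sorted(set(fails) | set(successes)):
        ok = successes[hour]
        # An hour with failures and no successes is treated as an infinite ratio.
        ratio = fails[hour] / ok if ok else float("inf")
        if ratio > threshold:
            alerts.append((hour, ratio))
    return alerts


nine, ten = datetime(2024, 1, 20, 9, 15), datetime(2024, 1, 20, 10, 15)
sample = [(nine, "success")] * 100 + [(nine, "fail")] * 4
sample += [(ten, "success")] * 100 + [(ten, "fail")] * 2
print(hourly_fail_ratio_alerts(sample))
```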

52. Generate an alert if a file containing personal data is copied to a shared path that is accessible to everyone, or if personal data is added to an existing file in the shared path that is being edited.

53. Identify if the hourly HTTP/DNS ratio is less than 1.

54. If the total number of login events during business hours is at least 3% higher than the total number of users, and more than 5% of these events are generated by the same repeating users, notify.

55. If the data rows added to the monitored table in the last hour are anomalous compared to the log history, alert.

56. Raise an alert if a machine is detected with a virus by the AV, and within 5 minutes, another machine is logged into, followed by the blocking of the logged-in machine within 5 minutes.

57. Alert if the same user logs in to multiple machines simultaneously, except for admin users or those in the whitelist.

58. If a user triggers a failed session event and then triggers another one 5 to 10 minutes later (not within the first 5 minutes), raise an alert. (This needs operator support for the condition where Event A does not occur within the first 5 minutes but does occur between the 5th and 10th minute.)

59. Raise an alert if the same user establishes a VPN connection to one machine and simultaneously performs a local login to another system.

60. Raise an alert if a user attempts three login failures within 30 minutes without any successful logins, especially if they are not an administrator or in the whitelist.

61. Raise an alert if a user downloads more than 50 MB in one minute or uploads more than 250 MB to the same target IP/Domain within 10 minutes, especially if the URL ends with zip, exe, or dat.

62. Raise an alert if a process is started on one machine and within 5 minutes, the same process starts on another machine with the same path as the first machine, and if one of these machines is blocked by the firewall within the next 5 minutes, and the users are different.

63. Raise an alert if a user creates a new user, and within 5 minutes, the created user performs a login failure, followed by the creation of another user by the user who created the initial user.

64. Do not generate an alarm if a user is created but deleted within 10 minutes without being used. However, raise an alert if the user is used before being deleted or deleted within the same day.

65. Raise an alert if the same IP first logs into a Linux server, then logs into a Windows server, and subsequently starts/stops a service on either of these servers.

66. Raise an alert if the same user attempts unsuccessful logins on two different machines within 15 minutes, and if within 5 minutes after the second unsuccessful login, there is an access request to an IP in the threat intelligence list from one of these machines.

67. Raise an alert if the same user attempts at least 3 unsuccessful logins within 10 minutes without any successful logins in between.
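Rule 67 is a per-user sliding window that resets on success. A minimal sketch, assuming events are `(timestamp, user, outcome)` tuples with hypothetical outcome labels; a real SIEM would express this in its own correlation language.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def failed_login_alerts(events, max_fails=3, window=timedelta(minutes=10)):
    """Alert when a user has max_fails failures inside window with no success between."""
    recent = defaultdict(deque)  # user -> timestamps of recent failures
    alerts = []
    for ts, user, outcome in sorted(events):
        fails = recent[user]
        if outcome == "success":
            fails.clear()  # a success resets the streak
            continue
        fails.append(ts)
        while fails and ts - fails[0] > window:
            fails.popleft()  # drop failures that fell out of the window
        if len(fails) >= max_fails:
            alerts.append((user, ts))
            fails.clear()  # avoid re-alerting on the same streak
    return alerts


t = lambda m: datetime(2024, 1, 20, 9, m)
logins = [
    (t(0), "bob", "fail"), (t(3), "bob", "fail"), (t(6), "bob", "fail"),
    (t(0), "alice", "fail"), (t(1), "alice", "success"),
    (t(2), "alice", "fail"), (t(3), "alice", "fail"),
    (t(0), "carol", "fail"), (t(8), "carol", "fail"), (t(16), "carol", "fail"),
]
print(failed_login_alerts(logins))
```

The same function covers rule 60 with `window=timedelta(minutes=30)`; the whitelist check is left out of this sketch.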

68. If multiple usernames are subjected to brute force attacks or if, within 15 minutes of a brute force attack, one of the machines that was subjected to the attack successfully logs in (excluding machines where brute force was not attempted to avoid false positives), raise an alert. The expected outcome of this rule is as follows:

✔️ Notify about users conducting brute force attacks

✔️ Create a list of source IPs from machines where brute force was attempted

✔️ Report machines where successful logins occurred

✔️ Report the usernames used during successful logins

The reason for including this rule here is:

🔔 It can perform these four tasks simultaneously without using lists.

🔔 It can detect multiple usernames in a single step.

🔔 In large networks with thousands of devices, only the 100 devices that are subjected to brute force need to be monitored, reducing false positives.

🔔 It can identify the username that successfully cracks the password as a result of brute force.

🔔 It can achieve the above four points in a single rule, written within 3–5 minutes using the GUI.
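One way to read rule 68 is "many failures from one source across several usernames, followed by a success from that same source within 15 minutes". The sketch below implements that reading only; the tuple layout, the `min_fails` value, and the choice to key on source IP are assumptions, not the author's rule definition.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def brute_force_then_success(events, min_fails=5, window=timedelta(minutes=15)):
    """Report successes that follow at least min_fails failures from the same source."""
    fails = defaultdict(list)  # src_ip -> [(timestamp, username)]
    findings = []
    for ts, src_ip, user, outcome in sorted(events):
        # Keep only failures from this source that are still inside the window.
        recent = [(t, u) for t, u in fails[src_ip] if ts - t <= window]
        fails[src_ip] = recent
        if outcome == "fail":
            recent.append((ts, user))
        elif outcome == "success" and len(recent) >= min_fails:
            findings.append({
                "src_ip": src_ip,
                "user": user,  # username that finally logged in
                "time": ts,
                "tried_users": sorted({u for _, u in recent}),
            })
            fails[src_ip] = []
    return findings


m = lambda n: datetime(2024, 1, 20, 9, n)
attack = [(m(i), "10.0.0.5", u, "fail") for i, u in enumerate(["a", "b", "c", "a", "b"])]
attack += [(m(6), "10.0.0.5", "c", "success"), (m(2), "10.0.0.9", "dave", "success")]
print(brute_force_then_success(attack))
```

Because only sources that already produced failures are tracked, an ordinary login from an untouched machine (like `10.0.0.9` here) never raises a finding.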

69. DGA detection (ML)

70. If critical processes that should be secure in your network (winlogon.exe, svchost.exe, explorer.exe, lsm.exe, lsass.exe, csrss.exe, taskhost.exe, wininit.exe, smss.exe, smsvchost.exe) start processes that may be deceptive in terms of their names (could be perceived as the same by human eyes) and these processes are not among the allowed processes, raise an alert.
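The "deceptive name" part of rule 70 can be approximated with an edit-distance check against the critical process list. This sketch covers only the name comparison, not the parent/child relationship or the allow list, and the `max_distance=1` cutoff is an assumed starting point.

```python
CRITICAL = {
    "winlogon.exe", "svchost.exe", "explorer.exe", "lsm.exe", "lsass.exe",
    "csrss.exe", "taskhost.exe", "wininit.exe", "smss.exe", "smsvchost.exe",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lookalike_of(name: str, max_distance: int = 1):
    """Return the critical process that `name` imitates, or None."""
    name = name.lower()
    if name in CRITICAL:
        return None  # exact names are handled by path and signature checks instead
    for critical in sorted(CRITICAL):
        if edit_distance(name, critical) <= max_distance:
            return critical
    return None


print(lookalike_of("svch0st.exe"), lookalike_of("notepad.exe"))
```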

71. Raise an alert if there are users who have not logged in for more than 3 months.

72. Raise an alert if a user has been created and hasn’t been used for 72 hours.

73. Identify machines or users that have been idle for more than 30 days (40 days, 60 days, 90 days, … 365 days) and then appear on the network, and shut down the machine and disable the user.
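The detection half of rule 73 only needs the last time each machine or user was seen. A minimal sketch, assuming events are `(timestamp, entity)` tuples; the response actions (shutting down the machine, disabling the user) are out of scope here.

```python
from datetime import datetime, timedelta


def reappeared_after_idle(events, idle=timedelta(days=30)):
    """Return (entity, last_seen, seen_again) for gaps longer than `idle`."""
    last_seen = {}
    alerts = []
    for ts, entity in sorted(events):
        previous = last_seen.get(entity)
        if previous is not None and ts - previous > idle:
            alerts.append((entity, previous, ts))
        last_seen[entity] = ts
    return alerts


activity = [
    (datetime(2024, 1, 1), "pc1"), (datetime(2024, 1, 5), "pc1"), (datetime(2024, 3, 1), "pc1"),
    (datetime(2024, 1, 1), "pc2"), (datetime(2024, 1, 20), "pc2"),
]
print(reappeared_after_idle(activity))
```

Changing `idle` gives the 40, 60, 90, or 365 day variants mentioned in the rule.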

74. If a user has not used VPN for at least 2 weeks (20 days, 30 days, 40 days, … 365 days) and within a short period of time, they perform remote interactive logon on more than one workstation, raise an alert.

75. If a port, other than standard proxy target ports, which has not been used for at least 30 days or more (40 days, 60 days, 90 days, … 365 days), starts to be used again and this port is greater than port 1024, and within 5 minutes, multiple requests are made to different destination IP addresses with requestMethod=POST, raise an alert. (Threshold, but not all SIEMs support it, so it is included here.)

76. If the same user performs two or more unsuccessful logins to the same machine within a day without any successful logins, identify and raise an alert.

77. If a locked user account remains locked for more than 72 hours, raise an alert.

78. If an authentication error occurs simultaneously from the Oracle database user interface (Oracle Management Studio) and the console (SQL*Plus), raise an alert.

79. Learn the user agent information of each user, then warn if any other user agent information is detected.

80. Learn users’ campus (University or distributed locations) information, then warn if any other campus location information is detected.

81. Learn users’ flat number or building location information, then warn if any other location information is detected.
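Rules 79 to 81 share one pattern: learn the values seen for each user, then flag anything new. A minimal sketch of that pattern, with a hypothetical class name and no persistence or learning-period handling:

```python
class FirstSeenDetector:
    """Remember which values each user has shown and flag unseen ones."""

    def __init__(self):
        self.known = {}  # user -> set of values seen so far

    def learn(self, user, value):
        """Record a value during the learning period without alerting."""
        self.known.setdefault(user, set()).add(value)

    def check(self, user, value):
        """Return True if the value is new for this user, then remember it."""
        seen = self.known.setdefault(user, set())
        is_new = value not in seen
        seen.add(value)
        return is_new


agents = FirstSeenDetector()
agents.learn("alice", "Firefox/121")
print(agents.check("alice", "Firefox/121"), agents.check("alice", "curl/8.4"))
```

The same detector works for user agents, campuses, or building locations; only the value passed in changes. Note that a user with no learned baseline triggers on their first value.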

82. Detect when a user is still logged on and someone else logs on to any machine with the same username from a different IP.

83. After the antivirus system detects a virus on a machine, notify if a process starts on that machine before the virus is deleted, and also alert if this happens more than two times in 15 days.

84. Alert when a user is still logged on and someone else logs on to any machine with the same username from a different IP.

85. Warn if the same user makes more than 3 unsuccessful login attempts on the same machine within three days without any successful login.

86. Detect when multiple logins are occurring with the same username but from different IP addresses.
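Rules 82, 84, and 86 all depend on tracking which sessions are still open. A minimal sketch, assuming events are `(timestamp, user, ip, action)` tuples with hypothetical actions `"logon"` and `"logoff"`:

```python
from datetime import datetime


def concurrent_login_alerts(events):
    """Alert when a user logs on from a new IP while another session is still open."""
    active = {}  # user -> set of IPs with an open session
    alerts = []
    for ts, user, ip, action in sorted(events):
        ips = active.setdefault(user, set())
        if action == "logon":
            if ips and ip not in ips:
                alerts.append((ts, user, ip, sorted(ips)))
            ips.add(ip)
        elif action == "logoff":
            ips.discard(ip)
    return alerts


at = lambda mm: datetime(2024, 1, 20, 9, mm)
sessions = [
    (at(0), "alice", "10.0.0.1", "logon"),
    (at(5), "alice", "10.0.0.2", "logon"),   # alice is still logged on from 10.0.0.1
    (at(0), "bob", "10.0.0.3", "logon"),
    (at(10), "bob", "10.0.0.3", "logoff"),
    (at(20), "bob", "10.0.0.4", "logon"),    # bob logged off first, so no alert
]
print(concurrent_login_alerts(sessions))
```

This depends on logoff events being logged reliably; where they are not, a session timeout would have to stand in for them.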

87. Detect first Access to Critical Assets

88. Detect user Access at Unusual Times

89. Alert if a user whose last login event was an authentication failure fails again at least 5 minutes later without any successful login in between.

90. Create rules around logins that hop to different points of the network after a failure occurs or use the same credentials across multiple assets. (Automated login attempts)

91. Alert if a new VPN connection request is received with the same username while that user’s VPN connection is still in progress.

92. Warn if a logoff event is detected for a user without a corresponding logon event.

93. If a user tries to log in to another machine while already logged in on one machine, alert.

94. Notify if the MAC address of the server changes.

95. Monitor each user’s VPN connections from unauthorized locations. (While sales personnel can travel to certain countries, the locations where developers and accounting personnel work are fixed.)

96. Detect login attempts to a database server from unauthorized IP addresses or users.

97. If the mail gateway discovers an infected email, add the sender’s IP address to the list of suspect IPs and the email’s attachment file name to the list of suspected files. Alert on any access from this IP (blocking it directly on the firewall is risky), and also alert on any file access with the same file name until the mail gateway categorizes this IP as clean.

98. Report all access to an external IP categorized as suspicious by the URL proxy until it is again removed from the suspicious category by the URL proxy. Do not report after leaving the suspect category.

99. If an antivirus system detects a virus on a machine, add the IP of that machine to the infected machines list and the current user to the suspicious users list, and report all activity of that IP and user until the antivirus reports that the machine has been cleaned.

100. When multiple failed login attempts from a single IP address occur within a short time frame, add the IP to a list of suspicious IPs and the users to a list of suspicious users. Then alert on all events from that IP address or those users until a successful login comes from that IP or one of the users. After the successful login, stop alerting for that IP or user.

101. When a VPN connection is detected from a high-risk area, alert on all events of this user until the same user connects over VPN from a low-risk location. Do not alert after that connection.
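Rules 97 to 101 follow the same stateful shape: one event puts an entity on a watchlist, everything it does is reported while it stays there, and a clearing event takes it off again. A minimal sketch using rule 99's antivirus case, with hypothetical event names:

```python
def watchlist_alerts(events, start="virus_detected", stop="virus_cleaned"):
    """Report every event of a host between its `start` and `stop` events."""
    watched = set()
    alerts = []
    for host, kind in events:  # events are assumed to be in time order
        if kind == start:
            watched.add(host)
        elif kind == stop:
            watched.discard(host)
        elif host in watched:
            alerts.append((host, kind))
    return alerts


stream = [
    ("pc1", "login"),
    ("pc1", "virus_detected"),
    ("pc1", "file_copy"),      # reported: pc1 is on the watchlist
    ("pc2", "login"),          # not reported: pc2 was never flagged
    ("pc1", "virus_cleaned"),
    ("pc1", "login"),          # not reported: pc1 has been cleared
]
print(watchlist_alerts(stream))
```

Swapping the `start` and `stop` event names adapts the same loop to the mail gateway, URL proxy, and VPN variants.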

102. If multiple users access sensitive information (in monitored cloud environments or file servers) at the same time, alert. If someone accesses a file and another user accesses the same file 5 minutes later, this event flow does not generate an alert.

103. If multiple devices on the network are infected at the same time (within 3–5 seconds of each other), alert.

104. If multiple users are modifying system settings at the same time, alert.

105. If a new device is being added while (at the same time) the firewall rules are being changed, alert.

106. Detect if multiple users are accessing suspicious websites at the same time.

107. A user accesses a database, followed immediately by a user uploading a large file to a cloud storage service.

108. A user accesses a sensitive document, followed immediately by a user connecting to a VPN

109. Alert if more than half of the queries in the last hour to a selected table from the database belong to the same user.

110. If there are more than 15,000 events from at least 50 unique IPs within 3 minutes, and these events belong to at most 10 different categories, notify. (In other words, the 15,000 events fall into no more than 10 categories.)

111. Alert if the ratio of unsuccessful sessions to successful sessions in the last hour exceeds 5%.

112. Alert if more than half of the data added in the last hour to a selected table from the database belongs to the same user.

113. If the number of rows added to the monitored table in the last hour is more than 10% higher than in the previous hour, alert.

114. Detect anomalies between the number of inserts in the logs and the number of rows added to the monitored critical table.

115. Detecting data loss: Monitor the logs of a database table for any anomalies between the number of inserts in the logs and the number of rows added to the table. If the number of inserts in the logs is significantly lower than the number of rows added to the table, this could indicate potential data loss or deletion.

116. Detecting database performance issues: Monitor the logs of a database table for any discrepancies between the number of inserts in the logs and the number of rows added to the table. If the number of inserts in the logs is significantly higher than the number of rows added to the table, this could indicate database performance issues, such as slow or failing queries.

117. Detecting unauthorized data access: Use logs to track access to a sensitive database table and compare the number of inserts in the logs to the number of rows added to the table. If there is a significant difference between the two, generate an alert to indicate potential unauthorized access to the data.

118. Detecting data tampering: Monitor the logs of a database table for any anomalies between the number of inserts in the logs and the number of rows added to the table. If there is a discrepancy between the two numbers, generate an alert to notify security teams of potential data tampering.

119. Generate an alert if a file containing personal data is copied to a shared path that is accessible to everyone, or if personal data is added to an existing file in the shared path that is being edited.

120. Find the change in the 90th percentile of incoming traffic volume per source IP between two time periods
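Rule 120 can be sketched with a nearest-rank percentile. The nearest-rank definition is a choice made here for simplicity (SIEM products may interpolate instead), and flows are assumed to be `(source_ip, bytes)` tuples.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def p90_by_source(flows):
    """90th percentile of traffic volume for each source IP."""
    by_ip = defaultdict(list)
    for src_ip, nbytes in flows:
        by_ip[src_ip].append(nbytes)
    return {ip: percentile(vols, 90) for ip, vols in by_ip.items()}


def p90_change(period_a, period_b):
    """Change in the 90th percentile for IPs present in both periods."""
    a, b = p90_by_source(period_a), p90_by_source(period_b)
    return {ip: b[ip] - a[ip] for ip in a.keys() & b.keys()}


last_week = [("10.0.0.5", v) for v in range(1, 11)]
this_week = [("10.0.0.5", v * 10) for v in range(1, 11)] + [("10.0.0.9", 500)]
print(p90_change(last_week, this_week))
```

Source IPs that appear in only one period are skipped here; whether they should count as infinite growth is a policy decision.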

121. Detect spikes in incoming traffic volume per source IP

122. Detect spikes in outgoing traffic volume per destination IP

123. Detect abnormal increase in the number of connections per source IP
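The spike rules 121 to 123 do not say how a spike is defined. One common choice, used here as an assumption, is to flag a value that lies more than `k` standard deviations above the historical mean for that IP.

```python
import statistics


def is_spike(history, current, k=3.0):
    """True if `current` is more than k standard deviations above the mean of history."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current > mean  # flat history: any increase stands out
    return (current - mean) / stdev > k


baseline = [100, 110, 90, 105, 95]  # e.g. MB per hour for one source IP
print(is_spike(baseline, 500), is_spike(baseline, 108))
```

The same function applies to traffic volume, packet counts, or connection counts; `k=3.0` is only a starting point and usually needs tuning per environment.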

124. Detect abnormal increase in incoming traffic volume per source IP using percentile

125. Detect abnormal increase in outgoing traffic volume per destination IP using percentile

126. Find the standard deviation of incoming traffic volume per source IP

127. Find the average number of incoming packets per destination IP

128. Find the 90th percentile of incoming traffic volume per source IP

129. Find the 75th percentile of outgoing traffic volume per protocol

130. Find the average number of connections per source IP, broken down by connection type

131. Find the standard deviation of incoming traffic volume per destination port

132. Find the average number of packets per protocol and destination IP

133. Count the number of events by event type

134. Find the top 10 destination IPs that have the highest number of failed login attempts

135. Find the top 10 source IPs that have generated the highest volume of traffic

136. Find the top 10 source IPs that have generated the highest number of events

137. Find the top 10 destination IPs that have the highest number of events per protocol

138. Find the top 10 source IPs that have generated the highest volume of incoming traffic

139. Find the top 10 destination IPs that have received the highest volume of outgoing traffic

140. Find the top 10 source IPs that have generated the highest number of incoming packets

141. Find the top 10 destination IPs that have received the highest number of outgoing packets

142. Find the top 10 source IPs that have generated the highest number of incoming connections

143. Find the top 10 destination IPs that have received the highest number of outgoing connections

144. Find the top 10 protocols that have generated the highest volume of traffic

145. Find the top 10 protocols that have generated the highest number of packets

146. Warn if a user does something they’ve never done before

147. Warn if a user who has not used VPN for at least 15 days (20, 30, 40 … 265 days) performs remote interactive logon on more than one workstation in a short time.

148. No Activity for more than 60 Days — This account has not logged in for over 60 days

149. Warn if the same user changes their password more than 3 times within 15 days.

150. Warn if a user has visited the malicious categories on the proxy at least once a day for a week. (Bot Networks, Uncategorized, Malware, Spyware, Dynamic Dns, Encrypted Upload)

151. Alert if a port that is very rarely used is used.

152. Detect anomalies in the per-user ratio of login successes to failures.

153. Monitor all logins and access during non-working hours.

154. Check geolocation to find unusual behavior (locations never seen before).

155. Warn if the time between two failed login events of the same user is less than 1 minute.

156. Warn if the VPN user has not made any VPN connection in the last week

157. Warn if the time between two login events of a non-admin user is less than 5 minutes

158. Mail Masquerade Detection: Warn if an e-mail is received from an address that closely resembles a legitimate one, such as ali.veli@citibαnk.com instead of ali.veli@citibank.com.
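The citibαnk example in rule 158 swaps a Latin "a" for a Greek alpha. One way to catch this is to map confusable characters back to ASCII and compare the result with trusted domains. The `CONFUSABLES` table below is a tiny hand-picked subset for illustration, not a complete confusables list, and the function names are hypothetical.

```python
# Small illustrative subset of look-alike characters mapped to ASCII.
CONFUSABLES = {
    "\u03b1": "a",  # Greek small alpha
    "\u0430": "a",  # Cyrillic small a
    "\u03bf": "o",  # Greek small omicron
    "\u043e": "o",  # Cyrillic small o
    "\u0435": "e",  # Cyrillic small ie
}


def skeleton(text: str) -> str:
    """Lowercase the text and replace known look-alike characters."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text.lower())


def is_masquerade(sender: str, trusted_domains: set) -> bool:
    """True if the sender's domain only looks like a trusted domain."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in trusted_domains:
        return False  # genuinely the trusted domain
    return skeleton(domain) in trusted_domains


trusted = {"citibank.com"}
print(is_masquerade("ali.veli@citib\u03b1nk.com", trusted))
print(is_masquerade("ali.veli@citibank.com", trusted))
```

This catches character substitution only; typo-style imitations such as a missing letter would need an edit-distance check as well.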

159. Masquerading Detection: Detect masquerading of system utilities, tasks, and services. (T1036.003 Rename System Utilities, T1036.004 Masquerade Task or Service)

160. Hunting malware and viruses by Detecting random strings

161. If the entropy of a file or directory is significantly higher than the baseline, it could indicate the presence of malware or unauthorized changes.
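Rules 160 and 161 both rest on Shannon entropy, which is higher for random-looking strings and for encrypted or packed file content. A minimal sketch of the measurement itself; the baseline and the alert threshold are left to the reader, since the rules do not specify them.

```python
import math
from collections import Counter


def shannon_entropy(data) -> float:
    """Shannon entropy in bits per symbol for a string or bytes object."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(shannon_entropy("aaaa"), shannon_entropy("abcd"))
print(shannon_entropy("xk2q9vz7pl"), shannon_entropy("mail"))
```

For files, the same function can be applied to the raw bytes; values approaching 8 bits per byte are typical of compressed or encrypted content, so legitimate archives will also score high.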

162. Processes Matching or Similar to System Processes in Unexpected Directories

163. Account Created with Name Similar to “Admin”

164. Account Created with Name Similar to “Administrator”

165. Account Created with Name Similar to the local service account naming convention

166. Newly-Registered Domains Visited (requires WHOIS enrichment)

167. Identifying Benign Websites (Top 1 Million Domains): If a domain has been created in the last 24 hours and this domain is in a top 1 million list (Cisco Umbrella 1 million, https://majestic.com/reports/majestic-million, https://tranco-list.eu/, https://www.domcop.com/top-10-million-websites) and not in our Whitelist, block it via NAC or Firewall.

168. Detect if the total (upload+download) amount of traffic for each user is abnormal based on the last week.

169. Detect if the same activity occurred for the last week/month for the same user or not

170. Warn if a user accesses a URL that they haven’t accessed in the past week/month

171. Trigger an alert when there are more than 3 times as many admin logins as yesterday.

172. Trigger an alert when there are more NXDOMAIN responses than last week.

173. Allow/Block Ratio per System/User

174. GET/POST Ratio per System/User

175. Up/Down Bytes Ratio per System/User

176. Auth/Failed Auth per User

177. Alert if the 90th percentile of network traffic from a specific IP address exceeds a certain threshold. The 90th percentile is the value below which 90% of the traffic samples from that IP fall, so when it exceeds the threshold, at least 10% of the samples are above the threshold. This rule can be used to detect abnormal behavior, such as a DDoS attack, which would cause a spike in traffic from a specific IP.

178. Trigger an alert if the 99th percentile of authentication failure rate for a specific user or group of users exceeds a certain threshold.

179. Alert if the mean value of the authentication failure rate for a specific user or group of users exceeds a certain threshold.

180. Alert if the standard deviation of the login times for a specific user exceeds a certain threshold. This means that the login times are more variable than usual, which could be a sign of abnormal behavior, such as a compromised account being accessed from different locations or at different times.

181. Alert if the standard deviation of the number of failed login attempts for a specific user exceeds a certain threshold. This means that the number of failed login attempts is more variable than usual, which could be a sign of abnormal behavior, such as a brute force attack being launched from different IP addresses or at different times.

182. Identify when the vssadmin.exe process has executed a suspicious command that deletes shadow copies.
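Rule 182 can be approximated by matching process command lines. The sketch assumes the command line is available as a string (for example from process creation logs) and covers only the `vssadmin delete shadows` form; other ways of removing shadow copies are not handled.

```python
import re

# vssadmin, with or without .exe, followed later by "delete shadows".
VSS_DELETE = re.compile(
    r"\bvssadmin(?:\.exe)?\b.*\bdelete\s+shadows\b",
    re.IGNORECASE,
)


def deletes_shadow_copies(command_line: str) -> bool:
    """True if the command line looks like a vssadmin shadow copy deletion."""
    return bool(VSS_DELETE.search(command_line))


print(deletes_shadow_copies(r"C:\Windows\System32\vssadmin.exe Delete Shadows /All /Quiet"))
print(deletes_shadow_copies("vssadmin list shadows"))
```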

183. Identify an employee who is accessing sensitive files outside of their normal job responsibilities, or who is sending large amounts of data outside of the organization.

184. Identify an account that has been accessed from multiple locations or devices at unusual times, or that has been used to access sensitive data or systems that the user does not normally access.

185. Identify a user who is attempting to access privileged accounts or systems without authorization, or who is attempting to use a privileged account to access sensitive data or systems.

186. Identify a privileged user who is accessing sensitive data or systems outside of their normal job responsibilities, or who is using privileged access to perform unauthorized actions.

187. Identify a privileged user who is sharing their account credentials with others, or who is using privileged accounts in an insecure way.

188. Identify a privileged user who is logging in from unusual locations or at unusual times, or who is using privileged accounts to access sensitive data or systems that they do not normally access.

189. Identify a user who is copying large amounts of data to an external device or cloud storage account, or who is emailing sensitive data to a personal email account.

190. Identify a user who is accessing sensitive files outside of their normal job responsibilities or who is accessing files that they haven’t accessed before.

191. Identify a user who is modifying sensitive files outside of their normal job responsibilities or who is modifying files that they haven’t modified before.

192. Identify a user who is encrypting large numbers of files or who is encrypting sensitive files, which might indicate a security incident.

193. Identify a user who is deleting large numbers of files or who is deleting sensitive files, which might indicate a security incident.
