Usually I blog about solutions and ideas we can use in a constructive way. In this blog post I am going to talk about a destructive solution, which will hopefully be used only rarely. As we all know, Azure Log Analytics is a great log and analytics platform, where we can ingest data from basically any data source. We can utilize management solutions in Azure Monitor or use PowerShell to collect data and send it via the OMSIngestionAPI module to Azure Log Analytics (ALA). If we want to use any other programming language, there is also a description of the HTTP Data Collector API. Whichever way we choose, we always end up with data sitting in the cloud. Because of that, we may run into compliance / security issues and / or be haunted by the GDPR regulations.
Before we collect data, it is probably best to think about WHAT we are going to collect BEFORE we send it to Azure Log Analytics. Determining what you collect will not only avoid unwanted discussions with your chief information officer (CIO) / chief information security officer (CISO), it will also save money, because we consume less space. If you need to upload sensitive data to Azure Log Analytics, you might want to normalize, obfuscate, or anonymize it first. In my opinion, “thinking before collecting” will save us a lot of trouble. The last and least preferred way of dealing with unwanted data in Azure Log Analytics is to use the purge API implementation of Azure Log Analytics. In this blog post I will show you how we can use this API.
Here is a good summary taken from the Microsoft documentation:
- Where possible, stop collection of, obfuscate, anonymize, or otherwise adjust the data being collected to exclude it from being considered “private”. This is by far the preferred approach, saving you the need to create a very costly and impactful data handling strategy.
- Where not possible, attempt to normalize the data to reduce the impact on the data platform and performance. For example, instead of logging an explicit user ID, create a lookup table that correlates the username and their details to an internal ID that can then be logged elsewhere (see the sketch after this list). That way, should one of your users ask you to delete their personal information, it may be sufficient to delete only the row in the lookup table corresponding to that user.
- Finally, if private data must be collected, build a process around the purge API path and the existing query API path to meet any obligations you may have around exporting and deleting any private data associated with a user.
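To make the normalization idea from the second bullet more concrete, here is a minimal sketch. The lookup file UserLookup.csv, its path, and the record fields are hypothetical examples; the actual mapping mechanism is up to you:

# Minimal sketch: replace the explicit username with an internal ID before uploading.
# The lookup file, its path, and the record fields are hypothetical examples.
$lookup = @{}
Import-Csv "C:\Temp\UserLookup.csv" | ForEach-Object { $lookup[$_.UserName] = $_.InternalId }

$record = [PSCustomObject]@{
    UserName = "jdoe@contoso.com"
    Action   = "FileDownload"
}

# Only the internal ID leaves the premises; the mapping stays local
$record.UserName = $lookup[$record.UserName]

# From here the record could be converted to JSON and uploaded, e.g. with
# Send-OMSAPIIngestionFile from the OMSIngestionAPI module mentioned above
$json = $record | ConvertTo-Json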
The documentation is also a good starting point for what kind of data you may want to avoid collecting or remove from your workspace:
Log data
- IP addresses: Log Analytics collects a variety of IP information across many different tables. For example, the following query shows all tables where IPv4 addresses have been collected over the last 24 hours:
search *
| where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' // RegEx originally provided on https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp
| summarize count() by $table
- User IDs: User IDs are found in a large variety of solutions and tables. You can look for a particular username across your entire dataset using the search command:
search "[username goes here]"
Remember to look not only for human-readable user names but also GUIDs that can directly be traced back to a particular user!
- Device IDs: Like user IDs, device IDs are sometimes considered “private”. Use the same approach as listed above for user IDs to identify tables where this might be a concern.
- Custom data: Log Analytics allows the collection of data via a variety of methods: custom logs and custom fields, the HTTP Data Collector API, and custom data collected as part of system event logs. All of these are susceptible to containing private data and should be examined to verify whether any such data exists.
- Solution-captured data: Because the solution mechanism is an open-ended one, we recommend reviewing all tables generated by solutions to ensure compliance.
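By the way, these audit queries do not have to be run interactively in the portal; they can also be executed programmatically through the Log Analytics query API. Below is a minimal sketch, reusing the IPv4 audit query from above; the tenant ID, client ID, key, and workspace ID (the workspace GUID) are placeholders, and the service principal is assumed to already have read access to the workspace:

# Minimal sketch: run the IPv4 audit query through the Log Analytics query API.
# Tenant ID, client ID, key, and workspace ID (the workspace GUID) are placeholders;
# the service principal needs read access to the workspace.
Import-Module AzureRM.Profile
$tenantId    = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx"
$clientId    = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"
$key         = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
$workspaceId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Acquire a token for the query API (note the resource https://api.loganalytics.io)
$authContext = [Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext]"https://login.windows.net/${tenantId}"
$cred        = New-Object Microsoft.IdentityModel.Clients.ActiveDirectory.ClientCredential $clientId,$key
$token       = $authContext.AcquireToken("https://api.loganalytics.io",$cred)
$authHeader  = @{
    'Content-Type'  = 'application/json'
    'Authorization' = $token.CreateAuthorizationHeader()
}

# The IPv4 audit query from above; timespan P1D limits it to the last 24 hours
$query = @'
search * | where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' | summarize count() by $table
'@
$queryBody = @{ query = $query; timespan = "P1D" } | ConvertTo-Json

$queryURI = "https://api.loganalytics.io/v1/workspaces/${workspaceId}/query"
$result   = Invoke-RestMethod -Uri $queryURI -Method POST -Headers $authHeader -Body $queryBody
$result.tables[0].rows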
Application data
- IP addresses: While Application Insights will by default obfuscate all IP address fields to “0.0.0.0”, it is a fairly common pattern to override this value with the actual user IP to maintain session information. The Analytics query below can be used to find any table that contains values in the IP address column other than “0.0.0.0” over the last 24 hours:
search client_IP != "0.0.0.0"
| where timestamp > ago(1d)
| summarize numNonObfuscatedIPs_24h = count() by $table
- User IDs: By default, Application Insights will use randomly generated IDs for user and session tracking. However, it is common to see these fields overridden to store an ID more relevant to the application. For example: usernames, AAD GUIDs, etc. These IDs are often considered to be in-scope as personal data, and therefore, should be handled appropriately. Our recommendation is always to attempt to obfuscate or anonymize these IDs. Fields where these values are commonly found include session_Id, user_Id, user_AuthenticatedId, user_AccountId, as well as customDimensions.
- Custom data: Application Insights allows you to append a set of custom dimensions to any data type. These dimensions can be any data. Use the following query to identify any custom dimensions collected over the last 24 hours:
search *
| where isnotempty(customDimensions)
| where timestamp > ago(1d)
| project $table, timestamp, name, customDimensions
- In-memory and in-transit data: Application Insights will track exceptions, requests, dependency calls, and traces. Private data can often be collected at the code and HTTP call level. Review the exceptions, requests, dependencies, and traces tables to identify any such data. Use telemetry initializers where possible to obfuscate this data.
- Snapshot Debugger captures: The Snapshot Debugger feature in Application Insights allows you to collect debug snapshots whenever an exception is caught on the production instance of your application. Snapshots will expose the full stack trace leading to the exceptions as well as the values for local variables at every step in the stack. Unfortunately, this feature does not allow for selective deletion of snap points, or programmatic access to data within the snapshot. Therefore, if the default snapshot retention rate does not satisfy your compliance requirements, the recommendation is to turn off the feature.
But what can we do if we need to purge data from Azure Log Analytics? There is an API implementation which we can use, for example, with PowerShell.
Before we can start using PowerShell to execute the POST API call, we need to create a service principal in our Azure Active Directory tenant. It will be used to authenticate against the API, and we will assign it the permission to purge data. How to create a service principal is well documented here.
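If you prefer scripting over the portal, the application and its service principal can also be created with the AzureRM module. A minimal sketch; the display name, identifier URI, and the key you enter are placeholder examples:

# Minimal sketch: create the AAD application and its service principal in PowerShell.
# Display name, identifier URI, and the key you enter are placeholder examples.
Import-Module AzureRM.Resources
$appKey = Read-Host -AsSecureString -Prompt "Key for the application"
$app = New-AzureRmADApplication -DisplayName "LA Delete App" `
                                -IdentifierUris "http://ladeleteapp" `
                                -Password $appKey
New-AzureRmADServicePrincipal -ApplicationId $app.ApplicationId
# $app.ApplicationId is the client ID we will use in the purge script below
$app.ApplicationId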
I named my service principal LA Delete App…
When we have created our application, we need to make sure to get the application ID, which we will use in the PowerShell script…
…and create a key…
…having this in place, we just need to grant the application permission to delete / purge data from our workspace. Navigate to the workspace where your data is located and assign the Data Purger role to the previously created application…
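This role assignment can be scripted as well. A minimal sketch, assuming the subscription, resource group, and workspace variables from the purge script below and the application ID from above:

# Minimal sketch: assign the Data Purger role scoped to the workspace only.
# $subscriptionId, $rg, and $ws are defined in the purge script below;
# $clientId is the application ID of the LA Delete App.
$scope = "/subscriptions/${subscriptionId}/resourceGroups/${rg}/providers/Microsoft.OperationalInsights/workspaces/${ws}"
New-AzureRmRoleAssignment -ServicePrincipalName $clientId `
                          -RoleDefinitionName "Data Purger" `
                          -Scope $scope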
Now we are ready to build the PowerShell script…
#### AUTHENTICATE ####
# Import the ADAL module found in AzureRM.Profile
Import-Module AzureRM.Profile

# Set the client ID from the LA Delete App
$clientId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"
# Set the key from the LA Delete App
$key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Select the ID of your AAD tenant
$tenantId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx"
# Assign the subscription ID
$subscriptionId = "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
# Assign the resource group your workspace lives in
$rg = "max-rg"
# Assign the workspace name from which you want to delete the data
$ws = "max-ws"

# Construct the authentication URL and get the authentication context
$authUrl = "https://login.windows.net/${tenantId}"
$AuthContext = [Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext]$authUrl

# Build the credential object and get the token from AAD
$cred = New-Object Microsoft.IdentityModel.Clients.ActiveDirectory.ClientCredential $clientId,$key
$result = $AuthContext.AcquireToken("https://management.core.windows.net/",$cred)

# Build the authorization header JSON object
$authHeader = @{
    'Content-Type'='application/json'
    'Authorization'=$result.CreateAuthorizationHeader()
}
#### END AUTHENTICATE ####

#### PURGE DATA ####
# Build the URI according to the documented schema
# https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.OperationalInsights/workspaces/{workspaceName}/purge?api-version=2015-03-20
$URI = "https://management.azure.com/subscriptions/${subscriptionId}/resourceGroups/${rg}/providers/Microsoft.OperationalInsights/workspaces/${ws}/purge?api-version=2015-03-20"

# The REST API takes a JSON body according to this structure
# Found here https://docs.microsoft.com/en-us/rest/api/loganalytics/workspaces%202015-03-20/purge
$body = @"
{
    "table": "Heartbeat",
    "filters": [
        {
            "column": "TimeGenerated",
            "operator": ">",
            "value": "2019-01-29T19:00:00.000"
        }
    ]
}
"@

# Invoke the REST API to purge the data
$purgeID = Invoke-RestMethod -Uri $URI -Method POST -Headers $authHeader -Body $body

# Write the purge ID
Write-Host $purgeID.operationId -ForegroundColor Green
#### END PURGE DATA ####
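The body in the script above purges by time range, but the filters array also accepts other columns and operators. For example, a body that would purge only the heartbeats of a single computer could look like this (the computer name is a placeholder):

# Alternative body: purge only the rows of one specific computer
# (the computer name is a placeholder example)
$body = @"
{
    "table": "Heartbeat",
    "filters": [
        {
            "column": "Computer",
            "operator": "==",
            "value": "myserver.contoso.com"
        }
    ]
}
"@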
…the script above is well documented, so you should get an idea of what each part is doing. In the console it looks like this…
…this is the unique ID of the purge process that is now running. If we want to get the status of the purge, we need to run the following script in the same PowerShell session…
#### GET PURGE STATUS ####
# Build the URI to get the purge status
# https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.OperationalInsights/workspaces/{workspaceName}/operations/{purgeId}?api-version=2015-03-20
$purgeURI = "https://management.azure.com/subscriptions/${subscriptionId}/resourceGroups/${rg}/providers/Microsoft.OperationalInsights/workspaces/${ws}/operations/$($purgeID.operationId)?api-version=2015-03-20"

Invoke-RestMethod -Uri $purgeURI -Method GET -Headers $authHeader
#### END GET PURGE STATUS ####
…it will output the current status…
If you wait long enough it will eventually complete…
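Instead of re-running the GET request manually, you can also let PowerShell poll until the API reports completion. A small sketch, reusing $purgeURI and $authHeader from above:

# Minimal sketch: poll the purge status every 5 minutes until it reports "completed".
# Note: the AAD token expires after about an hour and would have to be renewed
# for purges that run longer than that.
do {
    $status = (Invoke-RestMethod -Uri $purgeURI -Method GET -Headers $authHeader).status
    Write-Host "Purge status: $status"
    if ($status -ne "completed") { Start-Sleep -Seconds 300 }
} while ($status -ne "completed")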
PLEASE READ THE FOLLOWING NOTES CAREFULLY!
- Although I created an Azure AD application and gave it standing permission to delete content from Azure Log Analytics, NEVER EVER do that! Grant the permission to purge data selectively and then remove the role from your application or user! As Microsoft says, this is a highly privileged operation!
- I don't know the exact process behind purging data, but I can imagine it is a very expensive operation in terms of performance and resources in Azure Log Analytics.
- Deletes in Log Analytics are destructive and non-reversible! Please use extreme caution when executing them.
One question that pops up is how long it takes to purge the data. Well, I only ran a few tests, but a small amount of data was gone within approximately one day. Microsoft makes the following statement…
Summary:
Deleting data in Azure Log Analytics is not like cleaning up your file server! The operation has a massive impact on your workspace data, and purged data cannot be recovered. The better approach is to think about which data you send to Azure Log Analytics in the first place, so that there is no need to purge at all.
Above I showed you how we can purge data from Azure Log Analytics with PowerShell. It is also possible to purge data from Application Insights. The approach is the same; you just need to change the URI. You can get the API details here: POST purge, GET purge status.
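For reference, a sketch of how the purge URI changes for Application Insights; the component name is a placeholder, and to my understanding only the resource provider and the API version differ from the workspace variant:

# Sketch: same POST body and headers, but the URI targets the Application Insights
# component ($appInsightsName is a placeholder); note the different provider and api-version
$appInsightsName = "max-appinsights"
$URI = "https://management.azure.com/subscriptions/${subscriptionId}/resourceGroups/${rg}/providers/Microsoft.Insights/components/${appInsightsName}/purge?api-version=2015-05-01"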