Anonymizing IP Addresses: How to Log Attacks Without Storing PII
In May 2023, the Finnish Data Protection Authority fined a company EUR 750,000 for storing raw IP addresses in security logs without a valid legal basis. The company argued the logs were necessary for intrusion detection. The regulator did not disagree with that premise. They disagreed with the implementation. The logs contained full, unhashed IP addresses—and that made them a personal data processing operation under GDPR.
This is the tension every developer and sysadmin faces: you need attack data to defend your systems, but storing that data can make you legally liable. Security logging without PII protection is a compliance landmine. Skip logging entirely and you are flying blind. Log everything raw and you are one audit away from a fine.
There is a middle path. You can hash IP addresses with a rotating salt, preserving the ability to detect patterns and repeat offenders while making it computationally infeasible for anyone without your secret key to recover the original address. This article walks through exactly how to do that in PHP, with production-ready code you can drop into a WordPress plugin or any server-side application.
The Problem: Security Logging vs. Privacy Regulation
IP Addresses Are Personal Data
This is settled law in the EU. The Court of Justice of the European Union ruled in Breyer v. Germany (C-582/14, 2016) that dynamic IP addresses constitute personal data when the entity processing them has the legal means to identify the natural person behind the address. For most website operators—who can cross-reference IPs with form submissions, user accounts, or ISP cooperation—that threshold is met.
Under GDPR Article 4(1), personal data means “any information relating to an identified or identifiable natural person.” An IP address, combined with a timestamp and a URL, is enough to identify someone. Full stop.
But You Still Need Logs
Without security logs, you cannot:
- Detect brute-force patterns across multiple form submissions or login attempts.
- Correlate attacks originating from the same source over time.
- Generate threat intelligence about which networks or ASNs are producing malicious traffic.
- Demonstrate due diligence to regulators, clients, or partners who ask how you handle abuse.
The question is not whether to log. The question is what to log.
The Compliance Trap
Most WordPress security plugins take one of two bad approaches:
- Log everything raw. Full IP addresses, user agents, timestamps, and referrer URLs, all stored indefinitely in wp_options or a custom table. This is a GDPR violation waiting to happen.
- Log nothing. Some plugins avoid the problem entirely by not recording any request metadata. This satisfies privacy requirements but leaves you unable to detect coordinated attacks or repeat offenders.
Both approaches fail. PII protection does not mean destroying useful data. It means transforming data so it remains operationally useful without being personally identifiable.
Technical Deep Dive: Hashing IP Addresses with a Rotating Salt
The Core Concept
Instead of storing 203.0.113.42, you store a1b7f2e9...c3d1. The hash is deterministic (the same IP always produces the same hash within a given time window), so you can still detect repeat offenders. But the hash is one-way: without the secret behind the salt, you cannot recover 203.0.113.42 from the output. Strictly, GDPR Recital 26 still treats pseudonymized data as personal data for whoever holds the information needed to re-identify it, which is why the salt design matters as much as the hash itself.
The key mechanism is a keyed hash with a rotating salt:
identifier = sha256(ip_address + daily_salt)
The daily salt rotates every 24 hours (or whatever window you choose). This means:
- Within a single day, the same IP always hashes to the same value. You can count how many times that source hit your form, detect patterns, and flag repeat offenders.
- Across days, the hash changes. You cannot correlate Monday’s attacker with Tuesday’s attacker by hash value alone. This limits the re-identification risk, which is what GDPR cares about.
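The formula above can be sketched in a few lines of PHP. This is a minimal illustration, assuming the daily salt is derived with HMAC-SHA256 from a site secret and a UTC date string, as in the implementation later in this article; daily_identifier() and the sample secret are hypothetical names used only for this demonstration.

```php
<?php
/**
 * Illustrative only: identifier = sha256(ip_address + daily_salt),
 * where daily_salt = HMAC-SHA256(day, secret).
 */
function daily_identifier(string $ip, string $secret, string $day): string {
    // Derive the per-day salt from the secret; the day string alone is public.
    $salt = hash_hmac('sha256', $day, $secret);
    return hash('sha256', $ip . $salt);
}

$secret = 'example-secret-for-demo-only'; // hypothetical; use a real site secret

$monday_a = daily_identifier('203.0.113.42', $secret, '2024-05-06');
$monday_b = daily_identifier('203.0.113.42', $secret, '2024-05-06');
$tuesday  = daily_identifier('203.0.113.42', $secret, '2024-05-07');

var_dump($monday_a === $monday_b); // bool(true): same IP, same day
var_dump($monday_a === $tuesday);  // bool(false): same IP, next day
```

The same source is countable within a day, while Monday's and Tuesday's identifiers share nothing visible.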
Why Not Just Truncate?
A common alternative is IP truncation: replacing the last octet with zero (203.0.113.0). Universal Analytics' IP anonymization feature used this approach. It is simple, but it has real limitations:
- Collision rate is high. A /24 block contains 256 addresses. If you truncate to 203.0.113.0, you lose the ability to distinguish between 256 different sources. For security logging, this resolution is often too coarse to be useful.
- The remaining octets are still partially identifying. A truncated IP combined with a timestamp and geolocation data may still qualify as personal data under GDPR.
- It is reversible in practice. If you know the truncated IP and the rough time window, the search space is only 256 addresses. That is trivially brute-forceable.
Hashing with a salt avoids all three problems. The output is fixed-length, uniformly distributed, and computationally infeasible to reverse without the salt.
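To make the contrast concrete, here is a small sketch comparing /24 truncation with salted hashing for two addresses in the same block. truncate_ipv4() and the fixed salt are hypothetical and exist only for this comparison.

```php
<?php
// Zero the last octet, as /24 truncation does (IPv4 only in this sketch).
function truncate_ipv4(string $ip): string {
    $octets = explode('.', $ip);
    $octets[3] = '0';
    return implode('.', $octets);
}

$salt = 'demo-salt'; // hypothetical fixed salt for the comparison
$a = '203.0.113.42';
$b = '203.0.113.77';

// Truncation maps both sources to the same value...
var_dump(truncate_ipv4($a) === truncate_ipv4($b)); // bool(true)

// ...while salted hashes keep them distinguishable.
var_dump(hash('sha256', $a . $salt) === hash('sha256', $b . $salt)); // bool(false)
```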
Why a Rotating Salt Matters
A static salt (one that never changes) creates a permanent mapping between IP addresses and hash values. If the salt is ever compromised—through a database breach, a backup leak, or an insider threat—an attacker can precompute a rainbow table for the entire IPv4 address space (roughly 4.3 billion entries). With modern hardware, that table can be built in hours.
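A rotating salt limits the blast radius. Even if one day's salt is compromised, only that day's hashes are vulnerable; previous and future days remain protected. That guarantee holds only while the secret the salts are derived from stays private: with the HMAC derivation used later in this article, anyone who obtains AUTH_KEY can regenerate every day's salt.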
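The brute-force risk is easy to demonstrate at small scale. The sketch below assumes an attacker has obtained one day's salt and knows the target lies in a single /24 block; recover_ip() is a hypothetical helper, and a real attack would simply extend the same loop to all 4.3 billion IPv4 addresses.

```php
<?php
/**
 * Given a leaked salt, test every address in a /24 block against a stored hash.
 * Returns the matching IPv4 address, or null if none matches.
 */
function recover_ip(string $stored_hash, string $salt, string $prefix): ?string {
    for ($last = 0; $last <= 255; $last++) {
        $candidate = $prefix . '.' . $last;
        if (hash_equals($stored_hash, hash('sha256', $candidate . $salt))) {
            return $candidate;
        }
    }
    return null;
}

$leaked_salt = 'salt-for-2024-05-06'; // hypothetical compromised salt
$stored      = hash('sha256', '203.0.113.42' . $leaked_salt);

var_dump(recover_ip($stored, $leaked_salt, '203.0.113'));       // string(12) "203.0.113.42"
var_dump(recover_ip($stored, 'a-different-salt', '203.0.113')); // NULL
```

With the right salt the search succeeds immediately; with any other salt it finds nothing, which is why the salt, and the secret it comes from, must stay private.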
The rotation period is a tradeoff:
| Rotation Period | Pattern Detection Window | Re-identification Risk |
|---|---|---|
| 1 hour | Very short | Very low |
| 24 hours | Sufficient for most attack patterns | Low |
| 7 days | Good for slow-and-low attacks | Moderate |
| Never (static) | Unlimited | High |
For most WordPress security logging use cases, 24 hours is the sweet spot. It gives you enough time to detect a bot hammering your contact form throughout the day, without creating a permanent identifier that could be used for long-term tracking.
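The rotation window is just the string used as the period. A minimal sketch, assuming UTC boundaries and a hypothetical rotation_period() helper, that produces hourly, daily, or ISO-weekly keys; its output could be passed as the $period argument of the anonymize_ip() function in the next section.

```php
<?php
/**
 * Map a timestamp to a rotation key. Every timestamp inside the same
 * window yields the same key, so hashes match only within that window.
 */
function rotation_period(string $window, int $timestamp): string {
    switch ($window) {
        case 'hour':
            return gmdate('Y-m-d-H', $timestamp);
        case 'day':
            return gmdate('Y-m-d', $timestamp);
        case 'week':
            // ISO-8601 week-numbering year and week number, e.g. "2024-W19".
            return gmdate('o-\WW', $timestamp);
        default:
            throw new InvalidArgumentException("Unknown window: {$window}");
    }
}

echo rotation_period('hour', 0), "\n"; // 1970-01-01-00
echo rotation_period('day', 0), "\n";  // 1970-01-01
```

The ISO week-numbering year ("o") is used instead of the calendar year so that weeks spanning New Year keep a single key.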
The Solution: Production-Ready PHP Implementation
Basic Hashed IP Function
Here is the minimal implementation. It takes an IP address, combines it with a daily salt derived from a secret key, and returns a SHA-256 hash.
```php
/**
 * Generate a privacy-safe identifier from an IP address.
 *
 * Uses a daily-rotating salt so the same IP produces the same hash
 * within a 24-hour window, enabling pattern detection without
 * storing PII.
 *
 * @param string $ip     The raw IP address (v4 or v6).
 * @param string $secret A site-specific secret key (use wp_salt() in WP).
 * @param string $period Optional. Date-based rotation key. Default: today's date (UTC).
 * @return string 64-character hex string (SHA-256).
 */
function anonymize_ip( string $ip, string $secret, string $period = '' ): string {
    if ( $period === '' ) {
        $period = gmdate( 'Y-m-d' );
    }

    // Normalize IPv6 to its canonical compressed form so equivalent
    // spellings (e.g. "2001:DB8:0:0:0:0:0:1" and "2001:db8::1") hash identically.
    if ( filter_var( $ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6 ) ) {
        $ip = inet_ntop( inet_pton( $ip ) );
    }

    // Derive the daily salt from the secret; the date alone is public.
    $salt = hash_hmac( 'sha256', $period, $secret );

    return hash( 'sha256', $ip . $salt );
}
```
Usage is straightforward:
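```php
global $wpdb;
// Table name assumed to match the wp_security_log schema below on the default prefix.
$log_table = $wpdb->prefix . 'security_log';

// Never rely on the fallback in production: a publicly known secret makes every hash brute-forceable.
$secret = defined( 'AUTH_KEY' ) ? AUTH_KEY : 'fallback-secret-change-me';
$hashed = anonymize_ip( $_SERVER['REMOTE_ADDR'], $secret );
// Result, e.g.: "a1b7f2e9d4c8...f3e1" (64 hex characters)

// Store in your log table
$wpdb->insert( $log_table, [
    'source_hash' => $hashed,
    'event_type'  => 'spam_submission',
    'created_at'  => current_time( 'mysql', true ),
] );
```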
Key Design Decisions Explained
Why hash_hmac for the salt derivation? Using HMAC to derive the daily salt from the secret key avoids the length-extension weakness of naively hashing the secret concatenated with the date, and it makes the salt unpredictable to anyone without the secret, even though the date string itself is public knowledge.
Why AUTH_KEY as the secret? WordPress generates a unique AUTH_KEY during installation and stores it in wp-config.php. It is site-specific, sufficiently random, and already protected by file permissions. Using it avoids the need to manage a separate secret.
Why SHA-256? It is fast enough for per-request computation, widely supported in PHP without extensions, and produces a 256-bit output that eliminates practical collision risk. You do not need bcrypt or Argon2 here—those are designed to be slow, which is desirable for password hashing but counterproductive for per-request logging.
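Why normalize IPv6 first? The same address can be written many ways, and hashing the raw string would give each spelling a different identifier. This sketch assumes PHP's inet_pton()/inet_ntop() round trip, which yields one canonical compressed, lowercase form; normalize_ip() is a hypothetical helper name that isolates the step used inside anonymize_ip().

```php
<?php
// Collapse equivalent IPv6 spellings to one canonical form before hashing.
function normalize_ip(string $ip): string {
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        return inet_ntop(inet_pton($ip));
    }
    return $ip; // IPv4 dotted-quad strings are left unchanged
}

$spellings = ['2001:DB8:0:0:0:0:0:1', '2001:db8::1', '2001:0db8:0000::0001'];

foreach ($spellings as $s) {
    echo normalize_ip($s), "\n"; // prints 2001:db8::1 for each spelling
}
```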
Detecting Repeat Offenders
Because the hash is deterministic within a 24-hour window, you can query your logs for repeat sources:
```php
/**
 * Count events from the same anonymized source within today's window.
 *
 * @param string $ip     Raw IP address.
 * @param string $secret Site-specific secret.
 * @param string $table  Log table name.
 * @return int Number of events from this source today.
 */
function count_source_events( string $ip, string $secret, string $table ): int {
    global $wpdb;

    $hashed = anonymize_ip( $ip, $secret );
    $today  = gmdate( 'Y-m-d 00:00:00' );

    return (int) $wpdb->get_var(
        $wpdb->prepare(
            "SELECT COUNT(*) FROM {$table}
             WHERE source_hash = %s AND created_at >= %s",
            $hashed,
            $today
        )
    );
}

// Example: block if more than 20 submissions today
$attempts = count_source_events( $_SERVER['REMOTE_ADDR'], $secret, $log_table );
if ( $attempts > 20 ) {
    wp_die( 'Rate limit exceeded.', 429 );
}
```
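This gives you effective rate limiting and pattern detection without a single raw IP address stored in your database. If a regulator asks you to produce all personal data associated with a specific individual, you can truthfully say that you do not store IP addresses. Be precise about what the rotation buys you, though. Because anonymize_ip() derives each day's salt from AUTH_KEY and the date, anyone holding AUTH_KEY, including you, can regenerate any past day's salt and brute-force the relatively small IPv4 space to find which address produced a stored hash. The hashes are irreversible only to someone who does not have the secret.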
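If you need the stronger property that past days cannot be re-linked even by someone holding the site secret, the salt has to be random and discarded at rotation rather than derived from the secret. The following is a sketch of that alternative, not the approach used in anonymize_ip(); the RotatingSalt class and anonymize_ip_discardable() are hypothetical, and in WordPress you would presumably persist the current period and salt in shared storage such as an option rather than in object properties.

```php
<?php
/**
 * Sketch: a random salt per period that is overwritten, never derived,
 * so a past period's salt cannot be regenerated later. In-memory only.
 */
final class RotatingSalt {
    private $period = null;
    private $salt = null;

    public function forPeriod(string $period): string {
        if ($this->period !== $period) {
            // New window: replace the old salt; nothing keeps a copy of it.
            $this->period = $period;
            $this->salt   = bin2hex(random_bytes(32));
        }
        return $this->salt;
    }
}

function anonymize_ip_discardable(string $ip, RotatingSalt $salts, string $period): string {
    return hash('sha256', $ip . $salts->forPeriod($period));
}

$salts = new RotatingSalt();
$day1  = anonymize_ip_discardable('203.0.113.42', $salts, '2024-05-06');
$again = anonymize_ip_discardable('203.0.113.42', $salts, '2024-05-06');
$day2  = anonymize_ip_discardable('203.0.113.42', $salts, '2024-05-07');
```

$day1 and $again match and $day2 differs, but once the window has moved on, no code path can reproduce the 2024-05-06 salt. The tradeoff is that hashes from a closed window can never be recomputed, so any analysis has to happen before rotation.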
Handling Log Retention
Even hashed data should not live forever. Implement a retention policy that purges old entries:
```php
/**
 * Purge anonymized log entries older than the retention period.
 *
 * @param string $table          Log table name.
 * @param int    $retention_days Number of days to retain logs.
 * @return int Number of rows deleted.
 */
function purge_old_logs( string $table, int $retention_days = 30 ): int {
    global $wpdb;

    $cutoff = gmdate( 'Y-m-d H:i:s', strtotime( "-{$retention_days} days" ) );

    return (int) $wpdb->query(
        $wpdb->prepare(
            "DELETE FROM {$table} WHERE created_at < %s",
            $cutoff
        )
    );
}

// Hook into WordPress cron for automatic cleanup
add_action( 'wp_scheduled_delete', function () use ( $log_table ) {
    purge_old_logs( $log_table, 30 );
} );
```
Thirty days is a reasonable default. It gives you enough historical data to spot trends and investigate incidents, while keeping your data footprint minimal. Adjust based on your compliance requirements—some industry regulations mandate longer retention, but for most WordPress sites, 30 days is more than sufficient.
The Complete Schema
For reference, here is a minimal table schema for privacy-safe security logging:
```sql
CREATE TABLE wp_security_log (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    source_hash CHAR(64)    NOT NULL,
    event_type  VARCHAR(50) NOT NULL,
    metadata    TEXT        NULL,
    created_at  DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_source_hash (source_hash),
    INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
Notice what is not in this schema: no ip_address column, no user_agent column, no referrer column. The metadata field can store non-identifying event details (e.g., which form was targeted, what type of spam was detected), but it should never contain raw PII.
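One way to enforce that rule is a guard that refuses metadata containing anything that parses as an IP address. metadata_contains_ip() is a hypothetical helper, not part of any WordPress API. It is a heuristic: it can flag version-like strings such as 1.0.0.1, and it will not catch other identifiers such as email addresses.

```php
<?php
/**
 * Heuristic check: does any token in the metadata parse as an IPv4/IPv6 address?
 * Intended to run before metadata is JSON-encoded into the log table.
 */
function metadata_contains_ip(array $metadata): bool {
    $json = (string) json_encode($metadata);
    // Split on every character that cannot appear in an IP literal.
    $tokens = preg_split('/[^0-9A-Fa-f:.]+/', $json, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        // Drop trailing sentence punctuation such as "203.0.113.42."
        if (filter_var(rtrim($token, '.'), FILTER_VALIDATE_IP) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(metadata_contains_ip(['form_id' => 'contact-1', 'method' => 'honeypot'])); // bool(false)
var_dump(metadata_contains_ip(['note' => 'seen from 203.0.113.42']));              // bool(true)
```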
What This Means in Practice
With this approach, your security logs contain:
- A pseudonymous identifier (the hash) that lets you correlate events from the same source within a 24-hour window.
- An event type that tells you what happened (spam submission, failed validation, rate limit hit).
- A timestamp for chronological analysis.
- Optional metadata for forensic context (form ID, detection method, etc.).
Your logs do not contain:
- Raw IP addresses.
- User agent strings (which can be fingerprinting vectors).
- Any data that could directly or indirectly identify a natural person.
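This distinction matters. Under GDPR Article 11(2), if you can demonstrate that you are not in a position to identify a data subject from your records, Articles 15 to 20 (including the rights of access, rectification, and erasure) do not apply, unless the data subject provides additional information that enables identification. Your logging infrastructure becomes dramatically simpler to operate and defend.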
A Note on WordPress Anti-Spam Plugins
Most WordPress anti-spam solutions store raw IP addresses by default. Some store them in plaintext in the database. Others write them to log files that are never rotated or purged.
If you are running Contact Form 7 and looking for a plugin that takes PII protection seriously, Samurai Honeypot for Forms uses hashed, anonymized identifiers for its internal spam detection—no raw IP addresses stored, no third-party data sharing, no GDPR liability. It applies the same hashing approach described in this article to detect repeat offenders and coordinate rate limiting without retaining personal data.
The code snippets above are framework-agnostic. You can adapt them for any PHP application, any WordPress plugin, or any custom logging system. The principle is the same everywhere: hash before you store, rotate your salts, and purge on schedule.
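As an illustration of that portability, here is a framework-agnostic sketch with no WordPress dependencies. The AnonymizedLog class is hypothetical and keeps entries in a plain array purely for demonstration; a real application would back it with its own database layer.

```php
<?php
final class AnonymizedLog {
    private $entries = [];
    private $secret;

    public function __construct(string $secret) {
        $this->secret = $secret;
    }

    // Hash before you store: the raw IP never enters $entries.
    private function sourceHash(string $ip, int $now): string {
        $salt = hash_hmac('sha256', gmdate('Y-m-d', $now), $this->secret);
        return hash('sha256', $ip . $salt);
    }

    public function record(string $ip, string $event, int $now): void {
        $this->entries[] = [
            'source_hash' => $this->sourceHash($ip, $now),
            'event_type'  => $event,
            'created_at'  => $now,
        ];
    }

    // Count events from the same source since UTC midnight.
    public function countToday(string $ip, int $now): int {
        $hash     = $this->sourceHash($ip, $now);
        $dayStart = $now - ($now % 86400);
        $n = 0;
        foreach ($this->entries as $e) {
            if ($e['source_hash'] === $hash && $e['created_at'] >= $dayStart) {
                $n++;
            }
        }
        return $n;
    }

    // Purge on schedule: drop entries older than the retention window.
    public function purge(int $now, int $retentionDays = 30): int {
        $cutoff = $now - $retentionDays * 86400;
        $before = count($this->entries);
        $this->entries = array_values(array_filter($this->entries, function ($e) use ($cutoff) {
            return $e['created_at'] >= $cutoff;
        }));
        return $before - count($this->entries);
    }
}

$log = new AnonymizedLog('example-secret-for-demo-only'); // hypothetical secret
$t = 1714996800; // 2024-05-06 12:00:00 UTC
$log->record('203.0.113.42', 'spam_submission', $t);
$log->record('203.0.113.42', 'spam_submission', $t + 60);
echo $log->countToday('203.0.113.42', $t + 120), "\n"; // 2
echo $log->purge($t + 31 * 86400), "\n";               // 2 (both entries are past 30 days)
```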
Key Takeaways
- IP addresses are personal data under GDPR. Storing them raw in security logs creates compliance risk, regardless of your intent.
- Hashing with a rotating salt gives you deterministic identifiers for pattern detection without storing reversible PII. Use sha256(ip + hmac_derived_daily_salt) as your baseline approach.
- Do not truncate; hash. Truncation leaves partial identifiers that may still qualify as personal data and offers weak collision resistance for security use cases.
- Rotate the salt daily (or at your chosen interval) to limit re-identification risk. A static salt is a single point of failure.
- Purge old logs automatically. Even anonymized data should not be retained indefinitely. Thirty days covers most incident response scenarios.
- Design your schema around the constraint. If your database table has an ip_address column, you have already made the wrong decision. Start with source_hash and build from there.