Post-Incident Report: DDoS Attack on MabiPro Infrastructure
Posted at 03-20-26, 06:35 am Link | #1
Drahan GM

Posts: 2148
Joined: 02-06-17
Last post: 14 hours
Last view: 9 hours
Post-Incident Report: DDoS Attack on MabiPro Infrastructure

Date: March 6 to March 17, 2026
Duration: 11 days (intermittent), culminating in a sustained 4-hour attack
Impact: Degraded connectivity for all players; intermittent disconnections
Status: Resolved

What happened

Beginning on March 6, our game infrastructure experienced intermittent connectivity degradation that initially appeared to be routine network instability. Players reported sporadic login failures and persistent in-game lag, in addition to "pet lag", but the episodes were short-lived and difficult to reproduce. Over the following days, these incidents grew in frequency and severity.

On March 17, the attack escalated to a sustained assault combining two distinct vectors: a 15 Gbps volumetric flood targeting our network layer, and a sophisticated application-layer attack exploiting the game protocol's authentication handshake. The intermittent probing over the preceding 11 days had likely been reconnaissance: testing the infrastructure's response to different attack patterns before committing to a full-scale operation.

We believe the same attack vector was also used against the NA Mabinogi servers. The timing is consistent with the attacker using our infrastructure as a testbed to refine their techniques before targeting NA. Unlike our setup, the NA servers do not sit behind a reverse proxy layer; their game servers are directly exposed to the internet. While we experienced the attack as degraded performance and intermittent lag, the NA servers collapsed entirely under the same assault, resulting in extended downtime. The difference in outcome underscores the value of the proxy architecture: even with imperfect defenses at the time, the indirection layer absorbed enough abuse to keep us running.

The volumetric component was a 15 Gbps bandwidth saturation attack: high-throughput junk traffic aimed at exhausting the network capacity of our edge nodes. Our upstream ISP's scrubbing infrastructure absorbed roughly 94% of the flood, but approximately 930 Mbps of residual traffic still reached Drei, enough to saturate its available bandwidth and degrade connectivity for legitimate players.

The application-layer component was more interesting, and more dangerous. This was the first time in MabiPro's history that we had observed an L7 attack targeting the game server directly.

Given that these attack methods are already being actively exploited against both our server and the NA servers, we don't believe there is any risk in publishing the technical details here. The attacker already knows what they're doing. Our hope is that by sharing exactly how these attacks work and how we mitigated them, other server operators can learn from our experience and harden their own infrastructure.




The application-layer attack: auth packet floods

Our game protocol requires a cryptographic handshake for every new connection: a seed exchange followed by AES cipher setup. After the handshake completes, the client sends an authentication packet (opcode 0x4E22) containing session credentials, which the server must decrypt, parse, and validate against the session store.

The attacker exploited this by opening approximately 500 new connections per second, each performing the full handshake and submitting a fabricated 0x4E22 auth packet before immediately disconnecting. Every fake connection forced the server to:

1. Accept a TCP connection
2. Perform a full cryptographic key exchange
3. Set up an AES cipher context
4. Decrypt the incoming packet
5. Parse the packet header and body
6. Look up the (nonexistent) session ID
7. Reject the auth and clean up resources

Because auth processing shared the same pipeline as live game traffic, the queue of pending operations grew faster than it could be drained. The database query for each session lookup was the most expensive part of the pipeline, and 500 fake lookups per second starved the DB connection pool for legitimate requests.

The attack required no special access or credentials: the authentication packet structure is deterministic from the client binary, and no rate limiting existed in the connection acceptance path.




Why existing defenses didn't catch it

Our proxy architecture routes all player traffic through a layer of stateless HA edge nodes before it reaches the game server. This design is specifically intended to absorb abuse. However, at the time of the attack, the edge nodes had no per-IP admission control. Every connection, regardless of origin, was accepted and given the full handshake treatment.

The proxy was doing its job, shielding the game server. But it was doing equal work for attackers and legitimate players. Under load, this meant attackers could monopolize proxy resources at will.




Code audit findings

Following the incident, we conducted a full audit of the proxy's connection handling. Beyond the auth flood vector, we identified several additional vulnerabilities:

Integer underflow in packet length validation

The proxy validated incoming packet lengths with a minimum threshold of 6 bytes. However, the protocol's header offset is 10 bytes for game connections. When the header offset is subtracted from the declared packet length to determine body size, a packet declaring a length between 6 and 9 inclusive causes an unsigned integer underflow:

declared_length = 7
header_offset   = 10
body_size       = uint32(7) - uint32(10) = 4,294,967,293

This creates a ~4 GB read expectation on the socket. In practice, the goroutine simply blocks waiting for data that never arrives, which is functionally equivalent to a connection-slot leak rather than a memory explosion. Still, it's an unnecessary resource hold that compounds under attack conditions.

The game server itself shares this same vulnerability, but with a much worse outcome: the underflow causes an immediate crash of the server process. However, because our proxy validates and rejects these malformed packets before they ever reach the game server, the practical impact is minimal. An attacker would need to bypass the proxy entirely to exploit this on the game server directly.

No read timeouts

The proxy set no read deadlines on client connections. A client could connect, begin a handshake or partial packet transmission, and then go silent indefinitely. The serving goroutine would block on read() forever, holding its connection slot, cipher context, and associated memory.

This enables classic slowloris-style attacks: open hundreds of connections, send partial data, never finish. Each connection permanently consumes a server goroutine.

The real game server enforces aggressive read timeouts (killing slow senders after minimal delay), but the proxy sitting in front of it did not.

Oversized packet acceptance

The proxy's maximum accepted packet size was 10 MB, while the actual game server rejects packets at a far smaller threshold. This mismatch meant attackers could force the proxy to allocate and process packets nearly 100x larger than any legitimate client would ever send, amplifying memory pressure during sustained attacks.

Cleanup storms on mass disconnect

When the attacker's tool terminated (or when we blocked a range), the OS sent TCP RST on all active connections simultaneously. Each RST triggered error handling, cipher cleanup, and goroutine teardown. Hundreds of concurrent cleanup operations created worse latency spikes than the sustained attack itself: a brief but intense amplification at the moment the attack stopped.




What we shipped

Per-IP rate limiting at the accept loop

The most impactful change. Before any cryptographic work begins (before seed exchange, before cipher setup, before a single byte of game protocol is parsed), the proxy now checks the source IP against three criteria:

Connection rate: Maximum 10 new connections per second per IP. A legitimate player reconnecting after a disconnect needs 1. A bot farm needs hundreds.
Concurrent connection cap: Maximum 50 simultaneous connections per IP. Normal gameplay uses 1-3 (game, login, messenger). No legitimate scenario requires 50.
Auth failure ban: After 10 failed authentication attempts, the IP is blocked for 5 minutes. Failed auths are tracked by monitoring the server's response to 0x4E22 (game auth) and 0x22 (login auth): if the first byte of the response packet (0x4E23 / 0x23) is not 0x01 (success), the failure counter increments.

Connections rejected by rate limiting never reach the handshake phase. The cost to the proxy is a hash table lookup: effectively zero compared to the AES key exchange and database session lookup the attacker was trying to force.

Read deadlines

All game protocol connections now enforce a 30-second read timeout, refreshed after each complete packet. If a client goes silent mid-packet or mid-handshake, the connection is terminated.

Messenger protocol connections are exempt from read timeouts. The messenger service maintains long-lived idle connections by design: clients sit waiting for incoming messages with no regular keepalive cadence. Applying the same timeout would kill legitimate messenger sessions.

Packet length validation

The minimum packet length check was raised from 6 to the protocol's actual header offset (10 for game protocol, 6 for messenger protocol). This eliminates the unsigned integer underflow in the body size calculation.

Maximum packet size reduction

Reduced from 10 MB to 128 KB, aligned with the real game server's acceptance threshold. Oversized packets are rejected at the header parsing stage before any body data is read.

Horizontal scaling

Our proxy nodes are stateless. This means we can scale horizontally by adding more edge nodes.

A volumetric attack focused on a single node saturates that node's network link. But distributed across N nodes, each absorbs 1/Nth of the traffic. If a node goes down, our load balancer distributes players to surviving nodes. The game server is unaffected.

This forces the attacker into a losing proposition:

Concentrate fire on one node: That node degrades; others remain healthy. We spin up a replacement.
Spread across all nodes: Each node handles a fraction of the attack within its capacity. Per-IP rate limiting kills the L7 component at every node independently.

In practice, this is exactly what we did. After deploying the L7 patches to Drei, we brought up a second edge node (Omega) and distributed traffic across both. The remaining ~930 Mbps of volumetric traffic that slipped past upstream scrubbing was now split between two nodes, each well within its bandwidth capacity. Combined with the application-layer mitigations killing the auth flood at the accept loop, this completely resolved the issue. Players reported normal connectivity immediately after Omega was added to rotation.




Lessons learned

Rate limiting must happen before expensive operations, not after. The auth flood worked because every connection got the full cryptographic handshake before any validation occurred. Moving the admission check to the earliest possible point (raw TCP accept) reduced the cost of rejecting an attacker from "full AES key exchange" to "hash table lookup."

Protocol-aware proxies need protocol-aware limits. Generic TCP proxying wouldn't have caught the packet length underflow or the oversized packet issue. Understanding the game protocol's header structure, opcodes, and expected packet sizes allowed us to set precise limits that block abuse without affecting legitimate traffic.

Different protocol types need different timeout strategies. Applying a uniform read timeout across game and messenger connections would have fixed slowloris at the cost of breaking messenger functionality. The messenger protocol's long-idle-connection pattern is legitimate and must be accommodated.

Stateless edge nodes are operationally invaluable. The ability to add, remove, and replace proxy nodes without any coordination with the game server meant we could respond to the volumetric attack without any downtime.
Posted at 03-20-26, 01:13 pm Link | #2
Xypix

Posts: 3
Joined: 12-12-21
Last post: 7 hours
Last view: 7 hours
Who would win? A multi-billion dollar corporation, or one cybersecurity hobbyist?
(Spoiler alert, Nexon loses as usual)
Posted at 03-20-26, 01:33 pm Link | #3
Reva

Posts: 5
Joined: 07-06-18
Last post: 7 hours
Last view: 4 hours
This is awesome. I barely understand half of it, but amazing work <3.