Data storage issues in SpamRankings.net

Data storage issues led to the loss of some incoming data for the September 2012 SpamRankings.net rankings. Interestingly, the results look almost normal anyway. Here is some speculation on why that might be.

Look just under any rankings chart for September 2012 and you’ll see this notice:

CBL dropouts 8, 11 September 2012 were on our end.
PSBL data is unusable 4-15 Sep 2012 due to problems on our end.
September 2012 World All SpamRankings.net from CBL Volume
Rank (last month)  ASN       AS name            Country
1 (2)              AS 9829   BSNL-NIB           India (IN)
2 (1)              AS 25019  SAUDINETSTC-AS     Saudi Arabia (SA)
3 (5)              AS 6147   SAA                Peru (PE)
4 (3)              AS 8386   KOCNET             Turkey (TR)
5 (4)              AS 7643   VNPT-AS-VN         Vietnam (VN)
6 (-)              AS 9050   ROMTELECOM         Romania (RO)

The source of the problem was embarrassingly simple and easily fixed: not enough inodes. On a traditional Unix filesystem every file uses up an inode, so once they ran out, no new files could be stored regardless of how much disk space was free. The CBL and PSBL data were affected differently because they arrive differently. Every day we pick up from CBL a text summary table with one line per IP address. From PSBL we get an NNTP feed of spam messages, each stored in its own file, which we boil down to a summary. So for CBL, we either got the whole file (most days of the month) or didn't store it at all (8 and 11 September). For PSBL, each incoming message was either stored or not, which is why there are some days with PSBL data between 4 and 15 September, but with lower volume than usual. The notice below the chart sounds dire because we prefer to be conservative about these things.
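Since the failure came down to inode exhaustion, a cheap safeguard is to watch inode usage on the filesystem where incoming feed files land. The following is a minimal sketch, not SpamRankings.net's actual tooling: it assumes a Unix system (Python's `os.statvfs` is Unix-only), and the helper names `inode_usage` and `low_on_inodes`, the spool directory, and the 90% threshold are all hypothetical.

```python
import os
from typing import Optional


def inode_usage(path: str) -> Optional[float]:
    """Fraction of inodes in use on the filesystem that holds `path`.

    Returns None if the filesystem reports no fixed inode count
    (f_files == 0), as some filesystems do.
    """
    st = os.statvfs(path)  # Unix-only
    if st.f_files == 0:
        return None
    used = st.f_files - st.f_ffree
    return used / st.f_files


def low_on_inodes(path: str, threshold: float = 0.9) -> bool:
    """Hypothetical alert check: True once inode usage reaches `threshold`."""
    usage = inode_usage(path)
    return usage is not None and usage >= threshold


# Hypothetical spool directory; point this at wherever incoming files are stored.
spool_dir = "."
usage = inode_usage(spool_dir)
if usage is None:
    print(f"{spool_dir}: filesystem does not report inode counts")
else:
    print(f"{spool_dir}: {usage:.1%} of inodes in use")
    if low_on_inodes(spool_dir):
        print("warning: running low on inodes")
```

Run periodically (from cron, say), a check like this would flag the problem before new files start failing to be created. On the command line, `df -i` reports the same inode figures.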

Yet the PSBL rankings show AS 9829 BSNL-NIB at #1 worldwide, just like the CBL rankings:

September 2012 World All SpamRankings.net from PSBL Volume
Rank (last month)  ASN       AS name            Country
1 (2)              AS 9829   BSNL-NIB           India (IN)
2 (3)              AS 4134   CHINANET-BACKBONE  China (CN)
3 (6)              AS 6147   SAA                Peru (PE)
4 (4)              AS 8386   KOCNET             Turkey (TR)
5 (1)              AS 25019  SAUDINETSTC-AS     Saudi Arabia (SA)
6 (-)              AS 9050   ROMTELECOM         Romania (RO)

The PSBL chart shows BSNL-NIB at #1 even during most of the affected timeframe. As usual, the PSBL rankings emphasize China's AS 4134 CHINANET-BACKBONE, a known peculiarity of the PSBL data. Next the PSBL rankings show Peru's AS 6147 SAA and Turkey's AS 8386 KOCNET, just as the CBL rankings do. What is notably different is that Saudi Arabia's AS 25019 SAUDINETSTC-AS comes in at #5 in the PSBL rankings instead of #2 as in the CBL rankings. Judging from the CBL chart, that could easily be because part of the time when CBL still showed SAUDINETSTC at #1 fell within 4-11 September, when the PSBL data was almost completely lost. The PSBL chart does show that ASN at #1 at the beginning of the month, just like the CBL chart, but then the outage starts, leaving less SAUDINETSTC volume to add up for the whole month.
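A little arithmetic illustrates how losing part of a month can demote an ASN. A monthly ranking adds up volume over the whole month, so an ASN whose volume is concentrated in the lost days falls behind ASNs with steadier volume. The ASN labels and daily volumes below are made up for illustration, not actual CBL or PSBL figures, and the ranking functions are a simplified sketch, not SpamRankings.net's actual code.

```python
from typing import Dict, List, Set


def monthly_totals(daily: Dict[str, List[int]], lost_days: Set[int]) -> Dict[str, int]:
    """Sum each ASN's daily volumes, skipping days (numbered from 1) whose data was lost."""
    return {
        asn: sum(v for day, v in enumerate(volumes, start=1) if day not in lost_days)
        for asn, volumes in daily.items()
    }


def ranking(totals: Dict[str, int]) -> List[str]:
    """Order ASNs by total volume, largest first."""
    return sorted(totals, key=lambda asn: totals[asn], reverse=True)


DAYS = 30
# Hypothetical daily volumes: one ASN is heavy early in the month, two are steady.
daily = {
    "AS-EARLY": [120] * 11 + [20] * (DAYS - 11),
    "AS-STEADY-1": [50] * DAYS,
    "AS-STEADY-2": [40] * DAYS,
}

full = monthly_totals(daily, lost_days=set())
lossy = monthly_totals(daily, lost_days=set(range(4, 12)))  # days 4-11 lost

print(full, ranking(full))
# {'AS-EARLY': 1700, 'AS-STEADY-1': 1500, 'AS-STEADY-2': 1200} ['AS-EARLY', 'AS-STEADY-1', 'AS-STEADY-2']
print(lossy, ranking(lossy))
# {'AS-EARLY': 740, 'AS-STEADY-1': 1100, 'AS-STEADY-2': 880} ['AS-STEADY-1', 'AS-STEADY-2', 'AS-EARLY']
```

With every day counted, the hypothetical AS-EARLY ranks first; dropping days 4-11 pushes it to third even though its later-month volume is unchanged, which is the kind of shift the SAUDINETSTC numbers suggest.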

How can the PSBL rankings be even that close to normal? Apparently because which PSBL messages got lost was essentially random. On top of that, most of the month's data fell outside the dropout period. Whatever the exact reason, the PSBL rankings mostly match the CBL rankings for September.
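The random-loss explanation can be checked with a small simulation. If each message is lost independently with the same probability, every ASN's count shrinks by roughly the same fraction, so ASNs whose volumes are well separated keep their order. This is a sketch under simplifying assumptions: the message counts, the 50% keep probability, and the uniform, independent loss are all hypothetical, and the real losses were concentrated in part of the month rather than spread evenly.

```python
import random
from typing import Dict, List


def thin(volumes: Dict[str, int], keep_prob: float, rng: random.Random) -> Dict[str, int]:
    """Keep each message independently with probability keep_prob (random loss)."""
    return {
        asn: sum(1 for _ in range(n) if rng.random() < keep_prob)
        for asn, n in volumes.items()
    }


def ranking(volumes: Dict[str, int]) -> List[str]:
    """Order ASNs by message count, largest first."""
    return sorted(volumes, key=lambda asn: volumes[asn], reverse=True)


# Hypothetical monthly message counts per ASN.
volumes = {"AS-1": 10000, "AS-2": 8000, "AS-3": 6000, "AS-4": 4000, "AS-5": 2000}

rng = random.Random(2012)  # fixed seed so the run is repeatable
kept = thin(volumes, keep_prob=0.5, rng=rng)

print(kept)
print(ranking(kept) == ranking(volumes))
```

Roughly half of each count survives and the order stays the same: with counts this far apart, a reordering would need a deviation many standard deviations from the expected value. ASNs with close volumes, by contrast, could swap places under the same kind of loss.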

-jsq