What would be the point of having multiple rankings if they always
showed the same results?
But these are very different results:
none of the CBL top 10 show up in the PSBL top 10!
How can both the PSBL and CBL rankings be correct?
First, “correct” for such rankings does not mean completely accurate
and it does not mean completely precise:
no blocklist will ever detect every spam message emitted by every IP address.
Suppose even mighty NSA (No Such Agency) were to copy every bit that
passed over every major ISP in the U.S.
Even that would miss some bits emitted by, for example, an ISP in Vietnam
that spammed an ISP in India.
And what heuristics would mighty NSA use to detect all the spam from all
Would those heuristics happen to include the same one CBL is using
to detect the Kelihos rampage?
Would they include some further heuristic of which CBL has not yet thought
that would detect some other rampage?
Quite possibly yes and yes.
Any rankings of anything on the Internet are always approximate
records of hints and whispers of a constantly-shifting reality
that can never be completely pinned down.
Second, correct for rankings
means comparable among the ASNs ranked, so that they can be ranked.
In that sense, yes, both the PSBL and CBL rankings are correct:
they merely show different aspects of the spam symptom of defective
infosec for the ranked ASNs.
Third, any systematically ranked symptom of poor infosec is important.
Does any organization want any of its hosts to be spewing hundreds
of thousands of spam messages a day, as in those ASNs in the CBL top 10?
Does any organization want any of its hosts to be spewing enough
spam in aggregate to turn up in the PSBL top 10?
Besides, actually the CBL data does corroborate the PSBL data,
when viewed in another set of rankings.
The source of the problem was embarrassingly simple and easily fixed:
not enough inodes.
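
For readers who want to watch for the same failure, here is a minimal sketch of an inode check in Python. It is not the monitoring we actually run: the function names, the default path, and the 90% threshold are all assumptions chosen for illustration. It relies only on os.statvfs, which is available on Unix-like systems.

```python
import os


def inode_usage(path="/"):
    """Return the fraction of inodes in use on the filesystem holding `path`.

    Returns None when the filesystem does not report inode counts
    (some filesystems report a total of zero).
    """
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    # f_files is the total number of inodes, f_ffree the number still free.
    return (st.f_files - st.f_ffree) / st.f_files


def inodes_low(path="/", threshold=0.90):
    """Return True when inode usage is at or above `threshold` (hypothetical default)."""
    usage = inode_usage(path)
    return usage is not None and usage >= threshold


if __name__ == "__main__":
    usage = inode_usage("/")
    if usage is None:
        print("filesystem does not report inode counts")
    else:
        print(f"inode usage on /: {usage:.1%}")
```

A filesystem can run out of inodes while still showing plenty of free disk space, so a check like this is worth running alongside the usual free-space check, especially where each message is stored in its own file.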
The CBL and PSBL data were affected differently because they arrive in different ways.
We pick up from CBL daily a text summary table with a line per IP address.
We get from PSBL an NNTP feed of spam messages, each in its own file,
that we boil down to a summary.
So for CBL, we either got the whole file (most days of the month), or we didn’t
store it at all (8 and 11 September).
For PSBL, for each incoming message, we either stored it or we didn’t.
That is why some days between 4 and 15 September do have PSBL data,
but at lower volume than usual.
The notice below the chart is dire because we prefer to be conservative
about these things.