
Data, Data, Everywhere


Previously, Kim Singletary wrote:

“Having an automated, continuous monitoring and risk assessment framework seems to be the way to go in 2010”

I agree with Kim and I want to add one more straw to her pile of reasons – our data management problem.

Suppose your security system chooses 10,000 controls (vulnerability checks, scripts, signatures, etc.; pick your preferred terminology) that can be applied to some example asset (a server, an executive laptop, or whatever). That number could be smaller or larger depending on the system, but let’s say that 10,000 is the count after filtering out the controls that don’t apply (e.g., there’s no Apache server on that Windows 7 laptop).

Now, suppose that each control returns 100 characters of evidence data. That means you have 1,000,000 characters of data for one full assessment of one system. You are going to store those data in a two-byte Unicode encoding (UTF-16, for example) because your company is international and hence so are your evidence data. That turns your 1,000,000 characters into 2,000,000 bytes of data.

“No big deal!”, you say, “I have thumb drives that are 32 gigabytes in size.” Hang on, I’m not done yet.

Remember, your company has many systems and you are executing this assessment automatically every day. The bytes add up fast. For example, if you have 500 systems then you are collecting and storing 1 gigabyte (1 billion bytes) of data per day for your one comprehensive assessment. That represents 5,000,000 (five million) unique data points.
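The arithmetic so far can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the constants come from the figures above, the function name `daily_volume` is made up for illustration, and the 2 bytes per character assumes a UTF-16-style encoding in which every evidence character fits in a single 16-bit unit.

```python
CONTROLS_PER_ASSET = 10_000   # controls left after filtering
CHARS_PER_CONTROL = 100       # evidence characters returned per control
BYTES_PER_CHAR = 2            # assumption: UTF-16, one 16-bit unit per character


def daily_volume(asset_count):
    """Return (bytes per day, data points per day) for one daily assessment."""
    data_points = asset_count * CONTROLS_PER_ASSET
    total_bytes = data_points * CHARS_PER_CONTROL * BYTES_PER_CHAR
    return total_bytes, data_points


# Sanity check on the 2-bytes-per-character assumption:
# 100 plain characters encode to 200 bytes in UTF-16 (no byte-order mark).
sample_evidence = "x" * CHARS_PER_CONTROL
assert len(sample_evidence.encode("utf-16-le")) == 200

total_bytes, data_points = daily_volume(500)
print(f"{total_bytes:,} bytes/day, {data_points:,} data points/day")
# prints: 1,000,000,000 bytes/day, 5,000,000 data points/day
```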

You can see by inspecting the table below that the problem gets worse, way worse, the more systems you have. One consequence is that your database is rebuilding its indices (crucial for queries and reporting) on a continuous basis. You will need bigger iron and more sophisticated data management.


Asset Count    GB/Day    GB/Qtr    Data Points/Day
        500         1        91          5,000,000
      1,000         2       183         10,000,000
      5,000        10       913         50,000,000
     10,000        20     1,825        100,000,000
     50,000       100     9,125        500,000,000
    100,000       200    18,250      1,000,000,000
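For anyone who wants to plug in their own asset counts, the rows of the table can be regenerated with a short script. Treat it as a sketch under stated assumptions: a quarter is taken as 365 / 4 = 91.25 days (which matches the quarterly figures in the table), a gigabyte is one billion bytes, and the helper names `round_half_up` and `table_row` are invented here.

```python
BYTES_PER_ASSET_PER_DAY = 2_000_000  # 10,000 controls x 100 chars x 2 bytes
DATA_POINTS_PER_ASSET = 10_000       # one data point per control
DAYS_PER_QUARTER = 365 / 4           # assumption: 91.25 days per quarter
GB = 1_000_000_000                   # decimal gigabyte (1 billion bytes)


def round_half_up(value):
    """Round to the nearest integer with .5 going up (Python's round() goes to even)."""
    return int(value + 0.5)


def table_row(asset_count):
    """Return (assets, GB/day, GB/quarter, data points/day) for one row."""
    gb_per_day = asset_count * BYTES_PER_ASSET_PER_DAY / GB
    gb_per_quarter = round_half_up(gb_per_day * DAYS_PER_QUARTER)
    return (asset_count, round_half_up(gb_per_day), gb_per_quarter,
            asset_count * DATA_POINTS_PER_ASSET)


for assets in (500, 1_000, 5_000, 10_000, 50_000, 100_000):
    print("{:>8,} {:>7,} {:>8,} {:>16,}".format(*table_row(assets)))
```

Changing any of the constants at the top (more controls, longer evidence strings, a different encoding) rescales every row linearly.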


Yes, there are ways to reduce the data, but you either have to throw some out or implement complex algorithms to encode the data in less space. The former has already been rejected by the business community and the latter makes products cost more to create and support.

Suppose you do manage to cleverly reduce the amount of space needed for your assessment data. I’m sad to say that your ability and need to collect more data grows much more quickly than you can compensate for with clever algorithms.

I haven’t yet mentioned, for example, SCAP. FDCC compliance assessments can easily generate another 2-4 MB of compressed data per day, per asset. Compressed data aren’t useful as they stand – if you want to search those data then you have to decompress and parse the results into something that takes more space.
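The search cost of compressed storage is easy to demonstrate with Python's standard `zlib` module. The evidence records below are invented for illustration (real SCAP results are XML and much larger), and `find_failures` is a hypothetical helper; the point is only that the query has to inflate the blob back to full size before it can look at a single line.

```python
import zlib

# Hypothetical evidence records, one line per host.
records = [f"host-{i:04d} check=password_min_length result=pass" for i in range(1000)]
records[412] = "host-0412 check=password_min_length result=fail"
raw = "\n".join(records).encode("utf-8")

# What you would keep on disk to save space.
compressed = zlib.compress(raw)


def find_failures(blob):
    """Decompress a stored blob and return the lines that report a failure."""
    text = zlib.decompress(blob).decode("utf-8")  # full-size copy in memory
    return [line for line in text.splitlines() if line.endswith("result=fail")]


print(len(raw), "bytes raw,", len(compressed), "bytes compressed")
print(find_failures(compressed))
```

Highly repetitive data like this compresses very well, which is exactly why the uncompressed working copy needed for searching is so much bigger than what sits on disk.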

Data loss prevention systems will want to store copies of suspicious data transfers – can you say 700 MB per CD or about 5 GB per DVD? Multi-hundred-megabyte installers?

I’ll stop there.

This problem is not insoluble. Whitelisting and continuous monitoring technologies reduce collected data by only reporting important events. Decision support technologies are available to deal with large data sets, helping decision makers to see the forest and manage risk.
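One way to read “only reporting important events” is to compare each assessment against an approved baseline and store just the differences. The sketch below assumes that reading; the control names, values, and the `report_changes` function are hypothetical and not taken from any particular product.

```python
def report_changes(baseline, current):
    """Return only the controls whose result differs from the approved baseline."""
    return {
        control: value
        for control, value in current.items()
        if baseline.get(control) != value
    }


# Hypothetical control results, for illustration only.
baseline = {"password_min_length": "14", "firewall": "on", "guest_account": "disabled"}
current = {"password_min_length": "14", "firewall": "off", "guest_account": "disabled"}

print(report_changes(baseline, current))  # {'firewall': 'off'}
```

On a quiet day most of the 10,000 results per asset would match the baseline, so very little would need to be stored.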

The internet ecosystem is a hairy place. Choosing not to connect to the internet is not a viable business option, so automated assessment, continuous monitoring, and risk management decision support are our future.

Whoops – gotta go! I’ve found a sale on hard drives.


