Cumulative Sum Reverse Test
With RapidMiner:
time = 0.000022695548361574 + 0.000000011666714812 * length, Coefficient = 0.992138566158351800
The coefficient (0.992138566158351800) indicates how well the model fits the data. It is close to 1.0, so time and bit stream length are almost perfectly linearly correlated. Its square, about 0.984, is the fraction of the variability of time that the bit stream length variable accounts for.
With Microsoft Excel:
time = 0.0000226955483615945 + 0.000000011666714812 * length, Multiple R = 0.992138566158346
This confirms the RapidMiner results, with a negligible variation.
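The fitted model can be evaluated directly to estimate the test time for a given stream length. The following is a minimal Python sketch that plugs the RapidMiner intercept and slope into the linear formula. The name `estimate_time` is hypothetical and not part of DiceLock, and estimates only make sense inside the measured range of 1,024 to 1,000,000 bits.

```python
# Coefficients of the linear model reported above (RapidMiner values).
INTERCEPT_S = 0.000022695548361574      # seconds
SLOPE_S_PER_BIT = 0.000000011666714812  # seconds per bit


def estimate_time(length_bits: int) -> float:
    """Estimated Cumulative Sum Reverse Test time in seconds for a stream of length_bits bits."""
    return INTERCEPT_S + SLOPE_S_PER_BIT * length_bits


if __name__ == "__main__":
    for n in (1024, 100_000, 1_000_000):
        print(f"{n:>9} bits -> {estimate_time(n):.6f} s")
```

For a 1,000,000-bit stream the model gives about 0.0117 s. The slope corresponds to a marginal cost of roughly 11.7 ns per additional bit.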
With these results we will be able to relate the execution time of this random number test to that of the cipher and decipher algorithms and hash functions, all of which are components of DiceLock.
We compute "time" in seconds as a function of the length, in bits, of the stream being checked for randomness.
To establish the relation between stream length in bits and execution time in seconds, computing "time" comprised the following stages:
- 32,219 streams with lengths from 1,024 bits to 1,000,000 bits, each stream 32 bits longer than the previous one,
- computing the random test execution time for each stream with the CounterTime class,
- saving to the "CumulativeSumReverseTest.csv" file, for all 32,219 tests performed, the stream length in bits and the random test time in seconds,
- finally, running the random number test on the NIST SP 800-22 Rev. 1a test file "data.e" with alpha set to 0.01. The expected p-value and the computed p-value match.
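The measurement loop above can be sketched in Python. This is an illustration under stated assumptions, not the DiceLock code. `time.perf_counter` stands in for the CounterTime class, and random streams come from `secrets.randbits`. `run_test` is a hypothetical placeholder for the Cumulative Sum Reverse Test. The CSV layout (two columns, length in bits and time in seconds, no header) is also an assumption.

```python
import csv
import secrets
import time
from typing import Callable, List


def measure(run_test: Callable[[List[int]], object],
            start: int = 1024, stop: int = 1_000_000, step: int = 32,
            path: str = "CumulativeSumReverseTest.csv") -> int:
    """Time run_test on streams of start, start+step, ... (up to stop) bits.

    Writes one "length,time" row per stream to path and returns the number of streams.
    """
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for length in range(start, stop + 1, step):
            # Build a random bit stream of the requested length (outside the timed region).
            bits = [int(c) for c in format(secrets.randbits(length), f"0{length}b")]
            t0 = time.perf_counter()
            run_test(bits)  # only the random test itself is timed
            elapsed = time.perf_counter() - t0
            writer.writerow([length, f"{elapsed:.12f}"])
            count += 1
    return count
```

Because stream generation happens before the timer starts, each row reflects only the cost of the test, which is the quantity the regression later models.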
The output shows the last plain streams computed, the length in bits of each one and the time in seconds it took.
At the end, alpha is set to 0.01, the NIST "data.e" file is tested and its p-value is evaluated.
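For reference, the statistic behind the test can also be sketched. The Python function below follows the Cumulative Sums test description in NIST SP 800-22 Rev. 1a in reverse mode (mode = 1). Bits are mapped to -1/+1, partial sums are taken from the last bit backwards, z is the largest absolute partial sum, and the p-value is computed from the standard normal CDF. It is an independent illustration, not DiceLock's implementation. The truncating integer divisions in the summation bounds mimic the C division used in the NIST reference code.

```python
import math
from typing import Sequence, Tuple


def _phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _tdiv(a: int, b: int) -> int:
    """Integer division truncating toward zero, as in C (Python's // floors instead)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def cusum_reverse(bits: Sequence[int]) -> Tuple[int, float]:
    """Cumulative Sums test in reverse mode; returns (z, p_value)."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty bit stream")
    s = 0
    z = 0
    for b in reversed(bits):   # partial sums taken from the last bit backwards
        s += 1 if b else -1    # map bit 0 -> -1 and bit 1 -> +1
        z = max(z, abs(s))     # z = maximum absolute partial sum
    root_n = math.sqrt(n)
    lo1 = _tdiv(_tdiv(-n, z) + 1, 4)
    lo2 = _tdiv(_tdiv(-n, z) - 3, 4)
    hi = _tdiv(_tdiv(n, z) - 1, 4)
    sum1 = sum(_phi((4 * k + 1) * z / root_n) - _phi((4 * k - 1) * z / root_n)
               for k in range(lo1, hi + 1))
    sum2 = sum(_phi((4 * k + 3) * z / root_n) - _phi((4 * k + 1) * z / root_n)
               for k in range(lo2, hi + 1))
    return z, 1.0 - sum1 + sum2
```

With alpha = 0.01, a stream is considered random when the returned p-value is at least 0.01. A p-value below 0.01 leads to the conclusion that the stream is non-random.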
With CumulativeSumReverseTest.csv we can analyze whether a strong relationship exists between stream length and execution time in seconds. We used the RapidMiner software to analyze the data and draw conclusions.
The next step was to plot "time" (Y axis) against stream "length" (X axis). Although most points lie along a line, some are outliers lying above it, which we attribute to system interruptions.
The plot shows that execution time is clearly related to stream length and that a linear regression can be fitted.
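The outlier observation can be made concrete by comparing each measured time with the time predicted by a regression line and flagging measurements that lie well above it. The following is a hypothetical Python helper. Its threshold rule (a residual larger than `k` times the median absolute residual, with `k = 5` by default) is an assumption chosen for illustration, not part of the original analysis.

```python
from statistics import median
from typing import List, Sequence


def flag_slow_points(lengths: Sequence[float], times: Sequence[float],
                     intercept: float, slope: float, k: float = 5.0) -> List[int]:
    """Indices of samples whose time lies far above the line intercept + slope * length."""
    # Residual = measured time minus the time predicted by the line.
    residuals = [t - (intercept + slope * n) for n, t in zip(lengths, times)]
    # Median absolute residual: a spread estimate that a few outliers barely affect.
    scale = median([abs(r) for r in residuals])
    # Only points above the line are flagged: an interruption can slow a run, not speed it up.
    return [i for i, r in enumerate(residuals) if r > k * scale]
```

With the RapidMiner coefficients it would be called as `flag_slow_points(lengths, times, 0.000022695548361574, 0.000000011666714812)`. Refitting after dropping the flagged points is one way to check how much the outliers influence the slope.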
With this information we can establish a relationship between "time" and stream "length". The resulting linear regression equation then lets us relate the random number test to the cipher algorithms.
RapidMiner software builds models from data and establishes relationships between variables.
We took a two-step approach: establish the linear regression with RapidMiner, then verify it with Microsoft Excel.
RapidMiner linear regression analysis
With the "time" and stream "length" data, we performed a linear regression of "time" on stream "length".
The result is the following:
intercept = 0.000022695548361574,
slope = 0.000000011666714812,
with a Standard Coefficient of 0.992138566158351800.
The coefficient indicates how well the model fits the data. It is close to 1.0, and its square, about 0.984, is the fraction of the variability of time accounted for by the stream length variable.
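Both tools compute an ordinary least-squares fit, which is simple enough to reproduce independently as a third check. The Python sketch below is an illustration, not the RapidMiner or Excel procedure. It assumes CumulativeSumReverseTest.csv has two comma-separated numeric columns (length in bits, time in seconds) and no header row, which may not match the actual file. It returns the intercept, the slope and the correlation coefficient r. For a single predictor with a positive slope, r equals RapidMiner's standardized coefficient and Excel's Multiple R.

```python
import csv
import math
from typing import Iterable, List, Tuple


def fit_line(points: Iterable[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares fit y = a + b*x; returns (intercept a, slope b, correlation r)."""
    xs, ys = zip(*points)
    n = len(xs)
    mean_x = math.fsum(xs) / n  # fsum limits rounding error over many samples
    mean_y = math.fsum(ys) / n
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    syy = math.fsum((y - mean_y) ** 2 for y in ys)
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


def load_csv(path: str) -> List[Tuple[float, float]]:
    """Read (length, time) pairs; assumes two numeric columns and no header row."""
    with open(path, newline="") as f:
        return [(float(row[0]), float(row[1])) for row in csv.reader(f) if row]
```

Usage would be `a, b, r = fit_line(load_csv("CumulativeSumReverseTest.csv"))`, with `r * r` giving the explained fraction of variance.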
Excel linear regression analysis
Microsoft Excel also provides data analysis tools. We performed the same linear regression on the same CumulativeSumReverseTest.csv file to check that we get the same results.
In Excel we analyzed all 32,219 "time" and stream "length" samples with the Regression tool and obtained the linear regression and the ANOVA table:
intercept = 0.0000226955483615945 (Excel result)
intercept = 0.000022695548361574 (RapidMiner result, negligible difference)
slope = 0.000000011666714812 (Excel result)
slope = 0.000000011666714812 (equals the RapidMiner result)
And the key goodness-of-fit statistic:
The coefficient (0.992138566158346) is close to 1.0. Squared, it shows that the stream length variable accounts for about 98.4% of the variability of time.
coefficient = 0.992138566158346 (Excel result)
coefficient = 0.992138566158351800 (RapidMiner result, negligible difference)
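For a regression with a single predictor, both reported coefficients are the Pearson correlation r between stream length and time. The standardized coefficient is the slope b rescaled by the sample standard deviations s of the two variables, and the share of the variance of time explained by the model is r squared:

$$
r = b\,\frac{s_{\text{length}}}{s_{\text{time}}}, \qquad R^2 = r^2 = 0.992138566158346^2 \approx 0.98434
$$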
The fit plot of the "estimated values" calculated with the linear regression formula (pink line) against the sample data (blue points) shows that the formula fits the measured values quite well.
The linear regression formula can therefore be used to relate the execution time of the Cumulative Sum Reverse random number test to that of the cipher algorithms.