Clear As Mud

Published Mon, Sep 14 2009 9:49 | Bob Phillips

Normal 0 false false false MicrosoftInternetExplorer4

A question on one of the forums recently asked about hiding sheets in a workbook to be posted as an example, but where that workbook contained sensitive information. As I replied, hiding the sheets doesn’t really hide the data so he would be just as exposed.

Jimmy Pena has a a great application for generating random data, but sometimes it is just better to scramble the data that you already have. I suggested to the poster that they should really be scrambling the data, and further suggested that it should not be too difficult to write some code to do so.

That got me thinking as to how to do it, in a generic manner. It should cater for names, ids, amounts, currencies and so on.

A Simple Scrarmbling App

My first thought was to insert a row at the top of the target sheet, and add an entry for all columns to be obfuscated. So, for instance, if it had 'Name' at the top, each entry in that column would be changed to Name #1, Name #2 etc. If it had Number<7>, you would generate a random number up to 9,999,999. If it had Decimal<5,2> you would generate numbers up 99,999.99, and so on. If the value in row 1 was blank, that column gets ignored.

So you would insert a row with values such as ‘Name’, ‘Number<7>’, and so on.

The code for this was relatively simple to write, just a loop through each column and change the values. To ensure that all like values get changed to the same value, I did a Replace on the whole column once a value to be changed was found. In other words, if Bob appeared 100 times in the column, each instance of Bob would be changed to the same value. To ensure that I didn’t then go and change the second, changed, instance of Bob to some new value, I pre-pended the new value with ‘Obfuscate’, which I stripped off with a Replace at the end.

The main loop was like this

            mpDataType = .Cells(1, j).Value

            mpNextItem = 0

       

            If mpDataType <> "" Then

                mpLastRow = .Cells(.Rows.Count, j).End(xlUp).Row

 

                For i = 2 To mpLastRow


                    If Not .Cells(i, j).Value Like "Obfuscated*" Then

                   

                        mpNextItem = mpNextItem + 1

                        Select Case True

                       

                            Case mpDataType Like "Number<*"

                                'number processing

                           

                            Case mpDataType Like "Decimal<*"

                                'decimal processing

                       

                            Case mpDataType Like "Currency<*"

                               'currency processing

                               

                            Case Else                                

                                .Columns(j).Replace What:=.Cells(i, j).Value, _

                                                    Replacement:="Obfuscated" & mpDataType & " #" & mpNextItem, _

                                                    LookAt:=xlWhole, _

                                                    SearchOrder:=xlByRows

                        End Select

                    End If

                Next i

 

            .UsedRange.Replace What:="Obfuscated", _

                           Replacement:="", _

                           LookAt:=xlPart, _

                           SearchOrder:=xlByRows

 

So far, so good, but I could see one major problem. If there is a formula that refers to data somewhere else on the spreadsheet, that table needs to be obfuscated too, but in a smart way. For instance, say that there is a lookup table of names Bob, Simon and Alex. If there is a formula somewhere of =VLOOKUP(A20,lookup_table,2,FALSE), and A20 is one of those values in the table, then that value in the lookup table should change to the same value that A20 switches too. Unfortunately, it isn't only VLOOKUP, it is LOOKUP, HLOOKUP, COUNTIF, and so on. Tough!

It is actually worse if the values are used elsewhere a simple cell reference or by a code event update, there is no way in my code to recognise that.

Obfuscator

In the end, I decided to avoid this route, far too difficult, and I opted to save all the before values, and all of after values in separate arrays, and after having removed the ‘Obfuscated’ tag I went through each sheet and checked if any of the before values still remained, if so I replaced them.

A bit brute force, but it seems to work okay.

The data is first changed like so

 

                            Case Else

                                mpIdxChange = mpIdxChange + 1

                                mpBefores(mpIdxChange) = .Cells(i, j).Value

                                mpAfters(mpIdxChange) = mpDataType & " #" & mpNextItem

                                If mpIdxChange Mod 1000 = 0 Then

                                

                                    mpSizeArray = UBound(mpBefores) + 1000

                                    ReDim Preserve mpBefores(1 To mpSizeArray)

                                    ReDim Preserve mpAfters(1 To mpSizeArray)

                                End If

                               

                                .Columns(j).Replace What:=.Cells(i, j).Value, _

                                                    Replacement:="Obfuscated" & mpAfters(mpIdxChange), _

                                                    LookAt:=xlWhole, _

                                                    SearchOrder:=xlByRows

 And finally ‘corrected’ like so

 

        'now make the changes for names, ids, etc.

        ReDim Preserve mpBefores(1 To mpIdxChange)

        ReDim Preserve mpAfters(1 To mpIdxChange)

       

        'first remove the tag

        .UsedRange.Replace What:="Obfuscated", _

                       Replacement:="", _

                       LookAt:=xlPart, _

                       SearchOrder:=xlByRows

                      

        For Each mpWS In ActiveWorkbook.Worksheets

        

            'then update any associated values

            For i = 1 To mpIdxChange

           

                mpWS.UsedRange.Replace What:=mpBefores(i), _

                                       Replacement:=mpAfters(i), _

                                       LookAt:=xlWhole, _

                                       SearchOrder:=xlByColumns

            Next i

        Next mpWS

       

        .Rows(1).Delete

Of course, it can be taken a lot further, adding further sophistication. For example, it doesn’t explicitly cater for an Excel spreadsheet that is constructed as a pseudo-database, separate, linked tables. And it probably needs a decent progress bar  as it could take some time on large data.

An addin with the code and an example file can be downloaded via the RSS feed. The addin adds an item to the Data menu.

 

Leave a Comment

Name:  
Website:
If you can't read this number refresh your screen
Enter the numbers above:  

Search

Recent Posts

Blog Care

Reading Matter

This

Archives

Syndication

Email Notifications

Community

Tags