Xls pwd (Recover passwords from Excel Sheet Protection)
If you protect an Excel sheet (or, in the same fashion, an OpenDocument sheet) against cell modifications with a password, you might be aware that this only guards against accidental changes. A user who wants to change the file can easily do so by opening it with an archive manager and removing the "sheet protection" tag from the worksheet XML files. Detailed instructions for this are abundant on the internet.
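The tag removal described above can also be scripted. The following is a minimal Python sketch, not a polished tool. It assumes an .xlsx archive whose worksheets live under xl/worksheets/ and whose protection is stored as a self-closing sheetProtection element. The function names strip_protection and unprotect_workbook are ours.

```python
import re
import zipfile

# Assumption: the protection is a self-closing <sheetProtection ... /> element.
PROTECTION_TAG = re.compile(rb"<sheetProtection\b[^>]*/>")


def strip_protection(xml: bytes) -> bytes:
    """Return the worksheet XML with any sheetProtection element removed."""
    return PROTECTION_TAG.sub(b"", xml)


def unprotect_workbook(src, dst) -> None:
    """Copy the archive src to dst, stripping the tag from every worksheet.

    src and dst may be file names or binary file-like objects.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            is_sheet = (item.filename.startswith("xl/worksheets/")
                        and item.filename.endswith(".xml"))
            if is_sheet:
                data = strip_protection(data)
            zout.writestr(item, data)
```

A call such as unprotect_workbook("protected.xlsx", "open.xlsx") (hypothetical file names) writes an unprotected copy and leaves the original untouched.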
Apart from making the file accessible, you might also be interested in the password that the creator of the file chose. In this note we provide code for recovering actual passwords, which can then be used to remove the protection from inside Excel (or LibreOffice Calc).
How does the protection work?
There is a clear and concise description of the "algorithm" on http://chicago.sourceforge.net/devel/docs/excel/encrypt.html, written in 2001. The scheme XORs and shifts the ASCII character bytes, then combines the result with the password length and a fixed constant, leading to a two-byte hash which is then stored in the XLS file. Of course, so much information about the initial password is lost that the set of shortest passwords matching a given hash usually numbers in the billions. Finding a suitable password takes the blink of an eye, but this is not really a security issue, since the file can be unlocked even more easily anyway. The question is how Microsoft protects the intended password of the creator of the file from being identified. The creator, being unaware of the weakness of the algorithm, might have reused the same password elsewhere, so there is a privacy concern. By the way, if you protect a sheet in an ODS file, a proper hash function is used, so both the information loss and the recovery speed are strongly reduced and a brute-force attack does not seem viable.
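To make the scheme concrete, here is a minimal Python sketch of the hash in its commonly documented form. Each character is rotated left within 15 bits by its 1-based position, the results are XORed together, and the total is XORed with the password length and the constant 0xCE4B. The constant and the 15-bit rotation are taken from public descriptions of the format rather than from this note, and the names rotl15 and sheet_hash are ours, so check the details against the description linked above.

```python
def rotl15(value: int, count: int) -> int:
    """Rotate a 15-bit value left by count bits."""
    count %= 15
    return ((value << count) | (value >> (15 - count))) & 0x7FFF


def sheet_hash(password: str) -> int:
    """Return the 16-bit sheet-protection hash of an ASCII password."""
    total = 0
    for position, byte in enumerate(password.encode("ascii"), start=1):
        # Each character contributes its code rotated by its position.
        total ^= rotl15(byte, position)
    # Mix in the length and the fixed constant (assumed to be 0xCE4B).
    return total ^ len(password) ^ 0xCE4B


print(f"{sheet_hash('test'):04X}")  # CBEB
```

Since only 15 bits of state survive, every password of a given length falls into one of 32768 buckets, which is why collisions are so plentiful.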
How fast can we recover passwords?
It seems that Microsoft took the view that hiding a tree in a forest might be the best way to protect the intended password. Since there are billions of working passwords, how can you recover them and identify the intended one? Well, the recovery problem is easy to solve. The operations of the algorithm, XOR and left shift, are very fast bitwise operations, and so are their reversals, XOR and right shift. So you can recover passwords at a rate per second that is on the order of a percent of the CPU clock frequency. The main limiting factor is writing the results to disk. In practice, an old laptop can recover and write to disk half a million passwords per second, so you are done with a full set after about an hour. Microsoft probably chose such fast operations twenty years ago in order not to slow down the user experience any further when protecting or unprotecting a sheet. The second problem, how to identify the intended password, is more of a psychological problem. But running a dictionary check on the passwords file might be a good idea.
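The reversal can be illustrated with a small Python sketch. It is not the recovery code of this project, only one simple way to exploit the structure under the same assumptions as the commonly documented hash (15-bit rotation, constant 0xCE4B). For every choice of the first n-1 characters, the last character is computed directly by undoing the XORs and rotating right, and it is kept only if it falls within the chosen alphabet. The names candidates and dictionary_hits are hypothetical.

```python
from itertools import product
import string


def rotl15(value: int, count: int) -> int:
    """Rotate a 15-bit value left by count bits."""
    count %= 15
    return ((value << count) | (value >> (15 - count))) & 0x7FFF


def rotr15(value: int, count: int) -> int:
    """Rotate a 15-bit value right by count bits."""
    return rotl15(value, 15 - count % 15)


def sheet_hash(password: str) -> int:
    """Forward hash, used here only to verify results."""
    total = 0
    for position, char in enumerate(password, start=1):
        total ^= rotl15(ord(char), position)
    return total ^ len(password) ^ 0xCE4B


def candidates(target: int, length: int, alphabet: str = string.ascii_lowercase):
    """Yield every password of the given length over alphabet matching target."""
    base = target ^ 0xCE4B ^ length
    if length < 1 or base >> 15:
        return  # no password of this length can produce the target
    allowed = set(alphabet)
    for prefix in product(alphabet, repeat=length - 1):
        residual = base
        for position, char in enumerate(prefix, start=1):
            residual ^= rotl15(ord(char), position)
        # The last character must cancel the residual exactly.
        last = chr(rotr15(residual, length))
        if last in allowed:
            yield "".join(prefix) + last


def dictionary_hits(target: int, words):
    """Return the dictionary words whose hash equals target."""
    return [word for word in words if sheet_hash(word) == target]


matches = list(candidates(0xCBEB, 4))
print(len(matches), "test" in matches)
```

For the dictionary check it is usually cheaper to hash the word list directly, as dictionary_hits does, than to generate all candidates first and filter them afterwards.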
So why did MS never change the algorithm?
Don't ask me.