Chris Hoofnagle, of the Berkeley Center for Law and Technology, just published a fascinating report entitled "Measuring Identity Theft at Top Banks." If you have not already, and you are at all interested in security and privacy, you owe it to yourself to read the report. It analyzes identity theft reported to the Federal Trade Commission to start developing an understanding of which institutions have more of it.
Chris is very clear that this is a first version of the report and that it needs to be extended and expanded, and he even lists a number of weaknesses of the current methodology in the report. However, it strikes me that one of the unfortunate side effects of this type of analysis is that people may read it as an indictment of some of the victims of identity theft: the organizations that are targeted. Granted, many organizations are clearly not doing enough to help their customers avoid identity theft. Some, such as TJX and the U.K. Government, have shown a completely reckless disregard for their customers' privacy, apparently without any significant consequences. Yet many organizations are doing interesting things to combat the problem. Without truly understanding what caused Bank of America to show up as the institution with the largest incidence of identity theft, I do not think we should rush to indict them as an unsafe institution to do business with.
To that end, and in the hope that both Chris Hoofnagle, and others who extend his work, do so in ways that assist our understanding of this serious crime, I composed a commentary about the report. I already sent it to Chris, but thought it might be interesting reading to others as well. Chris responded to my commentary, and his responses, where relevant, are also included below.
1. The source of the information used by the criminals may be entirely unrelated to the institution the consumer reported as being involved in the crime. For example, if you look only at the phishing subset of identity fraud, as much as 75% of it is targeted at eBay (http://www.sophos.com/pressoffice/news/articles/2006/07/top-phishing-targets.html). However, eBay shows up relatively low in the report. This could be for a number of reasons:
a. The institution where the information came from may not be the institution where it was used. This may, in fact, explain the occurrence of identity fraud at telecom companies. It is not too difficult to open a new wireless account, and the information gleaned from an account takeover at eBay is probably enough to do so. I have seen proprietary, largely anecdotal, evidence that many compromised accounts are not actually used on the site where the account was stolen, but somewhere else that provides more value.
b. The FTC does not get involved in crimes involving eBay to the same extent that it does in crimes involving financial institutions. Much of the crime is about monetizing information these days, and doing so is far easier at Bank of America than at PayPal, far easier at PayPal than at eBay, and far easier at eBay than at other online properties.
c. The crimes are almost exclusively targeted at the end user. End users of certain institutions are probably far more likely to be victimized by less than perfect attacks on their identity because of the type of customer the institution targets. For example, Capital One targets primarily the low-income, less educated, and less creditworthy credit card customer. It stands to reason that such customers would be more likely to fall for fraud than an HSBC customer, who is likely more sophisticated. HSBC, at least in the U.S., would also have far fewer customers than Capital One, skewing the results. In short, without taking into account predisposing factors such as the education level of the customers, the number of customers, and so on, the result seems more flawed than the study acknowledges.
Chris responds that "...banks are underinvesting and downplaying their true losses from identity theft. Blame is a difficult issue here--yes, the impostor is to blame, but there are situations in law where one becomes responsible for the criminal actions of third parties. Landlords, for instance, can be liable to tenants for certain criminal actions of third parties. It's in this spirit that banks share some blame in these crimes." I would add that, yes, many banks are, and they are proving far more interested in complying with voluntary regulations such as the FFIEC guidelines than they are in truly helping their customers protect themselves. That much is obvious from the implementation of completely ineffective authentication systems, such as measurement of typing cadence. However, some organizations are doing the right thing, and have recognized that protecting their customers is key to their survival as a business. On the whole though, maybe the banks' rush to comply with even voluntary standards, like FFIEC, is indicative of the power of regulation and should be harnessed?
2. The report mentions that the data is a step toward giving consumers the information to vote with their feet and choose “safer institutions.” However, what constitutes a safer institution? Certainly, an institution with a lower incidence of identity theft by deposits is not necessarily any safer, because that measure is skewed in favor of institutions with a few very large accounts. Likewise, an institution with a lower overall count of identity theft is also not necessarily safer. The fact that Third First National Bank of The Side Street off Main in SomeTown, Idaho had no incidents of identity theft could simply be a reflection of the fact that they have fewer than six accounts, not that their strategy to use 4-digit PIN codes on their web site was particularly effective. The “safer institutions” are the ones that provide their customers with the information they need to protect themselves, that include information on how to authenticate a web site to the customer, and that take a lead in customer education and in combating fraud. Bank of America’s site key system is often cited as a model in that space. Discover Card’s refusal to present customers with even an SSL certificate prior to logon sits at the other end of the spectrum. The present study, unfortunately, seems to imply that an institution is safer simply because it has less fraud in absolute terms.
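To make the normalization concern concrete, here is a minimal sketch. Every institution name and figure in it is hypothetical and chosen only for illustration; none of it comes from the report. It shows that the "safest" ordering changes depending on whether incidents are counted in absolute terms, per account, or per dollar of deposits.

```python
from dataclasses import dataclass


@dataclass
class Institution:
    name: str
    incidents: int        # identity theft complaints naming the institution
    accounts: int         # number of customer accounts
    deposits_usd: float   # total deposits in U.S. dollars


def per_100k_accounts(inst: Institution) -> float:
    """Incidents per 100,000 accounts."""
    return inst.incidents * 100_000 / inst.accounts


def per_billion_deposits(inst: Institution) -> float:
    """Incidents per billion dollars of deposits."""
    return inst.incidents * 1e9 / inst.deposits_usd


def ranking(institutions, metric):
    """Institution names ordered from lowest to highest value of the metric."""
    return [i.name for i in sorted(institutions, key=metric)]


# Hypothetical institutions; none of these numbers come from the report.
banks = [
    Institution("Large Retail Bank", 500, 10_000_000, 50e9),
    Institution("Private Bank", 20, 50_000, 20e9),
    Institution("Tiny Bank", 0, 5, 1e6),
]

by_count = ranking(banks, lambda i: i.incidents)
by_accounts = ranking(banks, per_100k_accounts)
by_deposits = ranking(banks, per_billion_deposits)

print("absolute count:", by_count)
print("per account:   ", by_accounts)
print("per deposits:  ", by_deposits)
```

With these made-up figures, the bank with a few very large accounts looks better than the large retail bank when measured by deposits and worse when measured per account, and the tiny bank tops every list simply because it has almost no customers.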
3. On page 2 the report mentions that institutions should report the number of identity theft events avoided. How exactly could that be measured? Is that not like proving a negative? Certainly, an institution can cite numbers on how many incidents of attempts at opening fraudulent accounts its customer service representatives caught, but that hardly captures the full picture. I can prove I did not get hacked this week, but my “proof” may only prove that my detection mechanisms are flawed.
4. Another explanation for fraud at telecom companies may be stolen devices. Without an understanding of the nature of the fraud it is impossible to say what the source is, or to pass any judgment on the organization's acumen in helping its customers. The data appears to give no indication at all of what the source of the fraud is.
5. Which brings me to my point about suggested further study: why. Why is it that some institutions have a far greater incidence of identity theft than others? At this point, I think we need some hypotheses about the contributing factors, including customer demographics, number of customers, size of the accounts, the ease with which account takeover can be monetized, the protective measures in place at the institutions, the type of advice given to customers, and so on. This requires far more data gathering, and some multivariate analysis of the impact of each variable on the number of accounts stolen.
6. Are the months covered by the report (by necessity, obviously) actually representative of the year 2006? Certainly, the data is very interesting, and this report is the first of its kind. However, future studies, I believe, must look at larger, more representative data sets. Looking, again, at the subset of fraud represented by phishing attacks, I am not at all convinced that the months in this report are representative. According to the Anti-Phishing Working Group’s report for December 2006 (http://www.antiphishing.org/reports/apwg_report_december_2006.pdf), January and March were some of the calmest months for phishing in 2006, and September had the lowest figure of the latter half of 2006. Of course, much of the fraud reported in September may have been based on data stolen in prior months, but the fact remains that the activity differs by month. In fact, reports for January and March were both about one standard deviation below the annual average. Reports for September, while roughly at the annual average, were almost one and a half standard deviations below the average for the second half of the year. Compared to the average for the second half, January and March reports were well over three standard deviations below. Thus, I do not think it is reasonable to say that January, March, and September were representative months, since it is clear that the number of reports trended significantly upward over the year. Obviously, the current report advances our understanding far more than not having any analysis at all, and a larger analysis would have taken far longer. I would just like to see a more representative sample in the next report.
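For anyone who wants to repeat this kind of comparison, the sketch below shows one way to express a month's figure as a number of standard deviations from a baseline average. The monthly counts are placeholders I made up to show an upward trend; they are not the APWG figures, so the real numbers from the report would need to be substituted. The choice of the sample standard deviation (`statistics.stdev`) is also an assumption, as the population version would give slightly different values.

```python
from statistics import mean, stdev

# Placeholder monthly report counts (in thousands), NOT the APWG data.
reports_2006 = {
    "Jan": 10, "Feb": 12, "Mar": 10, "Apr": 12, "May": 14, "Jun": 16,
    "Jul": 20, "Aug": 22, "Sep": 18, "Oct": 22, "Nov": 24, "Dec": 24,
}

SECOND_HALF = ["Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def deviations_from(month, baseline_months, data):
    """How many sample standard deviations `month` lies from the
    average of `baseline_months` (negative means below average)."""
    baseline = [data[m] for m in baseline_months]
    return (data[month] - mean(baseline)) / stdev(baseline)


all_months = list(reports_2006)
for month in ("Jan", "Mar", "Sep"):
    vs_year = deviations_from(month, all_months, reports_2006)
    vs_second_half = deviations_from(month, SECOND_HALF, reports_2006)
    print(f"{month}: {vs_year:+.2f} vs. year, {vs_second_half:+.2f} vs. second half")
```

A sampled month that sits well below the second-half average, as January does in this placeholder series, is a hint that the sample under-represents the busier part of the year.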
Chris responded that the months were chosen totally randomly, and that the seasonality of the crime makes that a weakness. However, obtaining an entire year's worth of records takes a year.
7. The report merges data for institutions such as “Citibank Visa” and “Citybank” into one canonical representation. Is that actually accurate though? For example, did Citibank National Association use different protective measures than Citibank (South Dakota) National Association? If they did, the merge is not warranted. In fact, if a single institution has different ways to access different types of accounts, then I think each type of account needs to be considered separately.
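One way to keep both views available is to canonicalize the reported names to a parent institution while retaining the sub-entity, and then tally complaints at both levels. The following is only a sketch of that idea: the alias table, the sub-entity labels, and the complaint list are hypothetical, and it is not a description of how the report actually performed its merge.

```python
from collections import Counter

# Hypothetical alias table: reported name -> (parent institution, sub-entity).
ALIASES = {
    "citibank visa": ("Citibank", "Visa card"),
    "citybank": ("Citibank", "unspecified"),
    "citibank national association": ("Citibank", "Citibank N.A."),
    "citibank (south dakota) national association":
        ("Citibank", "Citibank (South Dakota) N.A."),
}


def canonicalize(raw_name):
    """Map a reported name to (parent, sub-entity); unknown names pass through."""
    key = " ".join(raw_name.lower().split())  # normalize case and whitespace
    return ALIASES.get(key, (raw_name.strip(), "unspecified"))


def tally(raw_names):
    """Count complaints per parent institution and per (parent, sub-entity)."""
    merged, split = Counter(), Counter()
    for raw in raw_names:
        parent, sub = canonicalize(raw)
        merged[parent] += 1
        split[(parent, sub)] += 1
    return merged, split


# Made-up complaint entries, for illustration only.
complaints = [
    "Citibank Visa",
    "Citybank",
    "citibank  visa",
    "Citibank (South Dakota) National Association",
]
merged, split = tally(complaints)
print(merged)
print(split)
```

Keeping the split tally alongside the merged one means the finer-grained comparison remains possible later, if it turns out the sub-entities did use different protective measures.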
8. You mentioned that getting data on wireless subscribers is not possible. I disagree. It is possible to get some form of data, although it is obviously not entirely accurate. In a couple of internet searches I managed to find several sources of such data. For example, AT&T reports having 70.1 million subscribers (http://www.att.com/gen/general?pid=7461). T-Mobile USA reported having 25M by the end of 2006 (http://www.unstrung.com/document.asp?doc_id=118633&page_number=1&table_number=2). HTC actually reports numbers for all the major carriers at http://www.htcamerica.net/products/products-carrier.html. They may not be completely accurate, but as a first-order approximation I think they should do nicely.
Chris responds that he considers any number untrustworthy unless it is filed in a document with the government. It is hard to disagree with that position, but I personally would have been inclined to make do with potentially flawed numbers if accurate data is impossible to come by. I will consider that merely a disagreement on scientific philosophy.
9. On page 7 the report, again, makes the claim that “A more complete picture of identity theft will not emerge until institutions provide more transparency on the problem.” While I applaud the effort to get transparency into the problem, this is fraught with problems in several ways. First, the institution, while it is an incidental victim, is not the true victim, and not the true target. The end user is. The institution may not always know that it was involved, especially if the account is stolen from one institution but used at another. Data on the institutions, like you have in the present study, may indicate that it is easier to monetize stolen information in some places than in others, but says nothing about the protective measures those institutions are using to protect the information they themselves own.
On the whole, I find the report fascinating, and an important first step in furthering our understanding of identity theft. I thank Chris for doing this. Now we need to keep building on it and develop a real understanding of the causes of identity theft and how effective the mitigators are.

