How the Bible and YouTube are fueling the next frontier of password cracking

Crackers tap new sources to uncover “givemelibertyorgivemedeath” and other phrases.

Aurich Lawson

Early last year, password security researcher Kevin Young was hitting a brick wall. Over the previous few weeks, he made steady progress decoding cryptographically protected password data leaked from the then-recent hack of intelligence firm Stratfor. But with about 60 percent of the more than 860,000 password hashes cracked, his attempts to decipher the remaining 40 percent were failing.

The so-called dictionary attacks he mounted using lists of more than 20 million passwords culled from previous website hacks had worked well. Augmented with programming rules that substituted letters for numbers or combined two or more words in his lists, his attacks revealed Stratfor passwords such as “pinkyandthebrain,” “pithecanthropus,” and “moonlightshadow.” Brute-force techniques trying every possible combination of letters, numbers, and special characters had also succeeded at cracking all passwords of eight or fewer characters. So the remaining 344,000 passwords, Young concluded, must be longer words or phrases few crackers had seen before.

“I was starting to run out of word lists,” he recalled. “I was at a loss for words—literally.”

He cracked the first 60 percent of the list using the freely available Hashcat and John the Ripper password-cracking programs, which ran the guesses through the same MD5 algorithm Stratfor and many other sites used to generate the one-way hashes. When the output of a guessed word matched one of the leaked Stratfor hashes, Young would have successfully cracked another password. (Security professionals call the technique an “offline” attack because guesses are never entered directly into a webpage.) Now, with his arsenal of dictionaries exhausted and the exponential increase in the time it would take to brute force passwords greater than eight characters, Young was at a dead end. In the passwords arms race, he was losing. Young knew he needed to compile new lists of words he had never tried before. The question was where to find the words.
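The offline attack described above can be sketched in a few lines of Python. This is a minimal illustration, not how Hashcat or John the Ripper are implemented: the function names and the tiny stand-in "leak" are hypothetical, and it assumes unsalted MD5 hashes of UTF-8 passwords.

```python
import hashlib


def md5_hex(candidate: str) -> str:
    """Return the hex MD5 digest of a candidate password (unsalted, UTF-8)."""
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()


def dictionary_attack(leaked_hashes, wordlist):
    """Hash every word in the list and report the ones that match a leaked hash."""
    remaining = set(leaked_hashes)
    cracked = {}
    for word in wordlist:
        digest = md5_hex(word)
        if digest in remaining:
            cracked[digest] = word
            remaining.discard(digest)
    return cracked


# Hypothetical stand-in for a leaked hash list. The third password is not in
# the word list, so it stays uncracked.
leaked = {
    md5_hex("moonlightshadow"),
    md5_hex("pithecanthropus"),
    md5_hex("Xq7#vT!9zLw2"),
}
wordlist = ["password", "moonlightshadow", "letmein", "pithecanthropus"]

cracked = dictionary_attack(leaked, wordlist)
for digest, word in cracked.items():
    print(digest, "->", word)
```

No guess ever touches a website; the only inputs are the leaked hashes and a list of candidate words, which is why the size and quality of the word list matter so much.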

After cracking 60 percent of passwords leaked in the hack of Stratfor, Kevin Young mined the Internet for longer passphrases.
Kevin Young

A free cracking dictionary anyone can compile

Young joined forces with fellow security researcher Josh Dustin, and the cracking duo quickly settled on trying longer strings of words found online. They started small. They took a single article from USA Today, isolated select phrases, and inputted them into their password crackers. Within a few weeks, they expanded their sources to include the entire contents of Wikipedia and the first 15,000 works of Project Gutenberg, which bills itself as the largest single collection of free electronic books. Almost immediately, hashes from Stratfor and other leaks that remained uncracked for months fell. One such password was “crotalus atrox.” That’s the scientific name for the western diamondback rattlesnake, and it ended up in their word list courtesy of this Wikipedia article. The success was something of an epiphany for Young and Dustin.

“Rather than try a brute force that makes sense to a computer but not to people, let’s use human beings because people typically make these long passwords based on things that humans use,” Dustin remembered thinking. “I basically utilized the person who wrote the article on Wikipedia to put words together for us.”

A crotalus atrox, aka western diamondback rattlesnake.

Almost immediately, a flood of once-stubborn passwords revealed themselves. They included: “Am i ever gonna see your face again?” (36 characters), “in the beginning was the word” (29 characters), “from genesis to revelations” (27), “I cant remember anything” (24), “thereisnofatebutwhatwemake” (26), “givemelibertyorgivemedeath” (26), and “eastofthesunwestofthemoon” (25).

An arms race as old as civilization

The experience underscores the rapidly unfolding arms race between everyday people trying to secure their digital assets and the whitehat and blackhat hackers trying to compromise them. The race is almost certainly as old as the password, which itself dates back to as early as ancient Rome, when military leaders developed a careful procedure for circulating daily watchwords to prevent infiltration by enemy soldiers. As Ars reported last year in a feature titled “Why passwords have never been weaker—and crackers have never been stronger,” the seminal moment in the modern password race came in late 2009, with the compromise of gaming website RockYou. It spilled more than 14 million unique user passwords in plain-text, exposing a then-unprecedented corpus of real-world credentials that would forever change the way passwords were cracked.


When combined with cracking techniques that augment attack dictionaries (for instance, by substituting the letter i with the number 1 or appending letters or numbers to the beginning or end of base words), crackers have been able to vastly expand the collective corpus of real-world passwords with every subsequent breach. Some crackers today wield word lists with close to one billion entries. And of course, the updated dictionaries yield still more passwords when the lists are passed through the same set of programming rules. One such rule, known as a “combinator” attack, runs two or more words together and either strips out all the spaces or leaves them intact. Other “mangling” and “hybrid” rules account for variations in capitalization, character substitutions, and other tweaks. As a result, cracking programs not only try the word “house” as included in a cracking dictionary, but also “House,” “housE,” “hou$e,” and “house1997.” With each successful match, crackers gain increasing insight into the words people pick to secure their digital assets. In that way, the collective corpus of passwords grows larger each day.
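A rough sketch of how such mangling rules expand a single dictionary word follows. The rule set here is a small illustrative assumption (the substitution table, the year range, and the function name are all made up for this example); real cracking tools ship far larger rule files with their own syntax.

```python
def mangle(word, years=range(1990, 2000)):
    """Expand one base word into a set of common variations."""
    if not word:
        return set()
    variants = {word}
    variants.add(word.capitalize())              # house -> House
    variants.add(word[:-1] + word[-1].upper())   # house -> housE
    # Illustrative single-character substitutions, applied one at a time.
    substitutions = {"s": "$", "i": "1", "a": "@", "o": "0", "e": "3"}
    for plain, swapped in substitutions.items():
        if plain in word:
            variants.add(word.replace(plain, swapped))  # house -> hou$e
    # Append a plausible year to the base word.
    for year in years:
        variants.add(f"{word}{year}")            # house -> house1997
    return variants


for guess in sorted(mangle("house")):
    print(guess)
```

Each guess produced this way would then be hashed and compared against the leaked hashes, exactly as with the unmodified dictionary words.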

Enter the “pointless” passphrase

As awareness has spread of the growing insecurity of passwords that were presumed strong only a few years ago, many people have turned to passphrases, often pulled from what they believe are overlooked songs, books, or other sources. The idea is to generate a long passcode that contains upper- and lower-case letters and possibly punctuation that’s nonetheless easy to remember. This turns out to be largely an exercise in futility. As is the case with passwords, the same thing that makes passphrases easy to remember makes them susceptible to easy cracking.

“I see a lot more users choosing passphrases today than three years ago,” said Yiannis Chrysanthou, a security researcher who recently completed a master’s thesis on modern password cracking techniques at Royal Holloway, University of London. “This is both encouraging and pointless.” He continued:

It is encouraging that users try to make their passwords more secure by using passphrases as this shows that users are now more security aware. Passphrases are also pointless in my opinion. Just like passwords, passphrases need to be easy to memorize. They need to make some sort of sense to the user. That is what makes them vulnerable. There is an additional x amount of effort required in order to memorize a passphrase, but in my opinion the benefits are far less than the extra effort. Also, users construct passphrases based on “tips and tricks” published as guidelines or even dictated by password policies. I use those guidelines and policies as part of my rule sets and they are extremely useful for cracking passphrases.

For a graphic example of passphrase weakness, consider the string “Ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn1” (minus the quotes). With a length of 51 and a 95-character set containing upper- and lowercase letters, numbers, and special characters, its entropy is 284.9 bits. The total number of combinations required to brute-force crack it would be 95^51, making such a technique impossible on any sort of computer known to exist today. What’s more, the string isn’t found in any language dictionary. No wonder password strength meters like this one use words such as “overkill” to describe it.
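To put the size of that search space in perspective, here is a back-of-the-envelope calculation. The guessing rate is an assumption borrowed from the 30 billion guesses per second this article cites for Dustin's rig against standard Windows hashes; the rate against other hash types would differ.

```python
CHARSET_SIZE = 95   # printable characters available for each position
LENGTH = 51         # length of the Cthulhu passphrase

# Assumed rate: 30 billion guesses per second (a figure quoted for a
# different hash type, used here only for scale).
GUESSES_PER_SECOND = 30_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600

keyspace = CHARSET_SIZE ** LENGTH            # exact integer arithmetic
seconds = keyspace // GUESSES_PER_SECOND
years = seconds // SECONDS_PER_YEAR

print(f"candidates to try: about {keyspace:.2e}")
print(f"years to exhaust:  about {years:.2e}")
```

The number of candidates is roughly 101 digits long, so exhaustive search is hopeless at any realistic speed. The only practical way in is to guess the phrase itself.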

But as Ars recently reported, Chrysanthou had no trouble cracking the SHA1 hash that corresponded to the string for one simple reason. This is a fictional occult phrase from the H. P. Lovecraft short story “The Call of Cthulhu,” and it was contained in this Wikipedia entry.

“In terms of passwords/passphrases, it doesn’t get much better than this,” Chrysanthou wrote in describing the theoretical strength. “Do I think it is stronger than a shorter passphrase consisting of only lowercase letters? Yes. Do I think that it is impossible to crack? Obviously not. How would I improve it? If I really have to use a passphrase I would make my own mixture of words, mutate them [in] my own unique way, add some random characters between them, and hope that someone doesn’t create a rule that can crack it.”

“Ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn1” is by no means the only long and obscure phrase Chrysanthou has cracked. Others include:

A Little Piece Of Heaven01
FanBoy And Chum Chum1
Harry Potter and the Deathly Hallows22
I need a new password
Password must be at least 8 characters
youcantguessthis password1980
you will never guess2809
i have no idea what my password is

No, mangling won’t save your passphrase, either

Just as many people tweak their weak passwords in a fruitless attempt to make them harder to crack—think “P@$$word123,” “m27bufford,” “J21.redskin,” and “Garrett1993*” from a previous Ars feature that showed how crackers ransack longer passcodes—passphrase users frequently modify well-known quotes. It’s mostly wasted effort.

Crackers run their list of phrases through many of the same rule sets they use for single words. Within milliseconds, assuming the most commonly used “fast” hashing algorithms are used, a cracker guessing the sentence “The quick brown fox jumps over the lazy dog” can also try “The_quick_brown_fox_jumps_over_the_lazy_dog,” “Thequickbrownfoxjumpsoverthelazydog,” “The qu1ck br0wn foX jumps 0ver th3 l4zy dog,” and literally thousands of other mutations. The same mangling technique can add dates, punctuation marks, and other stray characters to the beginning, middle, or end of phrases. “you dont know my password1981,” “los lokos de arriba92,” “h3r3 b3 dr@g0ns,” and “neighbourofthebeast667” may give the impression of invincibility, but each one will be deciphered in a matter of hours using standard passphrase lists.
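The same idea applied to whole phrases might look like the sketch below. The substitution table, the suffix list, and the function name are illustrative assumptions rather than any cracker's actual rule set.

```python
# Illustrative character substitutions applied across the whole phrase.
LEET = str.maketrans({"i": "1", "o": "0", "e": "3", "a": "4"})


def phrase_variants(phrase, suffixes=("", "1", "92", "1981")):
    """Generate spacing, case, and substitution variants of a phrase,
    each with a handful of assumed common suffixes appended."""
    bases = {
        phrase,                           # as written
        phrase.replace(" ", "_"),         # spaces become underscores
        phrase.replace(" ", ""),          # spaces stripped
        phrase.lower(),                   # all lowercase
        phrase.lower().replace(" ", ""),  # lowercase, spaces stripped
        phrase.translate(LEET),           # character substitutions
    }
    return {base + suffix for base in bases for suffix in suffixes}


variants = phrase_variants("The quick brown fox jumps over the lazy dog")
for guess in sorted(variants):
    print(guess)
```

A handful of rules multiplies every phrase in the list many times over, which is cheap when the hashing algorithm is fast.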

Many programming rules for expanding a list of passwords work equally well on passphrases.

The cracking process is made easier by the tendency of most phrases to adhere to predictable patterns. The longer a passphrase is, the less likely it is to contain extra characters, variations in case, and other modifications. Young called this “shift key aversion.”

People “hate typing ‘daN Goodin’,” he explained. “The longer the word you get, the lower chance that someone does some weird combination of special characters. We also found that people who use ‘dangoodinridesabluebicycle’ tend to omit spaces. We see a lot of quotes out of the Bible. We see a lot of four-letter obscenities.” Then, recalling one such password, he said: “‘yournevergoingtogetmyfuckingpassword.’ Oh yeah? It’s right here.” (Young managed to crack the phrase even though the first word was spelled “your” instead of “you’re.”)

One force pushing the frontier of passphrase cracking is relatively new. oclHashcat-plus, the Hashcat version that can use dozens of graphics cards to simultaneously crack huge numbers of cryptographic hashes in seconds, was recently updated to tackle passphrases as long as 55 characters, breaking a previous 15-character limit. That brings turbo-charged cracking to a whole new length of passcodes. (John the Ripper and slower versions of Hashcat are still able to crack passwords with a length of 56 and above.)

A new way

Encouraged by their results, Young, Dustin, Chrysanthou, and other crackers are tapping an ever larger pool of phrases. News websites, multilingual forums, public IRC logs, Wikipedia, Pastebin, e-books, movie scripts, and song lyrics are just some of the wells they’re drawing from. And of course, Facebook and other social networking sites are goldmines. For example, in May 2012, while cracking 160,000 MD5 hashes leaked from MilitarySingles (a dating website for members of the US armed forces), Young and Dustin turned to Twitter to increase their supply of words and phrases used by people in the military.

A screenshot of Dustin’s script that searches Twitter for military- and dating-related words.
Josh Dustin

A script Dustin wrote searched the microblogging service for a dozen or so terms that related to both the military and dating, such as “afghanistan” and “love.” The script then scraped the results and organized them in various ways. Of the 4,400 unique words or phrases they mined from the Twitter searches, 1,976 of them were all or part of actual passwords used by MilitarySingles users.
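Checking which mined terms turn out to be "all or part of" a cracked password amounts to a substring match. The sketch below is a guess at that bookkeeping, not Dustin's script: the function name and the sample terms and passwords are made up for illustration (only "hooah" comes from this article).

```python
def terms_found_in_passwords(terms, passwords):
    """Return the mined terms that appear, whole or as a substring,
    in at least one cracked password (case-insensitive)."""
    lowered = [password.lower() for password in passwords]
    return {
        term for term in terms
        if any(term.lower() in password for password in lowered)
    }


# Made-up sample data for illustration.
mined_terms = ["hooah", "afghanistan", "love", "semperfi"]
cracked_passwords = ["hooah2012", "iloveyou", "marine4life"]

hits = terms_found_in_passwords(mined_terms, cracked_passwords)
print(sorted(hits))

# The hit rate reported in the article: 1,976 of 4,400 mined terms.
hit_rate = 1976 / 4400
print(f"{hit_rate:.1%}")
```

By the article's numbers, roughly 45 percent of the terms mined from Twitter showed up in real passwords from the targeted site.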

“On [MilitarySingles], people used passwords like ‘hooah’,” Dustin explained. “That’s not a word that will be in your dictionary, but by supplying the words ‘marines’ and ‘navy,’ you’re going to end up with words like ‘hooah’ in your list. With Twitter, it lets you target specific password users.”

More recently, Dustin has turned his attention to YouTube for the same reasons.

“I like YouTube comments because you have current garbage, stuff the way people say it, slang, misspellings, and things like that,” he said. “That’s the way people do their passwords quite often. You often find a lot of slang, and a lot of that slang doesn’t end up in a dictionary or even on Wikipedia or in a book.”

Publicly available IRC logs are just one of the sources crackers are tapping to decode long passphrases.

Not everyone has the resources of the NSA

Plucking long word groupings out of books and articles and turning them into working cracking dictionaries is no trivial undertaking. For one thing, it requires huge amounts of disk space. Dustin works around the challenge mostly by filling up his 1TB hard drive with a list, using it to generate guesses against his uncracked hashes, wiping the drive clean, and starting all over with a new list of phrases.

There are also the logistical challenges of lifting huge amounts of text from sites that were never intended to be data mined in such a way. Project Gutenberg, for example, wasn’t set up to accommodate massive numbers of simultaneous downloads. Even with that hurdle partially scaled, Dustin and Young still faced the problem of deciding how best to organize the data into a cracking list. Eventually, Young compiled a whopping 1.36 billion unique phrases from the first 15,000 books in the collection, leaving the remaining 27,000 e-books for some other time.

The list compiled from Gutenberg contains alphabetically arranged phrases with a minimum of eight characters that can grow sequentially longer. There are slightly more than 5.2 million eight-character strings, which begin with “aboutany,” “abouthow,” and “aboutthe” and end with “willcapt,” “workande,” and “youhabnt” (no, those aren’t typos). The sprawling list tops out with more than 108 million unique phrases that are 25 characters long, starting with “aaaaaaayrestrestperturbed,” “aaaaaafigureapproachesthe,” and “aaaaaahexclaimedthemonkey” and ending with “zzzandthenanexplosionthe,” “zzzzzendofmrhoneysbanking,” and “zzzzzzthewiressnappedwith.”
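One plausible way to build such a list is to slide over a text word by word, gluing consecutive words together and keeping every run whose length falls in the target range. The sketch below is an assumption about the general approach, not Young's actual tooling; the function name, the cleanup rules, and the sample text are illustrative.

```python
import re


def phrases_from_text(text, min_len=8, max_len=25):
    """Build space-free phrases from runs of consecutive words,
    keeping those between min_len and max_len characters."""
    # Lowercase, drop apostrophes, and keep only alphabetic words.
    cleaned = text.lower().replace("'", "").replace("’", "")
    words = re.findall(r"[a-z]+", cleaned)
    phrases = set()
    for start in range(len(words)):
        joined = ""
        for word in words[start:]:
            joined += word
            if len(joined) > max_len:
                break                 # longer runs from this start are too long
            if len(joined) >= min_len:
                phrases.add(joined)
    return sorted(phrases)            # alphabetical, like the list described


sample = "Give me liberty, or give me death! In the beginning was the Word."
phrases = phrases_from_text(sample, max_len=26)
for phrase in phrases:
    print(len(phrase), phrase)
```

Because every starting word spawns many phrases, even a short text produces a long list, which helps explain how 15,000 books ballooned into more than a billion entries.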

Downloading the list and storing and organizing it in an effective way has been a learning process, especially since password cracking is a side project that’s independent of Young’s day job as a senior information security engineer at Adobe.

“I live in Utah, and from the break room window I can see the NSA facility,” Young said. Then, referring to his 1TB disk filled with phrases from all over the Internet, he added: “That’s probably nothing compared to what those guys have. I consider myself a pretty average guy as far as cracking goes. But for the general password cracking community, that’s probably pretty good. As opposed to 14 million-word RockYou dump, I’ve got 1.3 billion passphrases.” And that’s only from Project Gutenberg.

There are also a variety of other challenges, not the least of which is the difficulty of mining names and quotes from millions of movies, books, and other works in order to add them to cracking lists. For whitehats, the hurdles also include the legal uncertainty of tapping copyrighted materials.

For his part, Chrysanthou said the biggest challenge is the work required to update and hone his phrase lists and rule sets to ensure that they can be processed quickly on his computer, which uses an Intel Dual Xeon CPU, a single AMD Radeon 5870 video card, and a traditional hard disk.

“The pre-work takes the longest, but that ensures readiness and effectiveness of the actual attack,” he said. “My pre-work is never-ending, and my word lists and rule sets are always a work in progress.” He added:

In terms of the attack itself, it depends on the attack technique. Let’s say that we are cracking phrases using a word list attack with rules. I have some gigabytes of word lists, and if I use only my TOP 100 rules then it will take less than an hour for a reasonably sized hash list. It will take some hours using bigger rules etc. A combined dictionary attack with rules could take days, depending on the size of the dictionaries that will be combined and the number of mutations done by the rule sets. I have a normal workstation, so naturally a bigger cracking rig with solid state hard drives and multiple GPUs could do this much faster.

Move over, “Oscar+emmy2” and “momof3g8kids”

In a previous feature, Ars explained how crackers were able to ransack passwords such as “momof3g8kids,” “m27bufford,” and “Oscar+emmy2” even though they weren’t found in the billion-plus word lists the crackers used. Typically, the dictionaries contain only core components such as “mom,” “kids,” “bufford,” “oscar,” and “emmy.” Rule sets then append brute-force guesses to the beginning or ending of the root words—something called a hybrid attack. Or the rules combine two or more dictionary words together in what crackers call a combinator attack. Or they meld two or more such techniques at the same time.
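A toy version of melding those techniques is sketched below: a combinator step joins two dictionary words, a hybrid step appends brute-forced digits, and a final step tries a capitalized first letter. The function names, the separator list, and the one-digit suffix limit are assumptions chosen to keep the example small.

```python
import string
from itertools import permutations, product


def combinator(words, separators=("", "+")):
    """Join every ordered pair of distinct dictionary words."""
    for left, right in permutations(words, 2):
        for separator in separators:
            yield f"{left}{separator}{right}"


def hybrid(bases, charset=string.digits, max_suffix=1):
    """Yield each base as-is, then with every brute-forced suffix
    of up to max_suffix characters appended."""
    for base in bases:
        yield base
        for length in range(1, max_suffix + 1):
            for tail in product(charset, repeat=length):
                yield base + "".join(tail)


def capitalized(candidates):
    """Yield each candidate as-is and with its first letter uppercased."""
    for candidate in candidates:
        yield candidate
        yield candidate[:1].upper() + candidate[1:]


guesses = set(capitalized(hybrid(combinator(["oscar", "emmy"]))))
print(len(guesses), "guesses from two root words")
print("Oscar+emmy2" in guesses)
```

Two root words and three simple rules are enough to reach “Oscar+emmy2.” The catch is that the number of guesses grows multiplicatively with each added word or rule.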

What’s different about the passcodes cracked in this article is that their raw materials were assembled from phrases rather than single words. While a computer eventually might have combined the words “crotalus” and “atrox” to guess one of the passwords Dustin decoded, it probably would have taken years of time-consuming combinator attacks before that winning pair came up in the roulette wheel. Relying on texts that are easy to download and plug into cracking programs, by contrast, takes hours. The race to decipher long passphrases is still in its infancy compared with password cracking, but it’s already showing a similar trajectory. As new technologies emerge that make it easier to access more written material, “It was the best of times, it was the worst of times,” “We the People of the United States, in Order to form a more perfect Union,” and any one of millions of other well-known phrases may soon offer no more protection than “Password123” and “letmein!”

Dustin’s computer can perform 30 billion guesses per second against standard Windows hashes. The $800 system uses four AMD Sapphire Radeon 7950 cards.
Josh Dustin

“If we rely on the fact that humans use words, and humans put words together in a certain way, we can try a whole lot of different combinations and end up getting quite a few,” Dustin said. “Whereas, if we brute force [phrases], we’ll get pretty much nothing.”

Young added: “The same way the GPU has jumped, we’ve jumped the whole traditional word list. So now we’ve got well over a billion phrases from Gutenberg alone. And we’ll just do a Twitter dump tomorrow and get everything that’s changed since then.”

This article was originally posted on Ars Technica.