RSS feed

Episode 5: Computer Vision with OpenCV

Gary Bradski and the Willow Garage PR2

In this episode we introduce OpenCV, a popular open source computer vision library created by Dr. Gary Bradski and developed in large part by a team of Russian computer vision and optimization experts. We interviewed Gary at his office at Willow Garage, where they are building an open research platform for personal robotics, including the PR2 above.


Download episode

Handy links below.


DARPA Grand Challenge

OpenCV Hacks


Gary: It makes the little line matching canal find all the triangulations to fill in this scene in a dense way. If you have dense, nice 3D models then there’s a lot you can do in perception. But it also works – it’s getting – you can’t see this but this is rapidly flashing on and off and so it gets a clear image and then it gets the dense image and then the other stereo comes in between all that and gets a color image. And then there’s this camera, too, is a high resolution five megapixel, but that only goes …
Nat: That was Gary Bradski, a computer vision expert and senior scientist at Willow Garage, which develops hardware and open source software for personal robotics.
Nat: In 2005, Gary led the computer vision team that won the DARPA Grand Challenge. Gary gave me a tour of the robot they’re building at Willow Garage, called the PR2. The robot’s vision system is based on an open source computer vision library called OpenCV that Gary created.
Alex: And OpenCV is the main focus of our program today.
Nat: I’m Nat Friedman in sunny San Francisco.
Alex: And I’m Alex Graveley, reporting from Boston.
Nat: And this is Hacker Medley, the podcast for curious hackers.
Nat: So Alex, for years and years computer vision has mostly operated in the realm of research or sometimes industrial applications, like in factories to monitor equipment to make sure it doesn’t, you know, spin out of control, or in mining, but all of a sudden it’s sort of starting to show up everywhere in our own lives.
Nat: For example, I’ve got a bunch of apps on my phone that can do things like let me take a picture of a Sudoku board and then recognize the board and all the numbers and show me the solution to the whole board or one that does the same kind of thing for Rubik’s cubes.
Alex: Yeah, and in fact, this stuff is sort of creeping in all over the place. Most cameras these days have face-detection autofocus, and for instance, Google’s releasing a new product called Google Goggles, which does image-based search.
Nat: Yeah, so you take a picture of, like, a book or a product and it’ll show you information about it, you know, reviews of that book, how much it costs online, and this is sort of called image search or reverse image search, and there’s a website that’s been doing the same thing for quite a long time.
Alex: So Nat, what is computer vision?
Nat: Okay, so good idea. Let’s define it before we get into it too much. Gary Bradski, who we interviewed for this podcast, he wrote a book about his library, OpenCV, and at the beginning of the book he gives a pretty good definition of computer vision so I’m just going to quote it here. He says:
Nat: “Computer vision is the transformation of data from a still or video camera into either a decision or a new representation.” And then he goes on to say, “a decision might be something like there’s a person in this scene or there are 17 tumor cells on this slide”.
Alex: And a new representation might be something like Google’s Street View where they stitch a bunch of photos together taken from cars driving all over the city that you’re searching and project them onto a spherical surface so you can pan around and see where the store that you’re trying to find is.
Nat: So, of course, you know, why is this happening, why is computer vision becoming more widespread? Well, of course, one of the big reasons is the widespread availability of CCDs, right? So we’ve got these cameras in our cell phones or other portable devices, most computers come with webcams these days and the algorithms have developed, too. You know, even in the last 10, 15 years computer vision algorithms have improved quite a lot, but one of the other things that’s driven the adoption of computer vision technology is the availability of open source building blocks and OpenCV is a great example of this.
Nat: Gary started the OpenCV library in 1999 when he was working at Intel Research and he spent some time after joining Intel touring different universities around the US and visiting their different computer vision labs.
Gary: And I saw that MIT Media Lab was – had the advantage that they were building on the infrastructure that had built up from other students, so when a student came into the media lab they immediately had available all the image processing, all the scan computer vision routines, and so the students were able to do much more ambitious research.
Nat: So when Gary got to Intel it made sense for him to help the computer vision community by building a common platform that could be used to do computer vision research and to advance the state of the art in computer vision without having to build everything from scratch, but also it made sense for Intel because they would be able to encourage people to use a lot more processing power.
Nat: So Gary had a bunch of image routines that he’d written himself, but he knew he needed more help and as it turned out Intel had just the team for the job.
Gary: So Intel, at the time, was contracting with people in Sarov, Russia, which is their secret city the Soviet Union never acknowledged existed. It was literally erased from the map. But Intel had hired contractors there who were, some of them, former nuclear weapons designers that they wanted to keep busy doing something else, such as debugging software. And so they became the core, and these were really well trained people, they became the core of this library development.
Alex: And Gary chose the incredibly permissive BSD license for his library so that it could be used in as many places as possible. And so it’s used all over the place, but you can’t really track it because no one has to report whether they’re using it or not.
Nat: So that was the goal of OpenCV, provide the base level computer vision functions as, you know, as deep and as broad as possible and make them really widely available. And over the last 10 years the library has grown and grown and now there’s tons of functionality. If you get Gary’s recent OpenCV book from O’Reilly, it’s about 500 pages long of great explanations of all the different functions that are in OpenCV. Maybe we should try to go through those. It’s a lot of stuff though.
Alex: OpenCV can be split into a bunch of categories of functions, built on top of each other and interrelated. The first category is basic image processing routines.
Nat: Gaussian smoothing, bilateral smoothing, morphological dilation, morphological erosion, morphological top hat, black hat, floodfill, pyramid segmentation, Canny edge detection, sparse perspective, affine transformation
Alex: The second category is higher level vision and video processing.
Nat: Contour finding, polygon approximation, convexity defect detection, background subtraction, scene modeling, stereo 3D vision, the watershed algorithm, corner finding, the Lucas-Kanade method for optical flow, the Horn-Schunck method, the Kalman filter
Alex: And the third category is machine learning algorithms.
Nat: Mahalanobis distance, K-nearest neighbor, Bayesian classifiers, binary decision trees, boosting, random trees, Haar classifiers, multilayer perceptrons, support vector machines, expectation maximization.
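As a taste of what’s in that machine learning toolbox, here is a minimal pure-Python sketch of the first item on the list, the Mahalanobis distance, which measures how far a point is from a distribution in units of that distribution’s own spread. OpenCV ships its own implementation; this standalone two-dimensional version is just for illustration:

```python
import math

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from a Gaussian with the
    given mean and 2x2 covariance: sqrt((x-mu)^T * inv(cov) * (x-mu))."""
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    a, b, c, d = cov[0][0], cov[0][1], cov[1][0], cov[1][1]
    det = a * d - b * c
    # Inverse of the 2x2 covariance matrix.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # Quadratic form (x-mu)^T * inv(cov) * (x-mu).
    q = dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy)
    return math.sqrt(q)

# A point one standard deviation away along each axis of an
# axis-aligned Gaussian sits at distance sqrt(2).
print(mahalanobis((2.0, 3.0), (0.0, 0.0), [[4.0, 0.0], [0.0, 9.0]]))
```

Unlike plain Euclidean distance, this stretches or shrinks each direction according to how much the distribution varies along it, which is why it shows up in classifiers.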
Alex: So there’s quite a lot of stuff in this library. You can do anything from really basic image transformations to – I mean, there’s one entry point that identifies faces in an image. And there are also some useful platform routines for drawing windows or grabbing frames off of the video camera that you have built into your computer.
Nat: Yeah, that’s pretty cool actually. If you’re using OpenCV on a Linux or Windows machine it’s basically like two lines of code to open the camera and just grab frames off of it, in C or Python or C++.
Alex: So why doesn’t it work that way on OS X?
Nat: I don’t know. Good question.
Alex: So Nat, what other users of OpenCV are out there?
Nat: Well when I asked Gary, who is using OpenCV the first thing he said is that because it’s a BSD licensed library they can’t know, right? People don’t have to release their code, they don’t have to report that they’re using it, but they suspect, for example, that the face detection algorithm that’s in OpenCV is the foundation of the face detection algorithm that Omron licenses to pretty much all the camera manufacturers worldwide. And then, you know, there’s a whole bunch of other big names. It’s used in space, it’s used by all sorts of companies developing anything related to computer vision.
Nat: One of the applications that Gary told me about that I thought was really interesting is this European product, it’s a drowning detection system that you install at a public pool, and it’s got a couple of cameras above the pool and inside the pool, on the walls of the pool, and it detects when someone’s drowning. And I asked Gary how it works.
Gary: So of course they have to be careful about, you don’t want a lot of false alarms, what you call false positives, and yet you don’t want to have a misdetection. So you don’t want someone to drown and say oh we missed it. Right? So they have a series of checks and they’re looking for blobs that will sit on, that are lying on the bottom of the swimming pool. So I believe, I don’t know the details of their algorithms, but I believe what they’re doing is a kind of background check of the, they learn what the swimming pool looks like with no people in it and then they’re looking for things that lie motionless on the bottom for a certain amount of time. And they also have more than one camera, so they’re looking for confirms at different levels.
Gary: And so when they detect something’s on the bottom it’s not moving and some cameras confirm this then they set off an alarm. On their site, I can’t remember the name of this company, I mean it’s used in the US, too, they have a, probably Google for drowning detection system. It’s a European company. But they have some film strips that actually show this rescuing people. Some guy had a heart attack, he sunk under the water, the lifeguard pulled him up and they saved him, not only from the heart attack but from the drowning. And a couple other known savings of people. So that’s all good.
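Gary is careful to say he doesn’t know the vendor’s actual algorithm, but the pattern he describes (learn an empty-pool background, flag pixels that differ from it, and alarm only when something stays flagged and motionless for long enough) can be sketched in plain Python. The frame format, function names, and thresholds here are all invented for illustration:

```python
def learn_background(frames):
    """Per-pixel mean of frames of the empty pool. Frames are 2-D
    lists of brightness values."""
    h, w = len(frames[0]), len(frames[0][0])
    n = len(frames)
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]

def foreground(frame, background, thresh=30):
    """Pixel coordinates that differ from the learned background by
    more than thresh."""
    return {(y, x)
            for y, row in enumerate(frame)
            for x, v in enumerate(row)
            if abs(v - background[y][x]) > thresh}

def alarm(frames, background, min_frames=3):
    """Alarm if some pixel stays foreground across min_frames
    consecutive frames: a stand-in for 'a blob lying motionless on the
    bottom', confirmed over time to avoid false positives."""
    persistent = foreground(frames[0], background)
    for f in frames[1:min_frames]:
        persistent &= foreground(f, background)
    return len(persistent) > 0
```

A real system would also cluster the flagged pixels into blobs and require agreement between the multiple cameras Gary mentions before sounding the alarm.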
Alex: So OpenCV was even used in winning the DARPA Grand Challenge.
Nat: Yeah, the year that Gary participated in the DARPA Grand Challenge, in 2005, when they won actually, OpenCV was the foundation of the vision system they built. But why don’t we tell people what the DARPA Grand Challenge is, in case they don’t know.
Alex: Yeah, so DARPA, which is the research arm of the US Defense Department, you know, the guys that sponsored the development of the internet, sponsored a contest to autonomously drive a car 140 miles through the desert, and whichever team could build an autonomous car that could do that in the shortest amount of time would win. And I don’t know how much the prize was, maybe…
Nat: It was $2 million.
Alex: Oh, it was $2 million.
Nat: Yeah. And Gary stressed for me that he did not get any of that money, so it apparently went into Stanford’s endowment. But in 2005 he did lead the team that built the vision system for Stanford’s entry, which was called “Stanley”, and you know, the goal, of course, was to drive over this sort of twisty path, these roads that went through the desert in Nevada. There were mountain roads that were quite twisty and flat roads that were quite straight, and Gary’s challenge was to identify where the road is. I asked him to explain how you go about finding the road.
Gary: Okay. So there’s a lot of ways, and we’ve looked at many ways. One of which is, dirt roads tend to have parallel lines that converge into the horizon, and so we did a bunch of convolutions with detectors and that worked quite well. We had detected roads to like 96% or something, which is totally useless for robots, because if you have 4% misdetection you’re in a ditch or off a cliff.
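The convolution-based detection Gary mentions boils down to sliding a small kernel along the image and looking for strong responses where edges appear. A toy one-dimensional version of that idea (the real detectors were two-dimensional and tuned for the converging edges of a road, not this simple edge kernel):

```python
def convolve1d(signal, kernel):
    """Valid-mode 1-D sliding-window correlation of a scan line with a
    small kernel: the response at each position is the dot product of
    the kernel with the pixels under it."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A [-1, 0, 1] kernel responds strongly at brightness edges, such as
# the border between road and shoulder on one scan line of an image.
line = [10, 10, 10, 80, 80, 80]
print(convolve1d(line, [-1, 0, 1]))  # large values mark the edge
```

Stacking many such responses over a whole image, with kernels oriented along the expected road edges, is the flavor of detector Gary is describing.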
Gary: So in the end we had to scale back to use very simple techniques, and what happened is we had short-range lasers, five of them, that would be able to tell what the road looked like close by, or identify: that’s flat terrain close by, I like that. And the vision system, by calibrating these together, the laser could tell the vision system: that patch is what I like.
Gary: We used simple color segmentation techniques, which was fitting Gaussian models. These all exist now in the background subtraction routines in OpenCV. But fitting a Gaussian color model to this, we simply then had a distribution of what road looked like in color space, and we segmented the rest of the scene by seeing which pixels hit that distribution. We had a bunch of other checks, that this had to be a contiguous region, it had to be connected.
Gary: We actually, at one point, had trapezoidal checks that we didn’t need in the end. So there were a couple of checks and it worked very fast and reliably.
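The pipeline Gary describes, fit a Gaussian to the colors of a patch the lasers have vouched for, then keep the pixels whose colors fall inside that distribution, is simple enough to sketch in pure Python. This toy version uses a single brightness channel and a hand-picked threshold, where the real system fit full color models and added the contiguity checks Gary mentions:

```python
import math

def fit_gaussian(samples):
    """Fit mean and variance to brightness samples taken from the
    laser-confirmed 'this is road' patch."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

def segment_road(pixels, mean, var, k=2.0):
    """Mark each pixel as road if it lies within k standard deviations
    of the learned road model."""
    sigma = math.sqrt(var) or 1.0  # guard against a zero-variance patch
    return [abs(p - mean) <= k * sigma for p in pixels]

# The lasers say these pixels are road; classify the rest of a scan line.
road_patch = [100, 104, 98, 102, 96]
mean, var = fit_gaussian(road_patch)
print(segment_road([101, 99, 180, 40, 103], mean, var))
```

The very bright (180) and very dark (40) pixels fall outside the road’s color distribution and are rejected, which is the essence of the segmentation step.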
Gary: And so, you know, basically you have to have a 100% recognition rate to be able to run. The vision system’s goal was to detect when the car could go faster. When we’re in the mountains on curvy roads we couldn’t go faster than 25 mph, and so vision was on but it wasn’t used, because we would never see enough road to tell the robot you can run now. But on the straightaways, where you could go fast, vision was crucial for telling the car: look, this is farther than you can stop in your laser range, but I’m sure I see a clear path, floor it.
Gary: There’s a lot of other ways of doing road detection now. We have this watershed transform so you can use that for segmenting roads. I show an example of that as an exercise in the book. We now have a new algorithm that’s written after the book called GrabCut that could easily, if you give me some points in the road and some points out I can easily segment the road. Very nice, much better segmentations than we were getting. So there’s a lot of routines now.
Nat: So I mean, what’s really impressive to me about this, Alex, is that in the last five years even, you know, there’s been huge advancements in the ability to do stuff like segment the road and identify where the road is. It seems like we’re kind of in a golden era for computer vision where the state of the art is really making huge strides over a pretty short period of time.
Nat: And I think one of the things that’s most interesting, that Alex and I discovered while we were reading about OpenCV, is that if you go to YouTube and search for OpenCV, we just did this the other day, you’ll find dozens of videos, and these are basically hackers who have done these little demos or written these little tools based on OpenCV. It’s like this incredibly vibrant hacker scene, like a demo scene of people doing hacks in OpenCV.
Nat: So, for example, one of them we found is this guy from Indonesia who did this hack with his web cam based on OpenCV to recognize sign language. So you do signs in front of your web cam and it tells you what letters you’re signing, all based on OpenCV. It’s pretty amazing.
Alex: Yeah, and that’s really basic because he’s just looking at – he’s just identifying letters and not doing gestures yet, but you know, it’s a start and it shows sort of the power of having a library that lets you do really complex operations. It’s something to build from.
Nat: Yeah, actually this is really cool. He didn’t publish his code, and not all these guys are publishing their code, but apparently he wrote this ASL finger spelling recognizer in C Sharp using a C Sharp wrapper, and he wrote a little recipe for how it works. He said: step one, I did Haar object detection to detect whether the hand is open or closed and to determine the position of the box; step two, movement detection, if things are shaking then reset the region of interest box; step three, in order to extract the hand shape I used skin detection based on HSV or RGB, depending on the lighting; and step four, to classify the image I use K-nearest neighbor, which is in the machine learning section of OpenCV, with 100 training images per sign. And he says that out of the 26 letter alphabet he’s got 19 signs identifying perfectly and he’s still working on the seven remaining letters.
Alex: That’s cool because, I mean, you’re talking about the use of all those different categories of algorithms that are present in OpenCV to solve what on the outside seems like a pretty straightforward problem, like you’re just looking for hand shapes that look like what you know a sign language A or C to look like.
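Step four of that recipe, the classification, is plain K-nearest neighbor: store labeled training feature vectors, then label a new hand shape by majority vote among its closest stored examples. A minimal sketch, with made-up two-number features standing in for real image descriptors:

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs. Label `query` by
    majority vote among its k nearest training examples, using
    Euclidean distance between feature vectors."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: fabricated 2-D features for two signs.
train = [((0.1, 0.9), "A"), ((0.2, 0.8), "A"), ((0.15, 0.85), "A"),
         ((0.9, 0.1), "C"), ((0.8, 0.2), "C"), ((0.85, 0.15), "C")]
print(knn_classify(train, (0.12, 0.88)))  # → A
```

With 100 training images per sign, as in the recipe above, the same voting scheme scales up directly; only the feature extraction gets harder.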
Nat: You’ve been messing around with OpenCV, haven’t you Alex?
Alex: In my case I was playing with the face detection and eye detection utilities that exist, and I noticed that sort of once you try and dig down into – once you try and do something that is not what a given algorithm was designed for that you end up, I mean you get into a lot of detail on a lot of, you know, techniques in order to handle what you want.
Alex: For instance, in eye detection it’s, you know, it’s difficult to detect eyes if the person is not looking straight at the camera, if their head is rotated, if their hair is in the way, all these kind of things that once you actually try and do it generally makes life really difficult.
Nat: Yeah, I think like part of the challenge is that our intuition for what should be easy and straightforward in computer vision doesn’t match what actually is easy, you know. Like find the eyes or whatever seems pretty straightforward but it turns out to be a little bit challenging and, you know, one of the things I like about OpenCV though is that in a bunch of different languages, you know, in C Sharp, Python, C, C++, there’s a bunch of sample code and these samples are perfect. I mean they’re like 50 lines to 200 lines of code that do just one thing usually. They’re simple. You can feed them your own images and you can just grab the code and kind of start changing things, you know, find the right starting point example and use that to build whatever it is you’re trying to do.
Nat: And like we said, you know, if you go online you’ll find people who do, I don’t know, someone wrote like a tennis ball tracker so he’s bouncing a tennis ball over the room and OpenCV is recognizing it and tracking it really quickly, and people build security camera motion detectors and there’s a whole lot of examples online of great hacks you can do with this thing.
Alex: Yeah, I like the one that I saw where you can show the camera – you can hold the number of fingers up to the camera and then the camera will count the fingers that you’re showing it then add them all up and show you the result.
Nat: Oh yeah, that’s a good one. One of the major future areas they’re investing in in OpenCV is stereo vision, and that’s because Willow Garage is sponsoring a lot of the development of OpenCV and, you know, their focus is to build a personal robot that uses some different stereo vision techniques and, you know, laser scanners and other techniques like that to try and model a 3D world and interact with it. And so doing real-time stereo object detection and recognition is a big part of the future direction of OpenCV. But you can also, you know, take it in any direction you want. It’s pretty powerful.
Alex: And so that’s our show. Thanks for listening.
Nat: Yeah, we definitely encourage you to check out our website. We often hear that one of the best parts about our show is the set of links that we provide along with each episode, which I don’t know how to feel about that to be honest, Alex. I guess it is a pretty good set of links usually.
Alex: Yeah, we give a good link.
Nat: So check that out and we’d love to get your feedback and if you like our show feel free to subscribe so we can get our feedburner number up and feel better about ourselves and our place in the world.

Episode 4: Humans Only

For our fourth episode, we decided to try making a long, in-depth show about those squiggly word puzzles you find all over the internet, called CAPTCHAs. This is our first show that contains interviews, including one with the happy fellow you see above, Dr. Andrei Broder, the Chief Scientist at Yahoo!. You’ll hear from him quite a bit in this episode.


Download episode

This show is almost 50 minutes long. We hope you enjoy it. Right now we’re thinking about this as sort of a special occasion. Most of our shows will likely be shorter — mostly because they’re easier to make (Nat spent over 100 hours on this one). Unless you tell us long is the way to go!

And on that note, we’d love to get your feedback on this show in the comments below. Constructive criticism and gushing encouragement are all welcome!

If you want to learn more about the topics we discussed, here are some handy links.

The Interviewees

CAPTCHA basics

Algorithmic attacks on CAPTCHA

Convolutional Neural Networks

CAPTCHA bypass services (aka CAPTCHA farms)

This episode contains two songs from Eternal Jazz Project, a Swedish jazz band that released some of their music under the Creative Commons BY-NC-SA license on magnatune. This episode is distributed under the same license.


Broder: There was a procedure called add URL, where you would come to a search engine and you would say, you know, here is the pages I just made. But anyway we had this problem and, of course, there was spammers and there were people that were adding the same page millions of times and wrote little scripts to add their pages. So we had somehow to slow the spammers. And this is how we came up with the idea that we need a test to distinguish between spammers and humans.
Nat: That was Dr. Andrei Broder, the Chief Scientist at Yahoo!, discussing his time at Altavista in 1997, when he led the team that invented a little thing called CAPTCHA.
Alex: And CAPTCHAs are the subject of our program today. We’re going to be exploring the state of the art in CAPTCHA generation and circumvention.
Nat: I’m Nat Friedman, reporting from the Bavarian capital of Munich.
Alex: And I’m Alex Graveley, reporting from sunny, cloudy, cold San Francisco.
Nat: And this is Hacker Medley, the podcast for curious hackers.
Nat: Let’s see here. Word verification. Type the characters you see in the picture below. Okay. C-O, I think that’s a U.
Alex: Wait, is this supposed to be a word or is this just letters?
Nat: I think it says [coralia]. Is that a word?
Alex: I don’t know.
Nat: I think it’s just sort of just random letters that are pronounceable. Okay. I think it’s C-O-U, and I think there’s an R like tucked in there and that’s, wait that might not be an A actually. I think that, yeah, that’s an A. And then this is either a B or an LE.
Alex: And here Nat is trying to solve a CAPTCHA, one of those squiggly word puzzles that you see all over the internet, where you have to type in the words that you see in order to enter a blog comment, create a new mail account, or even participate in an online poll.
Nat: The estimates are that we, human beings as a species, are solving over 200 million CAPTCHAs every single day. But the very first CAPTCHA was implemented at AltaVista back in 1997. I interviewed Dr. Broder at his office in Santa Clara and asked him to tell us how it happened.
Broder: I think from the very beginning we had kind of an idea that the problem has to be some kind of a pattern recognition problem, because this is one area where humans are much better than machines. And at some point it sort of started from, I think, some lunch discussion, and someone was pointing out that machines are not yet incredibly good at playing chess. How come humans, who cannot do so much computation, are good at chess? It’s all about pattern recognition. So we knew that we needed a pattern recognition problem. And then we came up with this one.
Nat: How did you come up with the algorithm for distorting text?
Broder: That one is a lot easier, I can tell you how we decided what things to do. Actually I had a scanner at home, and scanners were not so cheap as today, and I believe it was made by Brother but I’m not 100% sure. The scanner came with a manual, and they also had some OCR software which came with the scanner. And pretty much I looked in the manual, at everything in the manual that they said is bad for OCR.
Broder: We decided, why don’t we make it? So one of the things that we’re saying, well, it’s bad if the letters are misaligned, so we said okay, they should be misaligned. And it’s bad if you use multiple fonts, so we said okay, use multiple fonts. So it was all there.
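The recipe Broder describes simply inverts the OCR manual: whatever the manual warns will hurt recognition, apply on purpose. A toy sketch of that idea, producing per-character rendering parameters; the font names and value ranges here are invented, and a real generator would hand these to a rendering layer to draw the distorted image:

```python
import random

FONTS = ["serif", "sans", "mono"]  # hypothetical font names

def distortion_plan(text, seed=None):
    """For each character, pick the anti-OCR parameters the manual
    warned about: a random font, a misaligned baseline, and a tilt."""
    rng = random.Random(seed)
    return [{
        "char": ch,
        "font": rng.choice(FONTS),              # multiple fonts
        "baseline_offset": rng.randint(-5, 5),  # misaligned letters
        "rotation_deg": rng.uniform(-15, 15),   # tilted glyphs
    } for ch in text]

for spec in distortion_plan("captcha", seed=42):
    print(spec)
```

Each line of output is a per-glyph recipe: humans read through the jitter easily, while the randomness defeats OCR software tuned for clean, aligned, single-font text.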
Nat: That’s a pretty interesting story, huh?
Alex: Yeah, I love those old stories of like hacker epiphanies that solve really complex problems. The funny thing is that search engines today don’t even use this scheme anymore, they just use PageRank, which crawls the whole web. But instead, CAPTCHAs have turned out to be incredibly valuable for locking out spammers from pretty much all aspects of the internet.
Nat: You know, what’s kind of amazing to me is that these guys, this little team at AltaVista 12 years ago, they came up with this human detection technique and it’s pretty much exactly what we’re using today.
Alex: Yeah, I mean it looks pretty much the same to us but it is somewhat different, like the state-of-the-art has pushed these things towards being much harder for computers to solve.
Nat: Yeah, that’s true. I mean pattern recognition techniques and AI and computer vision have advanced a lot since then. And actually, that’s a good point, that kind of brings us to why Alex and I think CAPTCHAs are so interesting. That little image, that little rectangle of distorted text on your web browser, that is kind of like a window into the world of artificial intelligence and how it relates to human capabilities.
Alex: Yeah, and specifically it’s just like this really interesting set of problems, which are defined by the fact that they’re tests that computers can generate and grade the answers to, but which they can’t themselves solve very easily, and that humans can solve really quickly.
Nat: So here are the criteria. In order to be a viable CAPTCHA, a test has to be something that’s beyond the frontier of current artificial intelligence, but well within the capabilities of even really, really average people. So in a certain way, the set of all viable CAPTCHAs describes the ways in which people are still better and more capable than computers.
Alex: Yeah, and it shows you sort of the places where AI still has to grow and the limitations of what we can do, at least with regards to image recognition.
Nat: That’s a good point. But, of course, the bad news is that AI is getting smarter and we’re not. So, you know, for the time being, at least when it comes to recognizing distorted text, we’re still well beyond computers, but there’s no reason it’s going to stay that way forever.
Nat: Actually, Alex, by the way, the idea of CAPTCHA goes back to an earlier concept called a Turing Test.
Alex: I’ve heard of Turing Tests but it’s funny, I didn’t know that CAPTCHA stood for a Completely Automated Public Turing Test to tell Computers and Humans Apart, which is a pretty long acronym, but the important thing in there is that it is a form of Turing Test. Nat, maybe you can explain what that is?
Nat: Sure. So back in 1950 Alan Turing, the father of computing, wrote this really amazing paper called “Computing Machinery and Intelligence.” And what you have to understand is, in 1950 the transistor was only 3 years old. So computers were really big, they were room sized, they were really loud and they didn’t do very much. So it was in this world of fairly limited computer capabilities that Turing asked an enormous question, and the question was: “Can machines think?” And this is like a philosophical question, and in order to answer it you’d have to define what thinking is.
Alex: But I mean it’s interesting because people are just sort of sitting around with this big old computers waiting for punch cards to be processed and they had their heads in the clouds of these sort of abstract questions.
Nat: Right. Now instead of going in a total abstract route though, Turing devised, he invented a game, a very concrete game, which he called “The Imitation Game.” And the way people usually describe the game is, you have a person who’s a judge, and he’s communicating with someone else who’s in another room, who could be a computer or a human being and they’re talking through little text messages, like IM or something, and the question is: can the judge tell if he’s talking to a computer or a person?
Alex: That’s sort of what’s become the Turing test, which has been around so long at this point that it represents sort of this unachievable holy grail of artificial intelligence. And if it ever gets passed, it represents the point at which computers can really convincingly simulate the interactions between humans.
Nat: Yeah. Actually when I was a little kid my friends and I used to talk about the Turing Test, as you said like a kind of major milestone in artificial intelligence that we figured would have been solved by now. But I hadn’t actually read the paper until we started doing the research for the show and what I discovered is that what Turing actually wrote is different from what we just described. See in Turing’s original paper there’s three people, there’s a man, and a woman, and the judge, and they’re all in separate rooms, and the judge is trying to guess which is the man and which is the woman. And then what you do is you take either the man or the woman and you replace them with a computer, and the question is, does that change the judge’s accuracy from when he was talking with two humans?
Alex: Kind of a weird twist. And the computer actor in that specific scenario is like trying to trick the judge into thinking that the human is lying and it’s all very confusing. I still don’t fully understand why that question is posed in such an obscure and specific way but, you know, it’s Turing so chances are good that he was thinking about something that I’m not.
Nat: No question about that.
Alex: You know, it’s interesting because I think the Turing test is only hard to pass if you suspect you’re talking to a computer.
Nat: Yeah. Actually there’s a whole bunch of examples of people spending hours talking to even like really poorly implemented chat bots that don’t even put a delay in before they respond to someone’s IM or something like that. So they respond in a tenth of a second. And actually I found a really funny screenshot online, it turns out there’s a Russian chat bot that’s called cyber lover, and what it does is it goes into chat rooms and on IM and it poses as an attractive female and it enters IM conversations with men, and it kind of gradually convinces them through these faked human interactions to give up their personal information. And so the screenshot is of the dashboard for this chat bot and you can see all the men that it’s talking to and it reports as it gets their full name and their address and their credit card numbers and that sort of thing. We’ll have to put that up on the website.
Alex: I’m just going to go call Visa real quickly.
Nat: Getting back to the Turing test, though, there’s a bet on one of my favorite websites between Mitch Kapor, who’s the founder of Lotus, and Ray Kurzweil, as to whether a computer will be able to pass a Turing test by 2029.
Alex: Yeah, and that’s the commonly understood concept of the Turing test, not the sort of gender guessing, gender faking one. Mitch Kapor is betting that computers won’t do it, which seems kind of negative to me, and Ray Kurzweil is betting that they will because his sort of whole singularity concept depends on it. And it’s a real bet. There’s $20,000 on the line.
Nat: So Alex, Turing posed this big question back in 1950, and then for 46 years the AI community worked like crazy to try to build algorithms that could imitate human capabilities in even really simple uncontrolled situations. And they haven’t really quite got there. Actually I have a little blast from the past for you, Alex. Let’s listen to this.


Alex: Oh man, it’s my very first shrink.
Nat: I don’t know if you remember that from the Sound Blaster?
Alex: I totally do. It was one of the demos that came on the Sound Blaster install disc.
Nat: Yeah. And then they had one with the talking parrot. You remember that, too? It had a different voice?
Alex: I kind of remember the talking parrot. Can you simulate the voice for me?
Nat: I don’t think I could. So because computers were having so much trouble with even really simple human tasks, let alone actually imitating people in a human context, the conventional wisdom about AI has been, for decades, that AI is in a rut. But then in 1996, a researcher at the Weizmann Institute in Israel named Moni Naor looked at the situation and he saw an opportunity. He figured that the things people could do that AI was still failing to do could be used by computers to automatically tell computers and humans apart.
Alex: Moni’s paper is called “Verification of a human in the loop or Identification via the Turing Test.” And he had a bunch of really cool ideas, some kind of novel concepts for the kinds of puzzles that you could pose to humans to determine if they were in fact human.
Nat: Most of those puzzles you’ll see are kind of in the areas of like sensory processing, image recognition, that kind of thing. Actually I think we should just read a couple.
Alex: Alright, yeah. There’s one that was gender recognition, which is actually kind of difficult: if you show a picture of a face, determining whether it’s male or female.
Nat: I have trouble with that just in real life.
Alex: Me, too. I got hit the other day because of it. And there’s facial expression understanding, whether the person in the picture is happy or sad. And then there’s identifying body parts, which actually seems like a really difficult problem to me for computers to solve, being able to tell which, in a random picture, whether or not you can highlight the arm or the leg.
Nat: Here’s one I like, filling in words. Given a sentence where the subject has been deleted and a list of words, select one for the subject.
Alex: That’s kind of cool. Sort of text comprehension.
Nat: And he also here mentions handwriting understanding, which is actually pretty close to what CAPTCHAs ended up being.
Alex: And he mentions also speech recognition, which is used in audio CAPTCHAs today for blind people.
Nat: So I mean Moni’s paper gives us a pretty good inkling of what CAPTCHA could be, but he wrote the paper before CAPTCHA was actually invented. And a lot of these particular ideas, well they didn’t turn out to be that great.
Alex: Yeah, things like drawing a circle around a person in a scene, or around a person’s body part, are actually kind of annoying to do in practice. And things like guessing the word that fits into the sentence you can do by indexing a lot of web pages and determining which sentences or word structures are very common.
Nat: And actually whenever you have a test that doesn’t have very many choices, like for example a binary choice, like male or female, if you just write a script that guesses randomly you’re going to be right 50% of the time. So that’s a pretty good pass rate for a pretty short script. So you have to give the user lots of binary choices, like five or ten or something like that, to make the random guessing pass rate low enough. But anyway, totally independently of this paper that Moni Naor wrote, you had the work that was going on at AltaVista. So industry and academia were kind of converging on the same point.
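Nat’s point about binary choices is easy to make concrete. Here’s a little sketch (ours, not from the episode) computing the chance that a randomly guessing bot passes a test made of independent choices:

```python
# Chance that blind guessing passes a test of n independent choices,
# each with k equally likely options: (1/k) ** n.
def random_pass_rate(num_choices, options_per_choice=2):
    """Probability that a bot guessing uniformly gets every choice right."""
    return (1.0 / options_per_choice) ** num_choices

# One male/female question: the bot passes half the time.
assert random_pass_rate(1) == 0.5
# Ten binary questions: roughly 1 in 1000, low enough to be a real barrier.
assert abs(random_pass_rate(10) - 1 / 1024) < 1e-12
```

So a single gender-recognition question is worthless on its own, but stacking ten of them pushes a guesser’s pass rate below a tenth of a percent.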
Alex: Right. And a few years later at CMU, this totally awesome guy named Luis von Ahn and his professor Manuel Blum wrote a paper where they coined the term CAPTCHA and sort of formalized the whole concept. One thing that’s totally awesome in this paper and one of the reasons I like CAPTCHAs so much is that it points out that CAPTCHAs are pretty much a win-win situation, “either the CAPTCHA is not broken and there is a way to differentiate humans from computers, or the CAPTCHA is broken and a useful AI problem has been solved.”
Nat: Yeah, I love that, too. I think that’s really cool. So since 1996, 1997, the time when AltaVista invented CAPTCHA and these papers came out, CAPTCHAs have become super widespread. Millions are solved every day. And, by the way, the average CAPTCHA takes about 14 seconds to solve. So if you multiply that out, that’s a lot of time being spent by people solving CAPTCHAs every day. And with all this work being done, Luis von Ahn saw an opportunity and, with a couple of other people, founded a company called reCAPTCHA.
Ben Maurer: So my name is Ben Maurer. I’m one of the cofounders of reCAPTCHA and I’m responsible for the design of our API and for our infrastructure.
Ben: So people are solving 200 million CAPTCHAs a day, let’s say, and what they’re doing is they’re spending time doing something that by definition a computer can’t do. That’s automatically valuable because if we could give people a task that is useful then we’re getting something that we don’t otherwise have the ability to get. And so we said what can we do with all this, you know, with all this human computation power?
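Ben’s numbers imply a staggering amount of human effort. A quick back-of-envelope calculation, using his rough 200 million figure and the 14-second average solve time mentioned earlier:

```python
captchas_per_day = 200_000_000   # Ben's rough estimate
seconds_per_captcha = 14         # average solve time quoted in the episode

total_seconds = captchas_per_day * seconds_per_captcha
person_hours_per_day = total_seconds / 3600
person_years_per_day = total_seconds / (3600 * 24 * 365)

print(round(person_hours_per_day))  # roughly 778,000 hours of human work
print(round(person_years_per_day))  # roughly 89 person-years, every single day
```

Which is exactly the "human computation power" Ben is talking about putting to use.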
Alex: So just to cut in, in case you don’t know what reCAPTCHA is, you’ve probably seen these before: they’re the CAPTCHAs that have two words that you have to type. The words are usually in some kind of old or smudgy typeface, and there’s maybe a line drawn through them.
Ben: And what we came up with is instead of having one word in the CAPTCHA we have two words and one of them is sort of a fake. It’s not part of the CAPTCHA it’s just a word that we don’t know what it is and we want you to tell us what it is and we do that to digitize books and newspapers and other content that computers can’t read.
Nat: So then what they do is they run two different OCRs over the text. Ben told me that they use a couple of commercial OCRs, plus an open source one called Tesseract, which comes from Google and is now considered pretty state of the art. And they identify words that the OCR software couldn’t recognize or doesn’t have a lot of confidence about. Ben explained it pretty well.
Ben: So OCRs are never 100% sure whether they’re right or not. But what we do is we take multiple OCR engines that use different algorithms, and they tend to have failures that aren’t 100% correlated with each other. If they both agree then we sort of say it’s very likely that the word is correct. We use a few other signals such as, you know, does the word fit in this sentence? Like one sentence we had in an old newspaper was that the motor ears were running down the street. And “motor ears” is something that just doesn’t occur in the English language. What happened is a C looked like an E to the OCR, and we have the ability to say motor ears is a bigram that just doesn’t typically appear, and it’s suspicious.
Nat: By the way, Alex, I thought it was nifty that they also use bigram probabilities to help identify which words the OCRs failed to recognize.
Alex: Yeah, I suspect that they’re using the one provided by Google where they have this huge bigram index, this big database you can download for a small fee, and it basically shows the occurrence of combination of words all over the web.
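The “motor ears” check Ben describes amounts to looking up each adjacent word pair in a frequency table and flagging the rare ones. A toy sketch, with made-up counts standing in for a real corpus like Google’s n-gram data:

```python
# Made-up bigram counts for illustration; a real system would load
# frequencies from a large text corpus.
BIGRAM_COUNTS = {
    ("the", "motor"): 9000,
    ("motor", "cars"): 12000,
    ("were", "running"): 15000,
    ("running", "down"): 8000,
    ("down", "the"): 400000,
    ("the", "street"): 500000,
}

def suspicious_bigrams(words, threshold=5):
    """Return adjacent word pairs rarer than `threshold` in the corpus."""
    return [pair for pair in zip(words, words[1:])
            if BIGRAM_COUNTS.get(pair, 0) < threshold]

sentence = "the motor ears were running down the street".split()
print(suspicious_bigrams(sentence))  # [('motor', 'ears'), ('ears', 'were')]
```

Both flagged pairs involve “ears,” which is exactly the signal that the OCR probably misread a character there.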
Nat: It makes sense actually also because Google ended up buying reCAPTCHA pretty recently.
Alex: Yeah. And reCAPTCHA has APIs in a whole bunch of languages, so it’s sort of a general-purpose CAPTCHA platform that you can just embed into your site. And these things are used everywhere: on Facebook, Ticketmaster, Craigslist, Wikipedia… everywhere.
Nat: Ben told me, Alex, that reCAPTCHA is actually getting a whole lot of old books and newspapers transcribed.
Ben: We’ve done about, I think about 50 years worth of the New York Times already and currently reCAPTCHA users are solving 50 million CAPTCHAs a day
Nat: And by the way, Alex, The Times is paying reCAPTCHA for all that digitization work that they’re doing.
Alex: That’s pretty awesomely shrewd right there!
Nat: Definitely.
Alex: And they’re doing all this with pretty standard stuff: Python, nginx, and a lot of intelligent hackery.
Nat: Actually with all that scale, solving 50 million CAPTCHAs a day, I asked Ben a little bit about the architecture, and specifically how they store the CAPTCHAs on disk. Is it just one file per CAPTCHA image? And here’s what he said…
Ben: Yeah, that was originally how things worked and that’s a pretty big disaster just because every time you serve a CAPTCHA then you end up doing a disk seek. And when you have a server that can serve a few thousand requests per second you can’t do a few thousand disk seeks per second. It’s just too slow.
Ben: And we found that one file per CAPTCHA, when we would get substantial load on the server the latency would become very high. So we actually use a custom file format to store the CAPTCHAs that allows us to load a bunch of CAPTCHAs into memory at once.
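reCAPTCHA’s actual file format isn’t public, but the general technique Ben describes, packing many small images into one file so a whole batch loads with a single sequential read instead of thousands of seeks, can be sketched like this (the layout here is our invention, not theirs):

```python
import io
import struct

def pack(images):
    """Concatenate image blobs, prefixing each with a 4-byte length."""
    buf = io.BytesIO()
    for blob in images:
        buf.write(struct.pack("<I", len(blob)))  # little-endian uint32 length
        buf.write(blob)
    return buf.getvalue()

def unpack(data):
    """Read the whole pack into memory in one go and split it back up."""
    images, offset = [], 0
    while offset < len(data):
        (size,) = struct.unpack_from("<I", data, offset)
        offset += 4
        images.append(data[offset:offset + size])
        offset += size
    return images

batch = [b"png-bytes-1", b"png-bytes-22", b"png-bytes-333"]
assert unpack(pack(batch)) == batch  # one read now serves many CAPTCHAs
```

The win is that serving a CAPTCHA becomes a memory lookup rather than a disk seek, which is what Ben’s thousands-of-requests-per-second math demands.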
Alex: That’s great. That’s another one of those sort of problems that you only run into when you have really large amounts of scale.
Nat: Yeah, and it’s cool to peek under the covers of an operation like that.
Nat: Now, by the way, reCAPTCHA doesn’t just take the scanned word off the page and present it to you unmodified; they actually distort the word a little bit before you see it in the CAPTCHA.
Alex: Right, like I said, they maybe draw a line through it or they make it wavy. And recently they started using these XOR blobs, where they sort of switch the foreground of the word with the background for part of the word.
Nat: And the reason they do this, Ben told me, is because even though OCR software couldn’t recognize the word, OCR software is not really designed to solve CAPTCHAs; it’s trying to get a balanced view of the document. So it might be possible to build an algorithm that could get enough CAPTCHAs right to be annoying. For example, Ben said that if you took standard OCR software and just tweaked its algorithm to use its second-best guess for what the word could be instead of its best guess, that might solve enough reCAPTCHAs to be a problem. So that’s why they add extra distortion, just for extra safety.
Alex: Yeah, and everyone we’ve talked to has basically said the same thing, which is that reCAPTCHA is one of the toughest CAPTCHAs out there, which is important because you only need, say, 10% of CAPTCHAs solved by your bot to create thousands of fake Gmail accounts or get a lot of SPAM comments through. So the team at reCAPTCHA works really hard to make their CAPTCHAs as difficult to break as possible, while still trying to keep them easy for humans to solve.
Nat: And they’ve struck a pretty good balance, but CAPTCHAs were not always as secure as they are now. And, Alex, there’s a funny story about that.
Nat: So back in the fall of 2004, Microsoft’s Hotmail team, like most webmail services, had spam as one of their big concerns, and specifically spammers using Hotmail to send spam.
Alex: Yeah, and Nat, like every other webmail service on the planet, their sort of first line of defense is to ask people who are creating a new account to solve a CAPTCHA.
Nat: So Hotmail was depending on CAPTCHAs to protect them from spam. And they wanted to know: how safe are these things anyway? You know, how hard would it really be to build an algorithm to break a CAPTCHA? So being Microsoft, of course, they have a really substantial research division right on campus. So they called up Microsoft Research and got in touch with a scientist there named Kumar Chellapilla, who is a machine learning expert.
Kumar: Yeah, so my actual relationship with CAPTCHA comes from machine learning. So my actual PhD research work was on computational intelligence, and this is trying to build intelligent adversaries or agents that could act and train against humans.
Kumar: So these are models that you could train by giving it like input and output signal. And for some, for my PhD work I did mostly game playing like checkers and chess and so on.
Nat: When Kumar joined Microsoft Research, he did some work on OCR technology and handwriting recognition specifically for their tablet PC project.
Kumar: One of the common areas is signature analysis. How do you get a computer to look at two signatures and tell it to accept the signature or not? These are very, very hard problems.
Nat: And so Kumar sat down and he looked at the most prominent CAPTCHAs on the web from the biggest companies on the web at the time, and here’s what he found.
Kumar: And I was surprised. I have somewhat of an undergrad understanding of image processing, a doctorate-level understanding of machine learning, and as I started applying some of these techniques, it was very easy to undo the challenges that were being put forth by the CAPTCHA. And I was so surprised at how quickly this happened that, I think in November or December 2004, there’s this famous machine learning conference called Neural Information Processing Systems, and that was the first place where we presented a poster.
Kumar: And it was amazing. We had about half a dozen different CAPTCHAs that were provided by several different people in the industry and we could show that many of them you could break like one out of two, one out of four, two out of three.
Nat: Now, Alex, as it turns out, solving a CAPTCHA is something that actually breaks down into two separate problems: first is the problem of segmentation, and then comes the problem of recognition.
Alex: And I didn’t know this beforehand but segmentation is the process of breaking a picture of a word up into individual letters. And the recognition is then taking each one of those sort of subpictures and identifying which letter it represents.
Nat: And what Kumar quickly discovered was that recognizing the letters in most of the CAPTCHAs at the time was pretty easy.


Kumar: One of the problems we had already solved by the time I started looking at CAPTCHAs was: if you give me a single character, moderately distorted but not devastatingly distorted, and you sort of use your mouse to point to the center of the character, I have techniques that can learn from that signal and basically give you the character that is there at that point.
Nat: The tool that Kumar was using was a special kind of neural network called a convolutional neural network.

Nat: Actually, why don’t we start off and tell people what neural networks are.

Alex: Yeah, sure. Neural networks are this sort of pretty widely-used technique in AI that’s been around for a really long time. And the basic idea is that you have these neuron-like elements that have inputs and outputs and the inputs and outputs are sort of arranged with inputs going into other neurons and outputs going into other neurons. So for a given neuron each input has its own weight, which multiplies the input value. The neuron adds up those weighted inputs, and if it’s greater than a certain threshold then the neuron fires, meaning that it sends a signal to its output. And the output signals of all these neurons sort of propagate through the network until you get the “answer” on a specific set of output neurons.
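Alex’s description boils down to just a few lines of code. Here’s a minimal sketch of one such threshold neuron (a classic perceptron-style unit, our illustration rather than anything from the episode):

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs clears the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# With these weights the neuron acts like a tiny AND gate:
# it fires only when both inputs are on.
assert neuron([1, 1], [0.6, 0.6], threshold=1.0) == 1
assert neuron([1, 0], [0.6, 0.6], threshold=1.0) == 0
assert neuron([0, 1], [0.6, 0.6], threshold=1.0) == 0
```

A network is just many of these wired together, with the outputs of one layer feeding the inputs of the next, and training is the business of finding the right weights.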
Nat: Exactly. So the basic idea for convolutional neural networks came from an experiment that was done back in 1959 by these two guys, David Hubel and Torsten Wiesel. What they did was they took a cat and they put it under anesthesia. And then they inserted some electrodes directly into the cat’s visual cortex. And they opened its eyes and flashed different patterns of light and dark lines in front of the cat. And what they found was really interesting: some neurons in the cat’s visual cortex fired rapidly in response to lines at one angle, and some neurons fired rapidly in response to lines at a different angle. So there was some angle sensitivity to different groups of neurons. And there were other neurons in the visual cortex that were totally angle-independent.
Nat: So what happened subsequent to that is, you know, this was obviously a pretty big result in neurology, but some computer scientists got hold of it, and what they realized is that they could arrange neural nets the way a cat’s visual cortex is arranged. So at the lowest level you’d have neurons which recognize simple features in the image, like corners, edges at a certain angle, or end points in certain regions of the image. And then there would be subsequent layers, usually called the hidden layers of the neural network, and these subsequent layers would sort of combine those basic features to detect higher-order characteristics or features in the image. And if you have enough of those, and the right kind, you can start to recognize even really distorted letters or objects or things like that. So it turns out that this special type of network, the convolutional neural network, which is roughly based on the way vision works in mammals, is really good at image recognition.
Nat: Specifically, these networks were really good at recognizing handwriting. And so when Kumar got assigned the whole CAPTCHA project he’d already been through Tablet PC and he had all this handwriting recognition experience and therefore he had this really powerful image recognition tool at his disposal. And when he took this thing and he pointed it at the state of the art CAPTCHAs on the web it just blew them away.
Kumar: So four out of five characters I’d be able to recognize correctly or certainly nine out of ten. So that was sort of like, that was the platform for almost all of my techniques. I would try to reduce every CAPTCHA I saw out there with some ad hoc processing down to a place where I could just give it maybe like five or ten locations where I thought characters were and then this system would, it’s not free because you have to label like thousands and thousands of these laboriously but it’s a very automatable technique.
Nat: So, Alex, with the recognition problem solved, for Kumar breaking CAPTCHAs basically came down to just identifying the locations of the letters. And this is the segmentation problem. And in the CAPTCHAs that existed on the web in 2004, segmentation was actually not that hard. Kumar explained to me how he solved Ticketmaster’s CAPTCHA.
Kumar: They were exclusively using these grids of slanted lines. They were almost regular but not exactly; they would tilt a little bit between the different parallel lines in the grid. But their text was always horizontal, and the text was always thicker than the grid. So if you did some blurring, the grid lines would blend into the background and the words would stay up in front.
Kumar: So say you had the word hello on a white piece of paper, hello being typed in black, and you could feed a scan of that page to this connected components algorithm. What it will do is start at one of the black pixels, let’s say in the H, and grow that by looking at the neighboring black pixels, and it will slowly grow it into the letter H. Then once it has reached the edges of the H it can’t grow any further, so you can pull that letter H out as one character. And you can repeat this iteratively until you get all the characters. So that’s another one where I think the Ticketmaster one was reduced down to one of those and then we could break it out. Others had similar ones. The very early MSN Hotmail one also did not have enough arcs, so some of the characters would not even be touching and you could easily separate those.
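Kumar’s island-growing can be sketched as a flood fill over a set of ink pixels. This is a generic connected-components implementation (our sketch, not his code), treating two pixels as connected when they share an edge:

```python
def connected_components(pixels):
    """pixels: set of (x, y) ink coordinates. Returns one set per blob."""
    remaining, components = set(pixels), []
    while remaining:
        stack = [remaining.pop()]  # seed a new component from any ink pixel
        component = set()
        while stack:
            x, y = stack.pop()
            component.add((x, y))
            # Grow into 4-connected neighbors that are still unclaimed ink.
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        components.append(component)
    return components

# Two separate two-pixel blobs come out as two "letters".
ink = {(0, 0), (1, 0), (5, 0), (6, 0)}
assert len(connected_components(ink)) == 2
```

If the characters never touch, each component is one character, and segmentation is done.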
Nat: So what do you do when the characters are touching?
Kumar: So if this island analogy works, you can think of everything in the background as a big ocean, because it’s relatively flat, and there are these islands sticking out. You could grow the islands, let’s say there’s more sedimentation and the land mass kind of moves out into the water; then if two islands are very close to each other they may grow and connect to each other. So that allows you to sort of connect things. And that’s usually called a growing operation in standard image processing; things like halos and so on you can add to objects that way.
Kumar: You can also do erosion, which is the opposite. You remove pixels that are very close to the edge of the character, so in the same island sense, part of the island is eroding into the ocean, and that way you can separate two characters that are connected. So let’s say you have two O’s that are connected by a thin line, and for simplicity say the O’s are more like filled-in circles. Then as you erode, the circles erode relatively evenly in all directions, so they may become smaller circles, still filled in, but the line that’s connecting the two circles slowly gets to a point where it’s really thin, one pixel wide, and another step of erosion just completely causes the connecting pixels to go away. And now you’ve broken two O’s connected by a line into two separate O’s. And once they’re separated you can do the opposite: you can start to grow them back. So if you do something like four steps of erosion followed by four steps of growing, you would lose every line or anything that was thinner than four pixels wide.
Kumar: And so that’s a common trick: you erode just enough to make the characters disconnected, then you grow them back so that the pieces of the characters that survived the erosion connect back up.
Alex: That’s a totally awesome description, but I suspect in practice it’s a lot more nuanced and probably required a PhD to understand what the hell’s going on.
Nat: Well actually the paper’s pretty well written. It’s pretty accessible. We’ll put a link on our website if you want to check it out. But what he basically said in the paper was: recognizing distorted characters is solved. If you want to make CAPTCHAs really hard, lean on the segmentation problem, because identifying the locations of the characters is surprisingly hard if you do things like make them touch, rather than just doing totally trivial things to your CAPTCHA. So the best CAPTCHAs on the web today have adapted to pose really hard segmentation problems.
Alex: It seems so weird to me that image recognition can, you know, identify a letter, it just can’t figure out where it is.
Nat: I know, right? It’s not intuitive at all. Now actually, even though a lot of these issues were pointed out five years ago in Kumar’s paper, and Google and Microsoft and Yahoo and reCAPTCHA now have really good CAPTCHAs that are hard for computers to break, a lot of the CAPTCHAs that you find on the web and in the wild, and you and I have both run into these, they still mostly pose a recognition problem and not a segmentation problem. Actually, I asked Dr. Broder about this, and here’s what he said.
Broder: You know, I see some CAPTCHAs that clearly are very hard for humans to solve but in fact they don’t introduce any difficulty for computers whatsoever. They are simply creating some extra annoyance for humans without getting any quality. I mean people have to realize what are the hard problems and what are not the hard problems. And some of the CAPTCHAs are totally silly and I’m sure that you can use them as an exercise in any course in pattern recognition and people will solve them.
Alex: If you look around the web today you can find like little Python scripts or other little programs that you can run to break some of the weaker CAPTCHAs out there.
Nat: And, Alex, actually I have a little treat for us. I did some googling and I found a university student in Northern England who wrote a particularly cool CAPTCHA solver.
Shaun: Well, my name is Shaun Friedle and I’m the author of Megaupload auto-fill CAPTCHA, which is a GreaseMonkey script for Firefox which auto completes the CAPTCHA on megaupload.
Alex: Woah! That’s such a great hack, right? Like this guy decided to start solving CAPTCHAs in the browser using Javascript.
Nat: Yeah, I mean this is the definition of a good hack, basically. What Shaun Friedle did is he wrote a GreaseMonkey script that solves just this one particular CAPTCHA, on a site called Megaupload, which I’d never heard of before but apparently is one of those websites, like RapidShare, where you can upload a file and people can download it. And they use a CAPTCHA to protect the download link from bots. And I asked Shaun how he got into this, what motivated him to do this in the first place.
Shaun: And then I came across a forum thread on the userscripts.org site. Someone was asking if it was possible to decode reCAPTCHA using a GreaseMonkey script, and all of these people were saying no, that’s stupid, there’s no way you will be able to do it, that’s impossible.
Nat: And so you took that as a challenge, huh?
Shaun: Yeah. I thought, well, I don’t know if it’s really possible with reCAPTCHA, but I can probably try to do that just in GreaseMonkey, purely in JavaScript, on the Megaupload CAPTCHA. And at that time I had done no image processing in JavaScript. In fact, I’d written about 100 lines of JavaScript before that point, so I’m not really a JavaScript programmer. So I started researching whether it was possible, and I found out that using the canvas functionality in HTML5 you could do some image processing, and I eventually built it up from there and managed to implement the entire thing in JavaScript.
Alex: I actually hadn’t heard of anybody doing sort of external image processing using JavaScript and canvas like that.
Nat: Yeah, actually Shaun was really humble and he said he’d never done any image processing in JavaScript before. But I think almost no one had ever done image processing in JavaScript before he wrote this hack. And then it ended up on John Resig’s blog, who is the author of jQuery, and a lot of people found it pretty interesting. That’s actually how I found out about it. But it does seem like a technique that could be useful in lots of different places. Anyway, Shaun had also previously read this game programming book and learned about neural networks from it, so he implemented a neural network in JavaScript, and then he manually trained it to recognize the Megaupload CAPTCHA by typing in a whole bunch of CAPTCHAs himself.
Alex: And his script still works?
Nat: Yeah. He told me it can solve the Megaupload CAPTCHA in about 200 milliseconds.
Alex: That’s not bad. I think it’s a neat hack because it’s not necessarily anything novel research wise but doing it all in Javascript inside the browser and being a novice. It just seems like really educational.
Nat: We asked Shaun, as we ask everyone who has beaten a CAPTCHA, what he thinks when he runs into CAPTCHAs on the web, and he said that about 60% of the CAPTCHAs he encounters he could probably break with his GreaseMonkey script after a few hours of modifications. And of course he’s talking about the smaller CAPTCHAs, not the big company CAPTCHAs, the ones that don’t pose really hard segmentation problems.
Alex: Yeah, and that totally goes against my original thinking when we started this podcast, which was that the way you can be sure a CAPTCHA works is by writing your own, because you’ll be able to sort of hide anonymously on the internet since people won’t spend time breaking your particular CAPTCHA. But it turns out that unless you’re tracking the leading edge in image recognition technology, like reCAPTCHA is, your CAPTCHAs are probably going to end up really, really trivial to solve.
Nat: And you’re kind of right on one count though, Alex, which is that if your site is really tiny and nobody cares about it, they’re not going to bother to try to break your CAPTCHA anyway. But the cool thing with reCAPTCHA of course is that they’re always going to keep up with the latest attacks. It’s like a platform that’s always going to evolve with the attackers.

Nat: Now we’ve been talking about some pretty sophisticated ways of attacking CAPTCHAs. But there’s one very easy way to break a CAPTCHA we haven’t mentioned yet.

Alex: Is this the sort of legendary porn attack that I’ve always heard about?
Nat: Well that’s one. Why don’t we talk about it first?
Alex: Yeah, so I always heard that, like there’s always been this rumor that porn sites would stick CAPTCHAs up in front of people who wanted to look at porn images and they would have to solve the CAPTCHA in order to move on and see the pornographic image. And that CAPTCHA they solved would then be forwarded along to some script that was creating an email account or posting a comment.
Nat: This is exactly the kind of story that’s designed to spread all over the internet, because it involves a cool hack and pornography. But it turns out it’s not really an issue. The volume of CAPTCHAs that would be solved by this technique is just too low to actually make a dent. And it’s not really a very competitive thing for a porn site to do: for every one site that puts CAPTCHAs in front of its images, there are a thousand that won’t. So it doesn’t add up economically. There is another way that humans can be used to break CAPTCHAs that actually is a bit more of an issue.
Alex: Oh, is it the sort of CAPTCHA farm thing in India, with lots of people solving CAPTCHAs?
Nat: They actually prefer the term “CAPTCHA bypass service.”
Alex: I’ve heard of these, too. These are like teams of very low-wage people usually in poor countries just typing in CAPTCHAs for very, very small amounts of money all day long. And I guess these guys break CAPTCHAs and then they get forwarded along to create SPAM and blog comments and things like that.
Nat: Yeah. And actually we heard a funny story about this from a friend at Google. Apparently Google has this property, Blogger, and they were having problems with people creating spam blogs on Blogger. So they added a CAPTCHA to the blog creation page, and that helped for a while. But then eventually the spam blogs came back. And they tracked the CAPTCHA solutions to this one IP address in Costa Rica. Instead of just blocking the server, they decided to monitor it, and they could see the rate at which the CAPTCHAs were being solved, and it was actually changing over the course of the day. At 9am they’d be solving something like, say, 10 CAPTCHAs per minute, and then half an hour later, at 9:30, they’d be solving like 20 CAPTCHAs per minute, and at 9:45 they’d be solving 30. And then it would maybe continue like that until 12 o’clock and drop to zero for an hour. And then at 1:00 it would pick back up again. So they could deduce from this that there was a team of four people solving CAPTCHAs for a living, drifting into work in the morning and then all going to lunch together in the afternoon.
Alex: The funny thing is that when these guys solve CAPTCHAs and those CAPTCHAs are used to post spam on web pages, you know, they’re not actually expecting people to click on the links that are included in those spam comments. They’re usually just there to trick Google’s PageRank algorithm into rating the spammy links higher.
Nat: Yeah, actually that’s a really good point, and PageRank is big money. So most of these CAPTCHA farms are actually a lot bigger than just four guys in Costa Rica somewhere. We tried really hard to interview a CAPTCHA farmer for this podcast. None of the ones we contacted would agree to have their voice recorded for some reason, but they did answer some questions over email, and we’ll link some of their web pages online where they advertise their services.
Nat: And you can see, for example, that the prices are just, I mean, astoundingly cheap. One site charges just $2 to solve 1000 CAPTCHAs. So even if the workers take an average of 14 seconds to solve each CAPTCHA, with no time between CAPTCHA solutions, that comes out to like fifty cents per hour. And actually by email, we learned that many of these CAPTCHA farmers are further hindered by the fact that they’re not great typists and they don’t speak any English at all.
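(Show notes: the back-of-the-envelope math here, assuming the quoted rate of $2 per 1000 CAPTCHAs and an average of 14 seconds per solve, works out like this:)

```python
# Hourly wage for a CAPTCHA farm worker, using the rates quoted in
# the show: $2 per 1000 CAPTCHAs, ~14 seconds to solve each one,
# and no idle time between solutions.
price_per_thousand = 2.00    # dollars
seconds_per_captcha = 14

captchas_per_hour = 3600 / seconds_per_captcha            # ~257 per hour
hourly_wage = captchas_per_hour * (price_per_thousand / 1000)

print(f"{captchas_per_hour:.0f} CAPTCHAs/hour -> ${hourly_wage:.2f}/hour")
# -> 257 CAPTCHAs/hour -> $0.51/hour
```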
Alex: Yeah, that’s a pretty sucky situation. You can imagine how hard it would be to solve CAPTCHAs in Hindi. These guys are probably not even hitting the average of 14 seconds for each one.
Nat: Yeah. I mean if we had to solve CAPTCHAs in Hindi, good Lord. This one service, DeCaptcher, provides APIs in a whole bunch of languages, you know, C, C++, Perl, Python, C#, et cetera, and they even have an FAQ entry on their website. I’ll read it for you. Here’s the question: “I want to bypass CAPTCHAs from my bot. The bots all have different IPs. Is it possible to use your service from many IPs?” And the answer: “We have no restrictions about IP: with DeCaptcher you can bypass CAPTCHA from as many IPs as you need.”
Alex: Wow. So they’re just right out in the open about using botnets to solve CAPTCHAs, huh?
Nat: Yeah, seriously. What it comes down to, really, with these CAPTCHA farms is that you can’t stop them. They’re going to be out there; they’re going to exist. At the end of the day, somebody can always just type in the CAPTCHA, so it’s not totally secure. CAPTCHA is not about total security; it’s really just about making spam uneconomical. If we go back to Broder at Yahoo! one more time, I think he put this really well:
Broder: Yeah, this is exactly right. I mean it’s exactly the same problem you have in mail spam, and there are actually nowadays good statistics about how many people answer those ads for changing your anatomy and so on. And it’s an incredibly small number, 1 in a million or 1 in 10 million or something like that. So you can essentially compute a certain ROI, and if you increase the cost even slightly, suddenly the whole enterprise becomes nonprofitable. And I think that’s basically what we are trying to do: increase the cost slightly, because you have to multiply it by a large number of attempts, to make it nonprofitable.
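(Show notes: Broder’s ROI argument can be sketched with numbers. The 1-in-a-million response rate is his; the per-message costs and revenue-per-sale figure are purely illustrative assumptions:)

```python
# Broder's argument: spam stays profitable only while revenue per
# message exceeds cost per message. A tiny per-message cost increase,
# multiplied across millions of attempts, flips the sign.
response_rate = 1e-6          # ~1 in a million recipients responds (Broder's figure)
revenue_per_sale = 50.00      # dollars per response -- made-up number
revenue_per_message = response_rate * revenue_per_sale   # $0.00005

cost_without_captcha = 0.00001   # near-zero cost to send one message (assumed)
cost_with_captcha = 0.002        # $2 per 1000 CAPTCHAs solved by a farm

print("profitable without CAPTCHA:", revenue_per_message > cost_without_captcha)
print("profitable with CAPTCHA:   ", revenue_per_message > cost_with_captcha)
# -> True, then False: the CAPTCHA cost alone exceeds expected revenue.
```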
Nat: So we’ve kind of moved from this big philosophical question, Can Machines Think, to rooms full of poor people typing in squiggly letters to help sell Viagra on the internet.
Alex: Yeah, but along the way we’ve been talking about computers solving and posing questions that really represent the bleeding edge of artificial intelligence. And it’s funny to think, you know, could Turing have imagined that this would be the battleground for his sort of ultimate question of whether computers can think?
Nat: So I think one thing people want to know is where this is all going. I mean, what is the future of CAPTCHA? Let’s talk a little bit about that.
Alex: Well, it’s still an active field, so there are new CAPTCHAs being invented all the time. There’s one from Microsoft that just came out called ASIRRA, which works by showing you a picture of an animal and asking you if it’s a dog or a cat. Now, like we said before, that’s a binary choice, so they end up showing you a few of them so that the odds of just guessing randomly aren’t quite so good. And there’s this other one from Google called rotCAPTCHA, which asks you to tell which pictures are right-way-up, which I guess is also difficult for computers.
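(Show notes: the reason a dog-or-cat CAPTCHA shows several images is simple probability. Each question is a coin flip for a random guesser, so the pass rate shrinks as 1/2^n; the image counts below are just illustrative choices:)

```python
# A random guesser passes each dog-vs-cat question with probability 1/2,
# so showing n independent images drops the pass rate to 1/2**n.
for n in (1, 3, 12):
    print(f"{n} image(s) -> random guesser passes with probability {0.5 ** n:.6f}")
```

With 12 images, random guessing succeeds only about 1 time in 4096, which is why stacking a handful of easy binary questions makes the overall test hard to brute-force.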
Nat: Yeah, and actually computer vision, of course, is advancing, too. Kumar, from Microsoft, told me that the whole 1D segmentation problem – a bunch of letters in a slightly wavy line – is getting solved, too. So in the future, CAPTCHA letters might need to be scattered around a 2D plane. But eventually, of course, the machines are going to be able to do that, too. So the question that I’ve kind of wanted to answer since we started looking into this whole topic is: when’s that going to happen? When will CAPTCHA no longer be viable as a concept? Here’s Kumar Chellapilla once again.
Kumar: It’s an adversarial problem. So if you’re blocking spammers they’re going to work harder, and they’re going to automate existing solutions to make them cheaper. There is one advantage, though, I should call out: it’s a lot easier to generate lots and lots of more difficult CAPTCHAs than it is to break them. Even in computer vision and machine learning, people talk about this synthesis versus analysis dichotomy, right? Is it more difficult to ask difficult questions, or is it more difficult to answer difficult questions? And there are a lot of these nonsymmetrical problems where the email providers or any of the freebie services can easily generate lots and lots and lots of difficult CAPTCHAs.
Nat: So, Alex, before we started researching CAPTCHA for the show, I was pretty convinced that we were going to find out that AI was on a collision course to make CAPTCHA irrelevant within, I don’t know, five to ten years. But it really doesn’t look that way to me anymore. I mean, basically I think CAPTCHA is probably viable for a couple of decades, or maybe longer.
Alex: Yeah. And the other thing is that CAPTCHA is not even really designed to be 100% secure. So as these things slowly become more and more solvable, it doesn’t necessarily mean that the whole system will fall apart. It just means that there’ll be a little bit more SPAM, and that will push the edge of research a little bit further.
Nat: Yeah, Kumar actually compared CAPTCHAs to a speed bump. So it’s a little deterrent, and you combine it with other techniques like content filtering, and that’s how you get a really good result. And, you know, even if the computers do catch up, we go back to that whole win-win concept. I mean, that’s a win, too. I think Ben Maurer from reCAPTCHA said it really well:
Ben: I mean, if we get to the point where computers are able to do anything that a human can do I’ll be happy. I mean at that point computers will be able to do a really good job at filtering SPAM on their own and they won’t need CAPTCHAs.
Nat: Well, that was our show. We had a lot of fun studying CAPTCHAs, and we hope you enjoyed it, too. We’ve posted a whole bunch of interesting links from our research so that you can learn more about the Turing Test and neural networks and cat brains, and that kind of thing. So check it out.
Alex: This episode was a bit of an experiment in doing a longer-form show, with interviews no less. So we’d love to hear if you think it worked, and especially if you think it didn’t. So please visit our website and give us some feedback.
Nat: Thanks for listening.
Alex: Yeah, thanks.

Episode 3: tornado, node.js and websockets

A quick overview of a few interesting new web technologies: tornado, node.js and WebSockets. Listen and enjoy!


Download episode

As always, we’d love to hear your thoughts and dreams and deepest desires.

If you want to learn more, check out these links:

Episode 2: A brief introduction to NoSQL databases

In our second episode (12 minutes long), Alex and Nat talk about the new generation of “NoSQL” databases that have created a lot of interest among web developers, especially those lucky people dealing with thousands of simultaneous users and terabytes of data.


Download episode

Please feel free to leave a comment below after you’ve listened to the episode. We’re still total newbies at this podcasting thing, so your feedback and encouragement are a big help!

If you want to learn more about NoSQL than what we covered in the show, check out these links:

The Big Guys:

  • Voldemort
  • Cassandra
  • HBase — We didn’t get to this one, but it’s modelled on BigTable, and can replicate across geographically separated datacenters (Cassandra needs faster roundtrips). And it’s what Hadoop uses internally.


  • MongoDB — Great for storing JSON objects.
  • CouchDB — Erlang based, uses javascript as a query language.


  • Redis — memcached with persistence and useful list/set/ordered-set datatypes.
  • Redis twitter implementation — simple example of building a twitter-like system on top of redis.

Underlying Technology

The image above is a picture of a Google datacenter in Oregon, where they no doubt run BigTable.

Pilot Show: The 26c3 and GSM security

Welcome to Hacker Medley! We decided to try podcasting.

In our pilot show, Nat Friedman shares what he learned about mobile phone security at the 26th annual Chaos Communications Congress in Berlin.


Download episode

It’s our first effort, so it’s a little rough. But please let us know what you think so we can decide whether or not to keep making these!

If you want to learn more about the stuff Nat was describing, here are some handy links:

The image above is Harald Welte presenting at the 26c3.