Gibberish Detector

- by Robert Giordano


For the past month, I've been working on a gibberish detection algorithm. What could this have to do with Linkatopia? Grab a drink and I'll tell you...

Gibberish detection determines whether some text contains actual words or just random letters. Doing this is much more complicated than you might think because it involves human language. I looked at all of the currently available algorithms and decided to build my own from scratch. As of yesterday, I have successfully met my goals. You can see a working version of my code and how it compares to the others here: design215.com/toolbox/gibberish-detector.php.

Okay, so what does this have to do with Linkatopia? Well, gibberish detection is just the first part of a larger puzzle. On my ancient 2012 MacBook, my algorithm can analyze 96,000 words a SECOND and tell you if each word LOOKS LIKE a real word or gibberish. It's important to realize my code is *NOT* checking a dictionary or any database of words! I "taught" it what English words look like and it can tell the difference 99.997% of the time.
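For readers who want a feel for how code can judge what a word "looks like" without a dictionary lookup, here is a minimal sketch of one common approach: scoring letter pairs (bigrams) learned from sample words. This is NOT the algorithm described in this post, whose internals aren't shown here. The tiny word list, the function names, and the -3.0 threshold are all assumptions made up for illustration; a real detector would need far more training text and careful tuning.

```python
import math
from collections import Counter

# Tiny illustrative training list (an assumption). A usable model would
# need thousands of words, not a few dozen.
TRAINING_WORDS = [
    "the", "and", "that", "have", "with", "this", "from", "they", "would",
    "there", "their", "what", "about", "which", "when", "make", "like",
    "time", "just", "know", "take", "people", "into", "year", "your",
    "good", "some", "could", "them", "other", "than", "then", "look",
    "only", "come", "over", "think", "also", "back", "after", "work",
    "first", "well", "even", "want", "because", "these", "give", "most",
    "stone", "house", "water", "letter", "number",
]


def train_bigram_model(words):
    """Count how often each character follows another in the sample words."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in words:
        padded = f"^{word.lower()}$"  # ^ and $ mark word start and end
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts


def average_log_prob(word, model, next_symbols=27):
    """Average log-probability of the word's letter pairs.

    Add-one smoothing keeps unseen pairs from producing log(0).
    next_symbols = 26 letters plus the end marker.
    """
    pair_counts, first_counts = model
    padded = f"^{word.lower()}$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        prob = (pair_counts[(a, b)] + 1) / (first_counts[a] + next_symbols)
        total += math.log(prob)
    return total / (len(padded) - 1)


def looks_like_word(word, model, threshold=-3.0):
    """Hypothetical cutoff: higher scores mean more word-like letter pairs."""
    return average_log_prob(word, model) >= threshold


model = train_bigram_model(TRAINING_WORDS)
for candidate in ["there", "xqzvkj"]:
    print(candidate, looks_like_word(candidate, model))
```

The point of the sketch is that the model only stores letter-pair counts, never a list of valid words, so it can score words it has never seen.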

Now let's talk about Facebook for a minute, with their BILLIONS of dollars in resources. How good is their so-called "AI" (it's NOT real AI) at determining whether a post or comment is against "Community Standards"? Does it get it right even 90% of the time? No? How about 80%? 70%? 60%? Think about that for a minute.

SOME kind of automated system is necessary when you have a platform as popular as Facebook. It's a challenge to have enough physical humans to sift through all of the millions of posts and comments. If you hire people in other countries to moderate English text, you're also going to have a problem with local slang. I'm just surprised Facebook can't come up with something better than what they have.

I was recently in "Facebook Jail" for 30 DAYS because of their poorly designed algorithm. But I think it's a good example to talk about. I've created and sold fine art in galleries for almost 20 years now. Many people know me as an artist and they have no idea I also build web sites. I've won awards in juried shows, I've been in Art Basel Miami, and I've had my fine art nudes on display in public spaces, including the Fort Lauderdale Public Library. One day I simply made a comment that I "sold one of my nudes for $1200.00." This resulted in a 30-day ban for "solicitation". Okay, so their algorithm saw "sold ... my nudes for $" and flagged me. But the algorithm didn't look at context, and there was no search on my name to see if I was an artist. Both of those things could have been automated.
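To make the idea of an automated context check concrete, here is a rough sketch. It is purely hypothetical: nothing here reflects how Facebook's system actually works, and the pattern, the word list, the `author_profile_keywords` parameter, and the three result labels are all invented for illustration. The idea is simply that a phrase match alone shouldn't trigger a ban when the comment or the author's profile points to fine art.

```python
import re

# Hypothetical trigger resembling "sold ... my nudes for $" (an assumption).
RISKY_PATTERN = re.compile(r"\bsold\b.*\bnudes?\b.*\$", re.IGNORECASE)

# Hypothetical context words suggesting the topic is fine art.
ART_CONTEXT_WORDS = frozenset(
    {"artist", "art", "gallery", "exhibit", "painting", "photograph", "juried"}
)


def review_comment(comment, author_profile_keywords=frozenset()):
    """Return 'allow', 'flag', or 'needs_human_review' for a comment.

    author_profile_keywords stands in for whatever a lookup on the
    author's name or profile might return (an assumption).
    """
    if not RISKY_PATTERN.search(comment):
        return "allow"
    comment_words = set(re.findall(r"[a-z]+", comment.lower()))
    has_art_context = bool(
        (comment_words | set(author_profile_keywords)) & ART_CONTEXT_WORDS
    )
    # Art context downgrades an automatic flag to a human review.
    return "needs_human_review" if has_art_context else "flag"


comment = "I sold one of my nudes for $1200.00."
print(review_comment(comment))
print(review_comment(comment, author_profile_keywords={"artist", "gallery"}))
```

Even a crude check like this would send the comment to a person instead of issuing an automatic ban.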

Back to Linkatopia. I know at some point there will be a need for moderation. I experienced this years ago with spammers. As I said above, gibberish detection is just the first part of a larger suite of tools. My goal is to build superior automated functions that can look at the nuances of human language and match words with context. I want to separate racism and hate speech from humor and sarcasm. And I want it to work 99% of the time, or better!

I'm serious about building a better platform!


take care,

Robert




 
