While cleaning out my inbox, I found a slew of spam that had managed to sneak past the spam filter. One example reads:
Original one at 'too much' price tag, our is < hundreds $.
Need your shipment quickly? Have it lightning shipped!
Prove to everybody that you've became a success together with our watches.
Exceptional manufacturing guarantees that our watches are almost the same
as the authentic wristwatches.
Our honesty, your contentment. Money-back otherwise.
disagreeable task, i damaskeening found eurytropic a stout young my fate,
shaving mirror for what sick headache business had jasperise i on the land,
"hurrah!" cried jill, waving single-action mock orange the letter
I suspect our filter operates on a simple set of rules plus a bag-of-words model (à la the popular naive Bayes approach, which estimates the posterior probability that a message is spam from the words it contains).
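To make the bag-of-words idea concrete, here is a minimal sketch of a naive Bayes-style scorer. I don't know what our filter actually does, so everything here is an assumption for illustration: the function names (`train`, `spam_posterior`), the tiny training messages, the add-one smoothing, and the choice to ignore words never seen in training.

```python
import math
from collections import Counter


def tokenize(text):
    # Bag of words: lowercase, split on whitespace, ignore word order.
    return text.lower().split()


def train(docs):
    # Count how often each word appears across a set of messages.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def spam_posterior(text, spam_counts, ham_counts, prior_spam=0.5):
    # Estimate P(spam | words) with add-one (Laplace) smoothing.
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in tokenize(text):
        if word not in vocab:
            continue  # assumption: words unseen in training carry no evidence
        log_spam += math.log((spam_counts[word] + 1) / (spam_total + len(vocab)))
        log_ham += math.log((ham_counts[word] + 1) / (ham_total + len(vocab)))
    # Turn the two log scores into a probability.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical training data, far too small for real use.
spam_counts = train([
    "cheap watches money-back guarantee",
    "buy cheap watches shipped quickly",
])
ham_counts = train([
    "the letter about business on the land",
    "my young friend found the letter",
])

print(spam_posterior("cheap watches shipped quickly", spam_counts, ham_counts))
```

Because each word contributes independently, the score depends only on which words appear and how often, not on whether they form a sensible sentence.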
However, the bolded section (the last three lines of the example) is basically nonsense: it pads the email with enough cruft to skew its term distribution so that the message is no longer labelled as spam. Several of the spams that made it through use similar tactics to evade filters. This naturally raises the idea of using language models to score how nonsensical a sentence is, in terms of both grammar and common usage. You could measure the "nonsensicality" of each sentence or paragraph and compare it against the overall trend in the email to help determine whether it is indeed spam.
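One way to sketch such a score is with a bigram language model: train it on ordinary text, compute the perplexity of each line, and flag lines that are far more "surprising" than the rest of the email. This is only a toy under stated assumptions: the `BigramModel` class, the `flag_nonsense` helper, the three-sentence corpus, and the 1.5x-median threshold are all made up for illustration, and a usable model would need a far larger corpus and better smoothing.

```python
import math
import statistics
from collections import Counter


def tokenize(sentence):
    # Add start/end markers so sentence boundaries count as context.
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramModel:
    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = tokenize(sentence)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # +1 leaves room for one "unknown word" bucket.
        self.vocab_size = len(self.unigrams) + 1

    def perplexity(self, sentence):
        # Higher perplexity means the word sequence is less expected.
        tokens = tokenize(sentence)
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # Add-one smoothing keeps unseen bigrams from having zero probability.
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return math.exp(-log_prob / (len(tokens) - 1))


def flag_nonsense(lines, model, ratio=1.5):
    # Compare each line against the email's own median perplexity.
    scores = [model.perplexity(line) for line in lines]
    median = statistics.median(scores)
    return [line for line, score in zip(lines, scores) if score > ratio * median]


# Hypothetical corpus of "normal" text, far too small for real use.
model = BigramModel([
    "i found the letter on the table",
    "she read the letter to my friend",
    "my friend found the business on the land",
])

email_lines = [
    "my friend found the letter",
    "she read the letter",
    "i found the business on the land",
    "jasperise damaskeening eurytropic mock orange",
]
print(flag_nonsense(email_lines, model))
```

Comparing against the email's own median, rather than a fixed cutoff, is what lets a message written in consistently rough English pass while a sudden block of word salad stands out.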
This isn't a silver bullet for this attack vector, though: many people write emails hastily, with poor grammar, and some are not native English speakers, so legitimate mail could end up flagged as a false positive.
Of course, the best solution is to strike at the economics of the situation, by making it unprofitable to do business using spam. But until that happens, we're stuck with using filters.