- anything that allows file upload -> porn / warez / movies / any form of copyright violation you care to come up with.
- anything that allows anonymous file upload -> childporn + all of the above.
- anything that allows communications -> spam, harassment, bots
- anything that measures something -> destruction of that something (for instance Google, which measures the links between pages)
- any platform where the creator did not think long and hard about how it might be abused -> all of the abuse that wasn't dealt with beforehand.
- anything that isn't secured -> all of the above.
Going through a risk-analysis exercise and detecting the abuse potential of whatever you are trying to build prior to launching it can go a long way towards ensuring that doesn't happen. React very swiftly to any 'off label' uses of what you've built and categorically shut down any form of abuse, and you might even keep it alive. React too slowly and before you know it your real users are drowned out by the trash.
It's sad, but that's the state of affairs on the web as we have it today.
FWIW I made a website for our yearly project conference that allowed anybody to create an account and post material. But 1) all postings had to be moderated until the account was verified (either manually by me or by a code sent only to conference attendees) 2) I was pretty active in monitoring new posts and deleting posts and accounts that were clearly spam.
And of course the normal stuff: all links marked rel="nofollow", only whitelisted markdown content / HTML, &c. (a rough sketch of what I mean follows at the end of this comment).
The first year we had a handful of spammers try to post stuff, but it didn't take much time at all to filter it out. This year we didn't have any spam accounts at all.
So my suspicion is that getting ahead of the "spam problem" by doing heavy manual moderation early on is an investment. If he'd just spent 5 minutes a day looking at the accounts and deleting rubbish, he would never have been swarmed by spammers, and the moderation load would have remained relatively low, until he got popular enough that he could afford to actually spend some real effort automating / crowdsourcing the spam-fighting capabilities.
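For what it's worth, the whitelist/nofollow part above doesn't have to be elaborate. A minimal sketch in Python using the bleach library (the tag and attribute lists here are just illustrative, not necessarily what the site actually used):

```python
import bleach
from bleach.callbacks import nofollow

# Only these tags/attributes survive sanitisation; everything else is stripped or escaped.
ALLOWED_TAGS = {"p", "a", "em", "strong", "ul", "ol", "li", "code", "pre", "blockquote"}
ALLOWED_ATTRS = {"a": ["href", "title"]}

def sanitize(user_html: str) -> str:
    cleaned = bleach.clean(user_html, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRS, strip=True)
    # linkify() adds rel="nofollow" to every anchor via the callback,
    # so spam links earn the spammer no SEO benefit.
    return bleach.linkify(cleaned, callbacks=[nofollow])

print(sanitize('Nice post! <a href="http://spam.example">cheap pills</a>'))
```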
The key here is manual moderation. At NodeBB we also have had our fair share of spam companies trying to build scripts to post things, and the only foolproof solution is manual moderation via a post queue for new users.
The downside, of course, is that it takes effort to maintain, and is a barrier to entry for new accounts.
Edit: Why does shadow banning feel like such an elegant solution? Everything has tradeoffs and I feel like shadow banning has tons of upside and very little downside. What am I missing?
There are cases in which explicit and obvious moderation results in retaliation, including DDoS and hacking attacks.
Various soft fail mechanisms, including shadowbanning, degraded performance, errors, authentication failures, etc., may help avoid this.
The question is what adversarial model you're facing: is it some random pr0n / SEO / affiliate / fraudster, or is it a "nice website you gots heyah, be a shame if anyting happen' to it" squad?
The only risk I can think of is that someone will upload the video and then livestream its playback on your platform with your branding using a separate livestreaming tool.
I.e. shadow banning semantically cannot work with a site that is geared toward publishing content, because the content is supposed to be visible to visitors who are not authenticated. If it isn't visible, that is painfully obvious.
Shadow banning only works when only authenticated users can see any content at all; then we arrange for only the offender to see content they have created. This works as long as the offender doesn't create multiple accounts.
(You need something slightly more clever, like allowing the content to be viewed from the same IP address as the last known address for that offender, plus some surrounding range (e.g. IPv4 class C subnet)).
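A rough sketch of that "slightly more clever" visibility rule (the object shapes and the /24 heuristic are illustrative, not any particular site's implementation):

```python
from dataclasses import dataclass
from typing import Optional
import ipaddress

@dataclass
class Viewer:
    user_id: Optional[int]  # None for anonymous visitors
    ip: str

@dataclass
class Author:
    user_id: int
    last_ip: str            # last address the shadow-banned user posted from

def shadow_banned_content_visible(viewer: Viewer, author: Author) -> bool:
    # Logged-in users only see the content if they *are* the banned author.
    if viewer.user_id is not None:
        return viewer.user_id == author.user_id
    # Anonymous visitors see it only from "nearby" addresses, so a quick
    # log-out-and-reload on the same connection doesn't reveal the ban.
    nearby = ipaddress.ip_network(f"{author.last_ip}/24", strict=False)
    return ipaddress.ip_address(viewer.ip) in nearby

print(shadow_banned_content_visible(Viewer(None, "203.0.113.7"), Author(42, "203.0.113.20")))  # True
```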
It’s kind of easy to check whether your posts are visible to logged-out users, so you aren’t fooling bots and spammers, just real people. And the bots don’t care about the moderation drama, but the real people do, no matter whether the moderation is proper or not. It always looks really unfair.
This invites a couple of questions. One, do the spammers care that the content is actually visible? Two, does spam come from more accounts or the same accounts?
Unrelated, but seeing NodeBB reminded me of the old days when installing forums for communities. phpBB2, vBulletin, downloading plugins and themes, etc. Oh, the nostalgia for the old web is amazing...
100% this. As soon as an IP / Service appears for the first time it is scanned and hammered to look for low hanging fruit. Once the initial wave of exploit bots have found out they don't get far, your 'thing' is marked as 'too much faff' and malicious traffic drops off and is left alone (usually, y.m.m.v etc etc etc).
My experience is different. I manage a wiki. It was a wonderful success in terms of getting people to contribute; however the wiki itself (moinmoin) has lousy authorisation and authentication tools, so I wrote my own. A few versions later, changes to moinmoin broke those tools, but by that time most of the content had settled down, so I just locked the system down by making the user database read-only (literally: "chmod -R a-w"). That worked well enough.
However every so often you would have to create a new account. To do that the chmod had to be undone, but only for a minute or two. In that minute or two, typically 2 or 3 spam accounts were created, and maybe 20 wiki pages spammed or defaced.
In short: defending the site had no effect whatsoever for us. The bots were always probing, every few seconds, and it never stopped even years after the site was totally locked down.
PS: SPAM wise the bots were just an annoyance. But every page hit runs moinmoin's Python code, and it's not the fastest thing. We were running on low end VPS's that took a dim view of anybody using too much CPU. Our VM regularly got shut down because of those bloody bots.
Perhaps you could hand-tailor a fail2ban rule that automatically jailed anyone who even _attempted_ to log in, for a whole month, and leave that running for a week prior to the chmod, and disable it during the editing process?
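Something along these lines, perhaps (filenames, jail name and timings are made up, and the filter assumes moinmoin login attempts show up in an nginx/Apache access log as ?action=login requests):

```
# /etc/fail2ban/filter.d/moin-login.conf
# Any request that hits the login action counts as a "failure".
[Definition]
failregex = ^<HOST> .*"(GET|POST) [^"]*action=login

# /etc/fail2ban/jail.local
# A single attempt earns a ~30 day ban.
[moin-login]
enabled  = true
port     = http,https
filter   = moin-login
logpath  = /var/log/nginx/access.log
maxretry = 1
findtime = 86400
bantime  = 2592000
```

Leave it running for a week before the chmod, stop the jail while you create the account, then switch it back on.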
Right, so I think the key thing spammers need to make spam "pay" is for it to be scalable: only 0.1% of people will click on your link, but if you can generate a million new "views" per month, then that's 1000 clicks per month. So the key thing for a spammer is to be able to automate as much as possible. One key element in preventing spam, therefore, is to try to make it so that a human has to be in the loop somewhere.
In my case, I wrote the website; I suspect any login bot would have to be customized to my specific website to be effective. I'm sure it doesn't take much time, but you'd still have to start an automatic bot specifically for my site. Why would you do that if you saw that all of your attempts to get spam up had failed?
In your case, you're using a standard tool. Someone's already written a bot that can log into any moinmoin instance; and almost certainly someone's written tools to scan all websites for new moinmoin instances and try to create accounts. It's probably unlikely anyone has specifically thought about your site at all; they'd probably have to write special code to remove it from their automatic scans. Same thing as before -- even if their bot fails on 99% of moinmoin sites, that 1% makes it worth keeping it going.
> PS: SPAM wise the bots were just an annoyance. But every page hit runs moinmoin's Python code and it's not the fastest thing.
Interesting, thanks for sharing. I'll be launching a service that allows users to upload content and I've been thinking about how to handle anti-spam. I really appreciate you sharing your experience.
Manual moderation in this way could open you up to other forms of liability that you may have an easier defense against without it.
I don't think it'd be an issue for a hardware forum, although mass approving comments and finding that one is defamatory about person or company X could get some lawyer's drool glands working for example..
Also section 230 appears to be under threat by the current administration and how things are interpreted - and I just read where Biden has been talking about removing it completely..
So this situation is in a state of flux at the moment I believe.
I am trying to understand this comment.
I do not claim to know everything about 230 - but I have read several news stories about it and in relation to it - and have sat in a seminar by a high-dollar lawyer once..
Since I did not stay in a Holiday Inn express I can't claim that I know it all - haha
which is also why I put not a lawyer / doctor / yada yada - so people would research it more.
I believe this would be more of an issue for some kinds of sites and much less an issue for others. (I doubt tom's hardware has ever had anything defamatory or libelous in posts to worry about ) - other places that moderate may do so in ways that are over-zealous in protecting the publishers and therefore that would create a different issue, but that's a different discussion.
Anyhow if you are referring to an anti-censor group or some kind of misinformation army like the bots and confangled news stories around the net neutrality debate, I would like to learn more about this/them and see why / how etc.
thanks
From what I have heard and seen - there is a difference between moderating each message before it's posted - e.g. the classified-ads person who answers the phone for the local paper tells you that you cannot advertise a tiger for sale or put 'foreman wanted' in an ad - it must be gender inclusive.. that is one type..
Then you have moderating where people can post about anything - but when you get a complaint you research and take it down.
There are other forms and types of moderation between these..
which is why I said "in this way could open"
if you are moderating each message before it goes out, you are not acting so much as a platform as more of a publisher.. you could claim you did not know that an ad you published was not legal.. say it was turtles for sale, or someone offering weed for sale with 'codewords'.. you could claim that you did not know the girl in the ad was in a sex tape, that you just published the info...
But from what I have seen, if you allow everything to publish and take down stuff when notified, you have a different situation in regard to section 230, and claims in general, as compared to having to defend that your ad person could have or should have known before publishing..
Indeed - section 230 doesn't care so much about what is moderated out in some ways, but the fact that you are moderating out 8 year olds' dating profiles, but not 13 year olds', could become an issue - and the fact that you moderated out the 8 year olds in the first place could cause holes in certain defenses.
So you would not be responsible for what isn't said - but if you allowed one defamatory thing to publish while hiding a defamatory post about someone else, what isn't said could be used to prove that you have liability for what is.
Hope I'm saying that right; again, this is my understanding of some issues that I have been involved in, and from having studied other people's issues and listened to their counsel - but I am not a lawyer.. if you have one and you are doing this kind of moderating, perhaps have them look into the backpage cases and see how they pierced the "we're a platform, not an editorial publisher" thing, for one example.
It's sad and also not always true. I think most people were pleasantly surprised by how the "get into a stranger's car" and "go stay with a stranger for a few days" models worked out for AirBNB and Uber.
Obviously there have been some issues and obviously they have processes in place to minimize and respond to abuse of their platforms, but overall both platforms rely on "trust by default" which is interesting and leads me to think that your comment might lean too far towards pessimism.
AirBnB and Uber are a bit different though, because being online gives you a sense of immunity and relieves a lot of inhibition that you'd have in real life. To some degree, we all have a different sense of morality online and in real life (like most people would never steal even a fruit in the supermarket but see no problem pirating movies and software) and some push it very far.
The one that always surprises me is NextDoor. Given the much greater chance that you will literally run into someone on NextDoor in real life, and the fact that NextDoor accounts show your actual identity, you would expect discussion to be more civil there than on other social media.
But, holy cow, every time I go on there, it is a hot mess of angry people screaming at each other. I have to assume those people just don't go outside at all?
I mean, there are plenty of people who would rather say bad things about others, even knowing it'll get passed on to them, than go up to them and say it directly. Plus you have the usual problems that come with NIMBYs+HOAs all in one app.
Whether it's through moderation, or the real name policy, NextDoor has avoided the worst of 4chan. There are no page-long posts of a single repeated epithet, and there's no child pornography, and the only drug talk is in reference to the local homeless problem (though that may also be related to the age range of users on NextDoor). It turns out that adding real names only slightly raises the threshold on what some people will say.
IMO it is NextDoor's hyperlocality that makes it such a hot, angry mess. On Facebook, your cousin-five-states-away's friend that's straight up posting swastikas unironically? The chances of ever meeting them are fairly close to zero if you don't want to, so it's far easier to walk away. But on NextDoor, it's because the people are, in some cases, literally next door to you that the conversation is almost immediately emotionally threatening. It turns out your neighbors aren't like you, but not in a good way, and it challenges your sense of belonging to the local community. It turns out George down the street, who's a sweet old man who's lived there for 20 years and has a pretty dog, is a neo-Nazi. Not something that would come up while exchanging pleasantries about the mail being late today, but thanks to the Internet, his eccentricities are on full display. Good fences make good neighbors.
It's easy to dismiss Nextdoor as a hot angry mess, but there are very sweet things too: my neighbor just recovered their cat, someone is collecting clothes to bring to a women's shelter. But the hot angry mess is very shouty, and impossible to tune out.
Pirating movies/software is not in the same moral category of stealing physical goods because it’s non-rivalrous. Yanking something off of Usenet is akin to watching a baseball game or a movie at a drive-in from over the fence.
To say nothing of the fact that you have a credit card and therefore real world identity (presumably) attached to that card and that any serious in-person griefing can easily end up with you tossed in a jail cell.
> I think most people were pleasantly surprised by how the "get into a stranger's car" and "go stay with a stranger for a few days" models worked out for AirBNB and Uber.
Most Americans for the former, maybe. "Get into a stranger's car" is literally an everyday routine in other places in the world, and has been for a long time.
> It's sad, but that's the state of affairs on the web as we have it today.
Hasn't this been the meta since forever?
I imagine you'd know all about that with Camarades. It was my first exposure to web streaming as a teen way back in the day. Initially, it didn't strike me as a porn platform, but it didn't take long before it felt that way.
At one point there was a movie shot at SAIL with rather open-minded approaches to computer-human interaction. In the early 1970's (as in the twenties?), "hangups" were for inhibited squares. (If they'd been running Unix instead of WAITS, they'd probably have invoked nohup(1) for the shoot.)
Yes, that is one of the reasons. Also, helping a number of other platforms to combat all kinds of forms of abuse. Unfortunately, as you correctly surmise, I have some relevant experience which some people find useful to tap into.
Camarades went 'downhill' about 6 months in when it went mainstream and more and more people that had other ideas of where we should go with it joined. Then they brought their buddies and it was 'game over', we had to decide what to do about it. This roughly coincided with the ad market collapse and drove the decision to put the porn behind a paywall to finance the rest of it. Which worked well for more than a decade.
Virtually all the current modes of online abuse were present in at least some form on Usenet and dialup networks by the late 1980s / early 1990s.
Content piracy to a much lesser degree as bitrates were so slow and expensive (though texts were fair game), but stalking, harassment, propaganda, bigotry, fraud, spam, etc., yes.
Keyword search of worldwide public content and metadata was not readily available, so the ability to stalk and harass both people and topics was vastly reduced compared to now.
(FidoNet existed, but it was not easy to perform a full-content search of it, compared to eg. mail-archive.com now.)
Kibo only grepped the newsgroups he subscribed to, and didn’t have a full UseNet pipe running all groups through grep (unless I misremember the history there).
The problem isn’t full-text search of a forum you’ve already signed up for.
Legend has it that if you include "Kibo" in your post to any of Usenet's more than 5,000 discussion groups, the all-seeing Kibo will respond to you.... [James "Kibo"] Parry spends about two hours per day as Kibo - tending a filter that collects any mention of "kibo" in any newsgroup.
There was an article on HN about Disney internet apps. Disney management consider "safe for children" to be a critical part of their brand. Their stance was very clear: no communication between users under any circumstances, ever.
Probably wise. I've seen online interactions in some games be limited to ready-made callouts or phrases you can assemble from pre-made words / sentences (like From Software games).
Of course, points for creativity if they can still make lewd comments like "time for thrust attacks but whole" near a corpse bent over a banister.
Minor quibble, but I think it's "hole" not "whole", so it's even closer. As in "Watch out for " "hole" "!".
The funny thing is that the game has an incredibly sensitive username filter as well. I've fought countless players with names like "The #### K###ht" instead of "The ugly knight".
It did, as Swapnote on the 3DS. And originally you could send Swapnotes over the internet to your friends. But then it became common to exchange friend codes on message boards with lewds being traded (especially of the app's mascot, Nikki), so Nintendo axed her and the ability to use anything but local note exchange in response.
I wonder how this policy has changed over the years. Disney used to have a virtual world called VMK that did allow for communication between users. This being Disney, there was a (highly restrictive) whitelist of words that could be used; this being the internet, users found lots of clever word combinations to evade the filters.
Yes, and I'm always looking at HN entries which show off the next file upload service etc. and think 'puh, holy shit, they are fearless', because that's exactly why I'm not building something like this.
I would create a company to have legal separation for my private assets, I would use a ton of mechanisms upfront to make sure I do my best not to support child porn and the like, and once I have analyzed how much work the proper way is, I will just stop thinking about it :)
Are there recommended services where I can point to... say... an S3 object and validate it doesn't fall into some/all of the above categories? Seems like a good business and also something I'd pay for to remove this risk.
Yes, look at Amazon Rekognition. I use it (very successfully) for a website that used to encounter similar issues with file upload abuse.
In my case, image uploads aren't mandatory for users, but they are very helpful in identifying the spammers (for some reason, the spammers almost always try to upload images that get filtered, so it makes it easier to spot them). That, combined with an IP check (getipintel.net) has almost completely eliminated the spam issues on my site.
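For anyone wondering what that looks like in practice, the core Rekognition check is only a few lines of boto3 (bucket, key and the 80% threshold below are placeholders, not my production values):

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

def image_is_unsafe(bucket: str, key: str, min_confidence: float = 80.0) -> bool:
    """Return True if Rekognition flags the S3 object as explicit/suggestive."""
    resp = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    # Each label has Name, ParentName and Confidence; any hit at all is enough here.
    return len(resp["ModerationLabels"]) > 0

if image_is_unsafe("my-upload-bucket", "uploads/avatar-123.jpg"):
    print("Reject the upload and flag the account for review")
```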
I didn't know about this service and it sounds really cool. Kind of sucks that VPN = malicious since I use Mullvad for legitimate browsing purposes. That said, I understand that VPNs are often used as a medium for illicit activities.
Was going to say generally the same thing. Back in the day it was anonymous FTP, if you enabled it in your server you didn't have to wait long and suddenly all the things would start showing up.
It reminded me a little bit of the guy that started a "buy a gift card with bitcoin" site, not realizing he was effectively creating a way to convert bitcoin into actual cash while avoiding an exchange. It was wildly successful until he realized that it was really really hard to get large quantities of bitcoin rapidly converted into cash.
It is a very coarse filter. Your ratio will go up but there will be plenty of people who feel that since they paid you they now really get to do as they please.
That would depend on the lifetime revenue per user gained spamming your system. There definitely is some sufficiently high signup fee, but it’s different for every spammer. (And for the child pornographers, there may be no fee you could charge to stop them.)
I believe it works for Metafilter, but you also run into problems with Credit Card abuse, fraudsters who create accounts and test which cards are not yet reported as having failed etc.
I predicted that about Send the day it was launched. It helped that I had done a very close review of a large competitor in that space, so seeing 'send' open up gave a lot of bad people a new playground, with predictable results.
"Your repository" being the problem. It's not anonymous, and you don't necessarily want coworkers + world to know about your personal interpretation of Rule 34.
This is so true. We built Jumpshare (https://jumpshare.com) for file sharing and visual communication, so you can imagine the abuse we got. We had to spend considerable time and resources to fight this abuse and add checks and balances, which slowed down our product roadmap.
Does something like email verification resolve this in some small way?
I ask because we are looking at extending our platform to include enterprise file sharing and rendering for niche file types, but don't want to expose ourselves to the extra work you've described.
I've been maintaining a community forum for more than a decade. We had some abusive users, so we introduced pre-moderation. Meaning, any new user is on probation for a few posts, and all anonymous posts have to be manually approved by an admin to be posted publicly. This has pretty much completely stopped the visible abuse.
However, for about 10 years, there have been bots registering every day; some of them even make realistic accounts with cool and unique usernames, emails, even descriptions. Strangely, the email and username never match... And they make posts with HTML links embedded. They even know to actually select the 'HTML' content type, which is an extra select input. Some bots even make a few innocent posts before the link spam. But just too few to get past probation. Obviously, the spam posts never get approved, and the accounts never get out of the pre-moderation queue, and yet they're still trying every day... Not intelligent enough to make more than 2 dumb posts.
Similarly, I have some work projects that have user registration with a human on-boarding process, where another person has to add the user to their group for them to share any data, none of which is ever public. So these bots are tirelessly registering, and staying in limbo forever. Thousands of useless accounts.
It boggles the mind how much energy is wasted, but I guess it must be profitable enough.
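The probation rule itself is trivial; the real cost is the humans reviewing the queue. A toy version of the logic (field names and the 3-post threshold are illustrative):

```python
from dataclasses import dataclass

PROBATION_POSTS = 3  # manually approved posts needed before an account posts freely

@dataclass
class User:
    approved_posts: int = 0
    is_anonymous: bool = False

@dataclass
class Post:
    body: str
    visible: bool = False

moderation_queue: list[tuple[User, Post]] = []

def handle_new_post(user: User, post: Post) -> None:
    # Trusted users publish immediately; new and anonymous users go to the queue.
    if user.is_anonymous or user.approved_posts < PROBATION_POSTS:
        moderation_queue.append((user, post))
    else:
        post.visible = True

def approve(user: User, post: Post) -> None:
    post.visible = True
    user.approved_posts += 1  # eventually lifts the account out of probation
```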
It's amazing how some people can't seem to spot a bot/spam post, no matter how obvious.
One of the websites I host is for a retired elderly academic who pens articles about his former field and invites comment.
Almost every week I'll get an email from him along the lines of "Is this worth following up?" and attached will be either a comment reading something like "I love your article. So much good info and useful to me.." or an email from some g3gergergew@gmail.com address saying "We love your site. It has much good infos and is useful but we are notice your CEO could being better..."
No matter how many times I tell him to look for the telltale signs...
* Random gibberish email from address
* Half a dozen links in the comment
* Broken English
* Generic text which could apply to ANY article or website in the bloody world
...he'll still forward them to me and ask if they're worth getting in contact with. Every time, I'm gobsmacked how he can fall for such inept spamming.
Ironically, there is another side to this too: you have to teach people that if they want a response, they need to include relevant information that relates to the person you are reaching out to, a "serious-looking" e-mail address, a proper greeting, good grammar...
Especially young professionals struggle to understand why any of this is important, and are left wondering why they are not getting a response.
The ineptitude of the spamming is actually a sign of eptitude -- they intentionally include the signs you look for because you're a waste of their time. They have figured out the precise line to walk where bad marks like you avoid them but easy marks like him engage.
I've heard that's the thinking behind the infamous and infamously 'reeking of scam' Nigerian Prince email scams too. And the fact that people still fall for them shows it's seemingly a strategy worth pursuing.
[But a discussion here on the ridiculously obvious scam that people fall for all the time is probably too off-topic --even for HN!]
PS: Upvote for "eptitude". Even if it's not a real word, it ought to be. [And I'm not going to spoil things by looking it up]
I suppose that makes some sense; anyone who will fall for the most blatantly obvious scams will probably be pretty gullible, much more willing to hand over money.
Intentional or not, they have figured out a kind of clever selection bias.
Many people never look at email addresses (just like they don't look at full URLs - the programs have made it harder to see them by default).
Many - most - native English speaking people don't really know 'bad' English. People I've known my entire life who've grown up in the US and went to university still have trouble writing more than a few coherent sentences. Folks are bad at writing, and I think tend to not look critically at bad writing.
I think part of the reason he "falls" for it is the contents of the spam are things he wants to be true. In other cases he'd see the warning signs and know it's spam, but with things he actually wants to be true, he would rather have someone he trusts tell him there's no way.
Money in this case would be a tool to teach the client to value the person's time more highly and fend for themselves. If it costs the client $100 every time he forwards a comment for consideration, the client might actually put in the effort to learn the patterns the person has been explaining to the client.
If the client doesn't stop, at least the worker is compensated highly for their trouble, so it's a win/win.
This is an old person who would otherwise fall victim to a scammer, and all it takes is a little bit of someone's time every now and then. Suggesting we charge them $100 every time is ridiculous.
If you've only a handful of users doing this, the load's manageable. At scale, or for the ones who never learn, it eventually becomes a burden. Either you've got real costs to support or you need to incentivise learning on the part of clients. For a large cohort improved comms are net net a liability and risk with no upside.
Normally I would agree, but I don't know in this instance. Ignoring any personal relationship the above commenter could have with the author, if the author goes to him first it keeps money away from the scammers.
> This has pretty much completely stopped the visible abuse.
This process works well but it can also turn away a ton of people from ever joining.
I know there's this one product that only offers support through public forums, and new accounts need to be reviewed by a moderator, and then you're not allowed to post any threads until you've made a few replies to other threads and been whitelisted by a moderator, and then on top of that you also can't use links until you've met some other criteria.
But in order to get proper support it requires linking to large files (videos) that aren't able to be uploaded directly through the forums.
It becomes such a pain in the ass to open a support request. It could easily take over a week just to post your question and the worst part about it is the forum software orders posts by date, and your post will be buried on the 8th page before it's visible because it takes your original pre-moderated post date as when it was created.
That’s a very good point. I have experienced this myself, but I don’t see another way to stop abuse except tweaking the requirements. It sounds like too many hoops in your example. On our site if the user makes just one relevant comment I instantly promote them. It’s that easy to tell. I wish this deterrent wasn’t justified by walls of spam as the alternative.
There's one website I run where we were getting some spam through the "contact us" form. No problem, I'll just install captcha. I looked at the docs and realized it was more effort than I was willing to spend to kill a few spam messages a week. So, I just added an extra input field where the user has to type in "123". No spam since.
I thought it was kind of funny how easy it was to defeat the spammers, but on the other hand, there are so many easy targets that it's not worth their effort to try to overcome something so elementary just for our site. Conversely, there are much higher value targets where captcha is very valuable and worth the effort to implement because it is worth it for the abusers to try to defeat them.
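In case it's useful to anyone, the entire "captcha" amounts to something like this (field names are made up; it's just a plain form handler):

```python
def handle_contact_form(form: dict) -> str:
    # The form asks the human to literally type 123 into an extra field.
    # A dumb bot that stuffs every input (or skips the field) fails the check.
    if form.get("not_a_robot", "").strip() != "123":
        return "rejected"  # silently drop or log it; no need to tell the bot why
    send_to_site_owner(form["name"], form["email"], form["message"])
    return "ok"

def send_to_site_owner(name: str, email: str, message: str) -> None:
    print(f"New contact from {name} <{email}>: {message}")  # stand-in for real mail delivery
```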
> They even know to actually select the 'HTML' content type, which is an extra select input.
It's not surprising because spammers write scripts for all sorts of platforms (WordPress, vBulletin, etc.). There's probably no custom code written to attack your site.
They detect your platform and use a script from their pre-existing library to post their spam.
For realistic usernames, they can just reuse names they've seen on other platforms.
Same goes for emails. Any existing email list could be a source for genuine-sounding names. Just throw a couple of numbers at the end of the name and you've got a unique name that a human already came up with.
Captchas are the absolute spawn of the devil. Every time I see one on a website, I want to punch the face of the person who invented them, until my fists hurt. Unless I really NEED to use that site, I just go elsewhere, once I get a Captcha shoved in my face.
Max Levchin (the lesser-known co-founder of PayPal) invented CAPTCHAs. So you can pretty easily get a picture of his face to print onto a pillow to punch.
ReCaptcha v1 was quite a reasonable compromise until the spammers caught on and Google turned it into the annoying AI training and user tracking service it is now.
OK. Off the top of my head, some reasons I loathe Captchas:
1: You're working for free to train $megacorp's image recognition or OCR algorithms.
2: Even after [what seems like] eleventy billion Captchas and eleventy billion people pointing it out to them, they STILL don't tell you how to do the damned things properly. ie. If you ask me to click on any image with traffic lights, am I supposed to only click the lights?.. or the posts as well?.. what about a square that only has a wee part of one in? Does that count, or not?
3: As above with text based Captchas. Is it case sensitive? Do I have to include spaces between the words? What about the punctuation? Or the fragment of the previous or following word, peeping in at the side?
4: The Captchas never make allowances for the end user's language settings, so I'll often get American terms used, where the thing I'm meant to be identifying isn't called the same in real English. So I'm not 100% sure what I'm looking for.
5: If you have an ad-blocker, you'll usually be asked to solve about three Captchas in a row, thus cubing the annoyance.
Thank god for 'Buster' is all I'm going to say!
[I'm not going to link to it because the less people know about it the longer it'll keep working].
There are third-world sweatshops that solve captchas for a fraction of a penny. I imagine they provide an API, so your bot makes an HTTP call and gets a solution back.
"Take a competitor product, remove all features you don’t need, and make it crazy fast."
Seems to me there are hundreds of lifestyle businesses just waiting to happen by following this formula.
So many good ideas out there could be made so much better by reducing them to their essentials, but making them elegant and "crazy fast".
I think you may have just re-discovered Disruptive Innovation (sometimes also called Disruption Theory): incumbents over-serve their customers by adding lots of features, complexity, and cost. Upstarts can attack them by focusing on only a few core features and/or low price. The incumbents can't respond without annoying their existing customers who have grown accustomed to all the features the incumbent provides.
YouTube seems to be playing the boiling frog experiment with people. I started YouTubing in 2006 and there were no ads. Then, eventually, they added monetization and you could place a single ad at the beginning of your videos. Now I see videos with ten sets of two ads interspersed. You have to constantly click to bypass them. It's getting incredibly annoying, and it just seems greedy, especially coming from Google.
That, combined with generally treating their content creators like they are completely disposable. I hope someday someone disrupts YouTube. It seems to me, besides the "network effect", the main difficulty here is unfortunately the cost of bandwidth. I could host a reddit clone from my home machine or some cheap VPS if I wanted and scale up to several thousand users, but video content at 5 megabits per second... How do you get bandwidth cheap enough to host that? Are there hosting providers that will just serve files over HTTP for super cheap?
Up front disclosure, I am the founder of a video hosting company.
I think you have very quickly found the reason for YouTube boiling the frog. Bandwidth, encoding, and storage all have costs associated with them. They are not simply free and so YouTube has to pay for these costs somehow.
There has been a mindset shift on the Internet at some point: originally people paid for premium services, and then it swung to everything for FREE. But nothing is actually free; what happened is a tradeoff in who pays, from the end consumer/producer (you) to advertising companies.
So it's now up to all of us as users of the Internet whether we are happy with that deal. I personally am not, as allowing someone else to shape my thoughts in exchange for free services is not something I believe is beneficial to humanity. I'm doing my part, but it's up to everyone else to make their choice and do theirs. Or don't, if you are satisfied with the current deal.
On the storage front, it seems to me like you could get around that a bit by not keeping content forever. You could make a more ephemeral video hosting site.
As for paying for content, I think there's an opportunity to combine ideas from YouTube with Patreon. Tipping or supporting specific content creators you like. You move away from an ad-driven model towards a model where you get some basic content for free, but you can pay extra for more content.
These are good ideas. Our theory is that by simplifying the complexity of video hosting, it makes it viable for more people to test against various business models without having to do the heavy lifting of figuring out video encoding, delivery, and storage just to start. Basically similar to the role that WordPress, et al provided for websites themselves.
There is likely some new model that doesn't require fully ad-based or fully premium paid. We work with customers every day who are iterating on a variety of business models. I think in the end we will see a disaggregation of YouTube just like we have seen in many other once complex and costly spaces in tech.
Also manual moderation is pretty expensive, as is dealing with the copyright lawsuits and to top it off the patent licenses for streaming video in the first place.
Content, advertising, and bandwidth (along with other technical services) operate as entirely different types of economic goods, in multiple senses, but especially in terms of elasticities, rivalrousness, appeal, and excludability. There's also the role of attention and content and service ranking.
Content is a public good in the economic sense: zero marginal cost, high fixed costs, nonrivalrous, and poorly excludable.
Advertising is a rent (to advertisers) and an imposition (to its audience). There's an active aversion to it, and the content is very often deceptive, manipulative, and against the recipients' true interests. At the same time, as more attractive audiences are driven away (and they will be), those who remain are subject to ever more, ever lower quality ads for ever more manipulative or harmful products and services.
Bandwidth and availability must meet or address peak demands which are infrequent though often predictable. Users' decision criteria, as with highway traffic, externalise most costs, whilst benefit is privatised, incentivising overuse. Provisioning must be based not on some average utilisation, but on probability of availability and service quality, with additional nines costing orders of magnitude more to provide. That risk environment may itself be quite variable.
The consequence is that demand, revenue, and cost components follow vastly dissimilar dynamics, making the business exceedingly difficult on market terms, and incentivising numerous pathological behaviours.
Well, hosting videos is an expensive operation. You have 2 options to pay for the resources you're consuming: Trade your time (ads) or trade your money (subscribe). The latter is worth it if you're using YouTube a lot.
It's called uBlock Origin, available for all major browsers; try it, you'll be amazed. Haven't seen an ad on youtube since last decade (preemptive snark comment - current decade started on 1st Jan 2011 and will end on 31st Dec 2020).
I've been running ad blockers for... well, since they were created. I have no idea how anyone lives without them. But YT in particular has started bypassing all of them. I'm down to using just Firefox with the YT enhancer plugin, because it's the only thing that manages to skip the ads for me now. And I'll quit using it if that breaks.
I have personally not seen an ad on youtube in years, bar the video recommendations on the home page and the 'sponsored content' some channels bake into their videos. On desktop, I use uMatrix for more granular blocking + uBlock Origin for hiding empty frames, and I use Youtube Vanced on Android.
I'm using ubo on desktop and adguard on ios/android. No ads on YT ever, and very rarely I see ads on sites. Do you have your lists well-updated? What blocker did you use before?
One place I worked wanted multiple people to "own" a story, but JIRA doesn't work like that, so they implemented a totally new custom "owner" field that did allow it and then told everyone not to use the native owner field. Now you had to track everything two ways.
What makes this worse is when 1/3 of the teams in your org decide that they want a similar feature, but each comes up with their own custom-named tag ("owner", "lead", "point person", "point of contact", "jefe").
And then you want to run some jql queries against those tickets, and you have to use disgusting query generators to de-dupe the tag monstrosity.
They've done an excellent job of giving you just enough features to shoot yourself in the foot with.
How interesting, my experience is completely different - I'm a UX Designer working in a squad and I/we find JIRA super easy to use, allows you to customise ticket types pretty much to our heart's content (including removing stuff you don't need), ditto customising the board and other features, allows us to track changes, comment on tickets, and works reliably. Almost nothing else allows us to do the same. Admittedly Jira has been overhauled recently and is much better than it was, plus some new features have been added. Oh, and there's an app which allows me to do most of what I need to do on my phone super quickly, partially thanks to the notifications.
You're a UX designer. a User Experience designer. User Experience is your profession. And you're telling me: JIRA, the application with massively nested hierarchical layouts and a 2 second response time for every user interaction is easy to use?
Jira has to be the most complicated monolithic issue tracker available. I should think even Jira's sales team would struggle to call it easy to use, particularly when compared to more simple competitors like Trello.
I certainly have not made UX my specialty so I'd like to know what markers I'm missing out on here. What kind of things is Jira putting forward that mitigates its crazy abstractions, its hierarchical layouts, combined with its slow response times? I thought those were the kinds of things that signified really bad UX.
Personally. I'm forced to use Jira as part of my job and have suffered terribly at its hands. If someone is finding it a breeze then it would be a great benefit to me to understand.
I second this. I've been using Jira on and off for years and I'm still having the feeling I haven't fully figured it out yet. In particular, there are some things that I still find unintuitive and which require more brain capacity than they should.
I got so annoyed when someone that I shared JIRA admin credentials with changed the front page to a Kanban board. All I wanted was bloody issue tracking! KISS.
there is no work tracking tool better than .txt and asking people for updates.
Maybe I should sell notepad as a fully extensible work prioritisation tracking platform that integrates with all email providers and now supports slack...
I remember someone here created a service, it may have been paid, that simply emailed you "What did you work on today, what did you complete, encounter any problems?" at a pre-defined time every weekday and then would forward your response back to your manager. Fantastic idea IMO.
That sums up so perfectly what I did with a B2B SaaS product of mine. I never found words so perfectly as this quote does to describe what I was aiming to do.
Really cool product! stevoski, if you see this and don't mind me asking, how do you position/sell a product that does less (but is more streamlined) than your competitors?
When I've tried to cold-sell customers in the past, they often want that one feature (a random integration, feature matching from the service I'm replacing, etc.) and aren't willing to make the switch until I add it.
Maybe this wouldn't be a problem if I was marketing instead of selling?
Yep. The late 90's and early 2000's were littered with people trying to make "light" copies of MS Word. The problem is that journalists need the word count feature, and teachers need the WordArt feature. Remove either, you lose a demographic.
That having been said, there are a lot of products out there that made their product intending it to be free, and then when they hit 1m users they started thinking "hmmm, if I could get a dollar out of every user, I could buy a house". They try to stuff a monetization model in sideways and damage their product in the process. Taking a moderately successful product that's crippled by attempting to shoehorn in monetization and redesigning it to have reasonable monetization from the beginning might be a better strategy.
That's exactly the point of this approach. Don't try to solve everybody's use cases like Word. Target one specific group and make the product faster and easier to use by removing all unused features.
Word is a special case and I don't think the model works there - not least because users need an industry standard for content interchange, and it's very hard to build a 100% compatible Word clone.
But there are a lot of opportunities elsewhere to make products that are faster, simpler, cheaper, and more useful than the current industry standards.
Word is not a special case, it's just people being used to it, that's all. If tomorrow Microsoft went belly up and their office suite were dropped by everybody due to constantly discovered vulnerabilities, LibreOffice would pick up quite nicely. I have yet to find a feature of Word whose equivalent I can't find in Libre.
But Libre is a massive product as well, with a tremendous number of work hours put into it over decades. We're talking about alternatives that are nimble and have fewer features.
That's sort of Google Docs/Sheets/Slides TBH (in addition to being hosted/shared). I'm not really a "power user" of any of these tools anymore. I use them a lot but I don't do anything fancy. That said, if Docs didn't have, say, a word count feature, that would be a major annoyance.
The moment your goal shifts from VC-backed acquisition or public offering bid to "I'd like to support myself and some employees comfortably" the things you can do profitably become almost endless.
Want to run a little webhosting business focused on small businesses? Tried and true. Want to offer custom software development for some little vertical? Easy enough. Want to just make a really nice and clean note app? That works too.
When your goal of success shifts from "mind-bogglingly rich" to "sufficient for the lifestyle I want", it's much easier to be successful.
Yup. If you build a communications platform, it will be used for spam. If you build a hosting platform, it will be used for porn. If you build a linking platform, it will be used to spam links for porn.
Anything else requires a constant uphill battle of content filtering and deletion. You could call it censorship, but it's a necessary reality.
Which, I expect, will be used for [unlawful] extreme porn in one direction and non-porn copyright infringement in the other direction.
Mind you, all roads built get used for crime. There's a point at which mitigations for abuse of their services become unreasonable to expect of a company.
I suspect that this is one reason you're seeing virtual conferences that are charging $50 or whatever. Yeah, it's a few dollars to offset costs. But it also keeps out the "riff-raff" so to speak.
Charging nominal fees is going to keep out only the most casual users, but I'm not sure it's a bad idea.
You can refund people who don't abuse your system too, either directly or in kind. That helps to keep things less about financial worth of users and more about behaviours.
True. I've also seen in-person volunteer events that charge some nominal amount that you can either get refunded or choose to donate if you show up. Even if it's only $10, there's a fairly large sunk cost effect that causes a lot more registered people to attend than if they only had to sign up for free. It can be a lot easier to plan that way.
I have my own hobby side-project that allows user-generated content. Do you have some tips for minimizing the ways it could be used by spammers/porn?
For example, my project is a PWA which (I assume) makes it harder for spammers to use because there are fewer direct links that could be used for spam.
What about using verified emails? Google's captcha (or similar)?
Shadow banning is mostly good for individual people who are annoying other customers. They likely don't understand what shadow banning means and don't check for it.
Commercial spammers understand shadow banning and check for it.
The best option for commercial spammers is to make it expensive to post on your platform. That means shutting them down right away, so the effort to sign up doesn't get the reward of having an account available for days.
It's all a joke and it is censorship. On one hand FB is hiding behind freedom of speech but on the other hand I cannot send specific form of links even in private via messenger.
We were running a privacy focussed Chat-App-Network dating platform in 2017 which was accelerated by Facebook[1].
i.e. A network between, Messenger<->Viber<->Telegram<->Line App.
By design no media sharing was allowed (to prevent pornography) and the user profile images were received from the platform itself. But we soon faced the unique challenge of people from certain countries using their children's pictures as profile pictures (often just the children); there were people with group photos as profile images, and then there were people using explicit images as profile pictures.
So we integrated Amazon Rekognition to identify children, group and explicit images. Those using explicit images were banned immediately, and those with a child/group photo (face detection, not facial recognition) were asked to change their profile image (their profile was not shown to anyone until they changed their profile picture to one of just themselves). We were processing >200,000 profile images per month, as people change their profile images often.
But, as we all know, Amazon Rekognition (or for that matter any such ML solution) is not 100% accurate. We faced issues with people with darker skin color (Amazon told me that they were working to fix the issue; exactly why this type of half-baked tech shouldn't be used for things which can cause harm), and so we had to reduce the confidence levels to such an extent that anything resembling a child would be flagged by the system (false positives are better in this case than false negatives).
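Conceptually the profile-picture gate was something like the snippet below (thresholds, bucket name and verdict labels are placeholders; the real system had far more plumbing). Face detection returns an AgeRange per face, so "might be a child" and "group photo" both fall out of the same call:

```python
import boto3

rekognition = boto3.client("rekognition")

def profile_picture_verdict(bucket: str, key: str) -> str:
    image = {"S3Object": {"Bucket": bucket, "Name": key}}

    # Explicit content: a deliberately low threshold, because a false positive
    # (asking a user to re-upload) is far cheaper than a false negative.
    moderation = rekognition.detect_moderation_labels(Image=image, MinConfidence=50)
    if moderation["ModerationLabels"]:
        return "ban"

    faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])["FaceDetails"]
    if len(faces) != 1:
        return "ask_to_change"  # no face, or a group photo
    if faces[0]["AgeRange"]["Low"] < 18:
        return "ask_to_change"  # anyone who might be a minor

    return "ok"
```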
> privacy focused Chat-App-Network... accelerated by Facebook
Why would any company focusing on privacy partner with Facebook or Google? I would guess that some ardent supporters of such a product/company would be put off by such a partnership, no?
Messenger had >1Billion users and so its users were 98% of our user base. We enabled them to communicate with users of other chat apps and vice versa. We didn't even use the 'Name' of the users.
I applied for their bootstrap phase under their FbStart program but they directly selected it for the Acceleration phase.
As for why I applied for Facebook if I care about Privacy?
I was a disabled solopreneur from a village in India, without any kind of network strength, competing with Valley behemoths, and any kind of help is not just a force multiplier but life or death (but my product was selected meritoriously by FB). Facebook's privacy issues (Cambridge Analytica) started only several months after I launched the product, so the image of Facebook was not what it is today. But it did bother me, and just after a year of running the platform successfully, I had to close my startup due to my health issues[1]. I did not sell my platform, to safeguard the privacy of the users.
Wow this is incredible for multiple reasons. I'm also working on a dating app and pondered this "security/content" issue, and I am also a victim of spinal damage. I had cervical myelopathy in 2018 and had to have an emergency fusion on C5-C7. I still experience various neurological issues for which they can't trace (they claim it's not related to the spine issues), but causes symptoms very similar to multiple sclerosis (although that's been ruled out). Your condition looks and sounds even more serious than that. I'm sorry you've gone through so much. I hope your health improves and I wish you success!
Thank you for your kind words. I can visualise what you had to go through. I share mutual respect and wishes for your recovery or should I say 'Management of our conditions'.
>I still experience various neurological issues for which they can't trace
The main issue I had (tingling on the face) has been successfully resolved after the surgery. Any other discomfort I have had has been largely due to anxiety and post-traumatic stress from the surgery, losing my hard-earned startup, etc.
So targeted efforts in bringing down the anxiety help me a lot [staying in the present, taking in fewer sensory inputs (I had dozens of phone calls earlier, now it's zero phone calls, only email)].
Side note: Does wearing a heart rate monitor on the wrist, like a Fitbit/Apple Watch, hurt you after 15-30 mins?[1]
I do not wear fitness trackers. That's a very strange issue. I see you outlined wearing it too tight or EM sensitivity, but there's a couple of other things: heat and sweat. Both of these things are known to play unkindly with nerve damage. For example, I get pins and needles when I sit in the sun. I get a burning nerve pain when I sweat from exercise.
Thanks for letting me know. Since the tingling starts within 15 - 30 mins even in cold conditions, I don't think sweat had a role to play (but I will keep this in mind).
The article I attached has had the highest number of visitors of anything I've published, and several people have also asked me the reason for it. Unfortunately, it often gets dismissed as wearing it too tight, and I have definitely proven that it's not the case with me.
I'm planning a blind test with an elaborate setup to conclusively prove that the heart rate monitor is hurting me and others; perhaps then we could proceed to find out why.
Thank you, I think we incurred ~ $200 for ~200,000 images for Rekognition API and I also think Amazon had fairly large free-tier limit for Rekognition at that time, since the service was relatively new.
A market platform I recently worked on allowed users (free sign up) to create multiple wishlists and then send those wishlists to arbitrary email addresses. The user could set a custom title, limited to 100 characters or so.
We soon discovered a similar problem to OPs - bot accounts (mostly @qq.com addresses) were registering by the hundreds per day to create wishlists and then send those wishlists to other @qq.com addresses. They were setting the titles to arbitrary code blocks.
I found it fascinating, if terribly inefficient. Some colleagues and I were speculating on the purpose; perhaps someone experimenting with some kind of laundered botnet control path?
We tried all kinds of measures to prevent it but ultimately we blocked all @qq.com accounts and eventually disabled the wishlist feature altogether as it had such little real usage.
We allow users to sign up for a free trial for our product; you have to put in your name & email address. After the trial expires, we send an email that says "Hey So-and-so, your trial ran out, click here to give us money, etc." Some enterprising spammers filled in the name field with spam URLs and the email field with victims' email addresses, in order to spam them. So the victim would get "Hey hxxp://buyfreerolex.com/, your trial ran out..." spam emails, from our email server. Obviously we've fixed it since, but it's absolutely wild the lengths spammers will go to.
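If anyone else sends "Hey {name}, your trial ran out" emails: the essence of the fix was simply refusing to echo anything URL-shaped back out. A simplified version (the regex and length limit are illustrative):

```python
import re

URL_LIKE = re.compile(r"(https?://|www\.|\.[a-z]{2,4}/)", re.IGNORECASE)

def safe_display_name(name: str) -> str:
    """Never echo a URL-ish or absurdly long 'name' back out in an email."""
    name = name.strip()
    if len(name) > 60 or URL_LIKE.search(name):
        return "there"  # falls back to "Hey there, your trial ran out..."
    return name

print(safe_display_name("hxxp://buyfreerolex.com/"))  # -> "there"
```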
Ha, thank you! That explains it... I've had several signups via Tor to my service, none of them confirmed, every few days... I guess they were checking if they can somehow abuse the mails.
We had somebody signing up to a website with Russian email addresses. What they did, though, was set the Personal name to something like the following in Russian:
So when we sent out the email to verify the signup the receiver saw some English text they couldn't read and the above instructions in Russian telling them to click on the link.
This was a Magento site so I assume it was a standard bot.
We've had multiple spammer attacks over the years. Our platform allows users to create and publish their own content. Our primary target is education, teachers and students. But naturally it's being abused by spammers. It's been an interesting cat and mouse game to counter them.
- One time, they used our platform to publish links to their streaming websites for the quarter finals of the 2018 Champions League. Suddenly we ended up being first result on Google for "arsenal v barcelona". It was fifteen minutes before the game so you can imagine that we got a lot of traffic. On the one hand it was kind of flattering that the SEO ranking of our domain was so strong. On the other hand, it wasn't great nor beneficial for the platform to be abused like that. As a counter-measure, we decided to block indexing of project pages for 24 hours when they're first made public. The spammers never came back.
- Another time, we got an email from AWS that our SES bounce rate was 15% and rising fast. Being blocked from sending email by AWS would have been a disaster. It turned out that our invitation system was being abused. The creator of a project can invite an external person by email; that person receives an email saying "John Doe invites you to collaborate on 'A nice project about the 2018 Champions League'" with a link to the project. Replace "A nice project about the 2018 Champions League" with a Chinese ad and you've got spammers sending thousands of emails a second to a random collection of addresses. Naturally a lot of these bounce, which is what triggered the AWS warning. So we had to start verifying the MX validity of invited email addresses (a minimal check is sketched after this list) and throttle the system to a maximum of 100 emails in a window of 24 hours.
- We still get a lot of spammers publishing obviously spammy projects. One thing that has helped is the Clearbit Risk API. You send them an email address and it comes back with an assessment of how spammy the address is. We use it for certain domains (protonmail.com, yandex.com, ...) and it frequently flags someone as a spammer right after signup. They can still use the platform but can't make stuff public, which completely defeats the purpose of spamming.
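The MX check mentioned in the second point can be as simple as asking whether the invited address's domain publishes any MX records at all. A minimal sketch, assuming Python and the dnspython library (the comment doesn't say what they actually used):

    import dns.exception
    import dns.resolver

    def has_mx_record(email: str) -> bool:
        # Cheap sanity check before queuing an invitation email: does the
        # recipient's domain publish MX records at all? It doesn't prove the
        # mailbox exists, but it filters out obviously bogus addresses.
        domain = email.rsplit("@", 1)[-1]
        try:
            answers = dns.resolver.resolve(domain, "MX", lifetime=3.0)
        except dns.exception.DNSException:
            return False
        return len(answers) > 0

The 100-emails-per-24h throttle is then just a per-account counter checked before each send.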
I'm sure they'll keep finding creative ways to get around the limitations we put in place. The toughest part is countering them without hampering the experience for all the other users.
I later discovered that Instagram banned all mylink.fyi links from the platform. A customer also confirmed to me via email that Snapchat started blocking links. Heh, I’m banned by Instagram and Snapchat!
[...]
If you’re interested in acquiring the domain name, and/or the app, let’s talk.
If you're one of those loud folks who dislike instagram/facebook (like me) then this is a nice way to ensure your content and data does not end up on the platform.
Of course, they're only enforcing it themselves, so it's unlikely to be permanent. :(
Paying for bouncers is an understood part of the cost model for a nightclub. If you provide any sort of real or virtual venue that allows unvetted participants, you have to factor in the cost of dealing with bad actors. If you have a virtual venue, you have to factor in the cost of dealing with automated bad actors.
It's just how human nature works. 90% of people are great, but 10% can do a lot of damage.
Here's an example of a similar problem - one that I created myself, on a website I help a friend with.
We had outbound referral links, basically to monitor the number of times that a visitor clicks out of the website. The URL pattern was something like: example.com/out.php?url=outbound.com
The out.php script would simply (and naively) redirect the user to the URL specified. We never validated whether the outbound link pointed to an authorized destination.
The result is the same: eventually spammers figured out the link format and started posting their spam using the site's redirect script on any number of social media sites, embedded in email, etc.
What's interesting too is that we would see multiple redirects chained into a single request, e.g. out.php?url=another.link.service/link=spammer.com
Obviously in hindsight this was stupid, but when we built it (some 10 years ago) the idea seemed pretty sound, if perhaps a little naive. The solution would have been to only allow redirects to authorized outbound sites, which works when those links are relatively static and not open-ended.
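In hindsight the fix is exactly that whitelist check, applied before redirecting. A rough sketch of the idea in Python/Flask rather than the original PHP, with made-up host names:

    from urllib.parse import urlparse

    from flask import Flask, abort, redirect, request

    app = Flask(__name__)

    # Only destinations we deliberately link out to may be redirected to.
    ALLOWED_HOSTS = {"outbound.com", "docs.outbound.com"}  # illustrative list

    @app.route("/out")
    def out():
        target = request.args.get("url", "")
        parsed = urlparse(target)
        if parsed.scheme not in {"http", "https"} or parsed.hostname not in ALLOWED_HOSTS:
            abort(400)  # refuse to act as an open redirector
        return redirect(target, code=302)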
There is an easier way to accidentally build a porn website:
1) register domain
2) don't renew
Not completely unrelated either, since it seems like OP's domain is now bust too...
What is also interesting is that probably half of what FB or <insert nonexistent competitor> does is moderation of sorts. This is why FB is becoming a commerce/community-page website and why they need Instagram for social media.
I run a MediaWiki site, so I got to play the cat-and-mouse game as well. For a while I was getting hundreds of spam page creations per day despite implementing as many defenses as I could, and at least once a day I went through and deleted the spam posts using the Smite Spam plugin. It's finally calmed down and I get one or two posts maybe every couple of weeks. I think I may have finally been removed from the site list of whatever "packaged" software the spammers use.
With prebuilt software, a small custom change required for creating content is oftentimes enough to deter mass automated spam. My go-to for a number of years was to add a form field with the label "Enter the word 'orange'". Trivial for a human, but it requires just a bit of customization for a bot - enough that most spammers won't bother.
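The server-side check for a fixed question like that is about as small as code gets; a minimal sketch (the field name is made up):

    def passes_human_check(form: dict) -> bool:
        # The form carries a visible field labelled "Enter the word 'orange'".
        # Humans type it; stock spam bots leave it blank or get it wrong.
        return form.get("human_check", "").strip().lower() == "orange"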
I'm building a bespoke replacement for the wiki since all the pages share the same structure and the mediawiki solution isn't as user-friendly as I'd like.
Hopefully I won't do anything dumb like the naïve contact form I'd made, which turned into a spam vector because I was putting unsanitized user input into mail headers.
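The usual guard against that kind of header injection is to refuse any user-supplied value containing CR/LF before it goes anywhere near a mail header. A tiny sketch:

    def safe_header_value(value: str) -> str:
        # Reject input that could smuggle extra headers (the classic
        # "Bcc: victim@example.com" trick) into an outgoing message.
        if "\r" in value or "\n" in value:
            raise ValueError("possible mail header injection")
        return value.strip()

Building messages with Python's email.message.EmailMessage instead of string concatenation helps too, since it is far less forgiving about malformed header values.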
I've come across these kinds of prototype link sharing sites in my battles with SEO spam.
On the face of it, SEO spam is simple: the bad guys bung a load of links into your wiki or blog comments in the hope of gaining Google rankings for their "SEO clients", but...
Increasingly they started bouncing links all over the place in complex spamdexing webs. Link-sharing sites often get thrown into the mix, along with cross-linking other blogs/wikis where they've got their spam to stick. This makes it easier to evade filtering by any particular blog/wiki admin, but maybe it also makes it harder for Google to filter and down-rank the baddies. Finally, depending on how complex a spamdexing web is, it offers protection to their clients, because it ends up being impossible to see which end websites the spammer is actually trying to promote (those links are tucked away amidst the randomised cross-linked chaos).
But maybe that was a game of a few years ago, and now spam bots are mostly just trying to push porn on social media.
I'm facing the same issue with a URL analysis service I operate, which lets you take a snapshot and screenshot of a website. I get countless submissions pointing at other sites which allow user-generated content (Reddit, Medium, random support sites, everything that's "open"), and the submitted pages host images and links to "Watch XX Soccer Game Live 2020 HD" etc. - all link spam. I found my own ways to battle these submissions and users, but obviously don't want to reveal them here.
The other thing is signup spam. I'd say a full 50% of my signups are spam, and I do my best to remove those user accounts. What surprised me was that the spammers seem to be human: using a Gmail address (which requires verification), solving captchas, entering form fields, clicking the email-verify link, etc. Just crazy. Again, the countermeasures are not something I'd want to publicize...
I wonder how one can know the history of a domain name before buying it. Imagine making a new website and realizing that your domain name is banned from all social networks!
> Other solutions include: requiring credit cards for trial periods, ban all adult content from the platform… But they all require me to put extra effort in the project. And I don’t have time for that.
There are people who spend their entire lives trying to build a working business, and those who just walk away from what could be a pretty lucrative business for lack of time.
Every time I see somebody equate moderation with censorship, I remember my experiences with public wikis, forums, etc. It's always spam and porn. They'll spin up more fake users than you'll ever have real users, every damned time. It's a vicious cycle, as real users won't stick around if you don't filter.
I struggled with it myself with Twicsy, except Twicsy was just a window into Twitter and made it much easier to find the porn and (gag) child porn. I remember the first time I found a network of child porn; it made me sick to my stomach. Took me hours to get rid of it, report it to Twitter, and report it to the authorities. It was a problem for a long time, and I felt like I was doing all of Twitter's dirty work that they wouldn't do themselves. I made tools for people to report it easily, and tools to eliminate it en masse. Twitter took way way way way way too long cleaning up their act in this respect. They also often just deleted tweets and not pictures; pictures could linger for months without being deleted from their content servers, and still appear in Google.
IMO the better way to design a product like this to avoid abuse would be to simply force users to sign in with each platform they want to link to. At that point all you're building is a way for people to say, "These 5 social media accounts are all me."
You could allow them to select one of their accounts to source information and a photo for their combined profile. At that point you're not storing anything besides links to social media profile pages.
In effect you get to piggy-back on those sites' abuse mitigation strategies (though of course you're stuck with the lowest common denominator). Your biggest decision at that point is which social media platforms to allow onto your service.
Absolutely. Nicer if you disclose it in your terms of service but data inspection should be considered the norm unless you explicitly agree otherwise. And even then, if it isn't encrypted with a key that only you control then you should still assume your comms are being read.
I have no idea, but it should be, because that's basic troubleshooting. I once discovered a pedophile network while troubleshooting their server by looking at its traffic.
> You represent and warrant that: […] the Content will not or could not be reasonably considered to be obscene, inappropriate, defamatory, disparaging, indecent, seditious, offensive, pornographic, threatening, abusive, liable to incite racial hatred, discriminatory, blasphemous, in breach of confidence or in breach of privacy; […]
For which they obviously need to look at the data.
That doesn't imply they are looking at the data during regular operation. That just means that the user is liable for breach of contract if they post noncompliant links. The first point in time where the admin really has to look at the data would be during the discovery phase of the corresponding lawsuit between operator and customer.
Thinking about this while working on a social network (yes, I know) as a spare-time pet project, I've got a question: are there any tools for handling improper text/images automatically?
I think there are some lists of "bad words", but I'm not sure whether they're available for the most widely used languages.
But are there libs/SDKs/online services to which I can feed a picture and have it tagged as potentially improper - for example porn or swastikas - so that I only have to pre-moderate such images manually?
Looks like it could be a nice and useful service.
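Services like this do exist; AWS Rekognition (mentioned elsewhere in this thread) has a moderation-labels API, for example. A minimal sketch with boto3 - it flags nudity and similar categories, though something like a swastika would need a different approach (custom labels or a separate classifier):

    import boto3

    rekognition = boto3.client("rekognition")  # region/credentials come from your environment

    def moderation_labels(image_bytes: bytes, min_confidence: float = 80.0) -> list:
        # Return labels such as "Explicit Nudity" so the image can be held
        # for manual review instead of going live immediately.
        resp = rekognition.detect_moderation_labels(
            Image={"Bytes": image_bytes},
            MinConfidence=min_confidence,
        )
        return [label["Name"] for label in resp.get("ModerationLabels", [])]

Google Cloud Vision's SafeSearch and Azure's Content Moderator are similar offerings; none of them replaces manual review, they just shrink the queue.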
Oh boy, my first real comment on HN, please be gentle!
Okay, so I have something related to this.
two paragraph back-story was cut...
So I built SASRip[1], an open-source website with an API that lets you download audio or video from any web page supported by youtube-dl (I use youtube-dl and ffmpeg to do the muxing/transcoding). I also built a browser addon for it called Media Reaper[2] (a Chromium version is available on SASRip's website[3]).
Now, I wanted to build a no-BS, no-tracking website, so all I have is internal logs. These logs record incoming requests like so: time, URL, ID string, success/fail. The ID just tells me where the request came from: the website, an API call, or the browser addon. I keep no IP data or anything like that. And boy, do people download the nasty stuff - all kinds of nasty stuff, taboo stuff, feet stuff, and stuff I didn't even know existed.
Now I live in constant fear that someday, looking through those logs, I will find CP, and I am not sure what I should do. I know implementing tracking methods goes against both my morals and the philosophy of the service, but at the same time I am not sure I can keep going, knowing I could do something about it but am not.
Ultimately I think it is very likely that I will shut the service down if I find CP on it, with no way of tracking the person down - perhaps leaving a message explaining why I shut down.
----------------------------------------
P.S. I make no money on this service, it's purely donation based.
P.P.S I know I could do the muxing and transcoding via some really cool JS libraries, but I wanted to sharpen some of my other skills with this project.
I really like the design of the page - very clean and easy to understand the value prop. I could see this being useful for influencers and bloggers. Shame to hear about the mis-use :-(
I recall in the early days of mobile apps downloading an iPad app which had this neat idea that kids could share their drawings, made in the app, with each other in a sort of random way. It did not take long for me to realize that meant an endless stream of inappropriate, or sometimes potentially harmful, content (e.g. from adults interested in exploiting children).
I created a site intended for family photo/video sharing and it did not take long for people to start uploading penis pics. The weird thing is they don't even have a reason or recipient. They just want their penis out there on the internet in the hopes someone might stumble upon it.
A long time ago I wanted to make a free-speech forum and platform. As much as I believe it would be good to have a place where you can say what you want, the hassle of spam and plain nastiness is what always stopped me from doing it.
I'm building a platform that could be abused in similar ways. Does anyone have any suggested resources I could read/use to avoid this problem, preferably without employing captchas? Akismet might be helpful?
One thing to think of is not to be valuable to them.
So for instance, ALWAYS do HTML sanitization via a whitelist; don't let anyone put any JavaScript or any weird CSS or HTML into your posted content that you don't explicitly allow there.
If you allow links, make sure they all carry rel="nofollow", so that you can't be used for link farming. (Both because link farming helps spammers, and because if Google detects that your site is being used for it, your own site's rankings tank.)
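A concrete way to do both in Python is the bleach library; a minimal sketch (the tag/attribute lists are just examples, tune them to whatever you actually want to allow):

    import bleach
    from bleach.callbacks import nofollow

    ALLOWED_TAGS = {"p", "em", "strong", "a", "ul", "ol", "li", "code", "blockquote"}
    ALLOWED_ATTRS = {"a": ["href", "title"]}

    def clean_user_html(raw: str) -> str:
        # Whitelist sanitization: anything not explicitly allowed is stripped.
        cleaned = bleach.clean(raw, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRS, strip=True)
        # linkify also makes sure every <a> carries rel="nofollow".
        return bleach.linkify(cleaned, callbacks=[nofollow])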
Other tricks spammers use: Using a link for the username, since that is sometimes emailed or displayed even when moderated content is not.
The site I run is a special-purpose conference website for a relatively small community (usually 60-100 attendees), so manually moderating all content until a user is verified works pretty well. The first year we had a handful of spammers, but their content was all deleted by me before it was seen by anyone. (With the exception of the link for a username. Missed that trick.) The next year we didn't have any spam accounts at all.
No idea how well that will scale for your use case.
I will certainly be employing these methods, and doing manual verification at least initially - but if the project is successful, it will outrun my ability to manually verify everything.
Manual review (depending on the scale you intend to go for), building a 'trust' profile of your users, and analyzing the crap out of sign-up page activity to detect abusers even before they've made their first move.
None of those require extra user interaction and can be quite effective.
Do you have any suggestions for further reading about approaches to this? I do plan to do manual verification initially, but if the project is successful I won't be able to keep up - a good problem to have, and I don't want to optimize prematurely - but I also don't want the site to be overcome with spam before I have any clue how to handle it :)
I don't understand what the problem is with using the service to provide links to porn, and also don't understand why Instagram and Snapchat would care about links to lists of links to porn?
I imagine it would have to do with
1) The number of minors on Instagram and Snapchat.
and
2) The image that Instagram and Snapchat want for their platforms. They aren’t looking to become tumblr in the eyes of the masses.
The whole app lives in Firebase. Once a user hits a profile page, a cloud function runs, gets the data from the DB, constructs the page, and caches it at the CDN for 24h.
If a user adds or deletes something on their profile, the cache entry is invalidated. Otherwise, the content is served from cache only, with no need for much computation.
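I don't know their actual Firebase code, but the caching side of this is just "let the CDN hold the rendered page via s-maxage". A rough sketch of the same pattern as a plain Flask handler, with hypothetical names:

    from flask import Flask, make_response
    from markupsafe import escape

    app = Flask(__name__)

    def render_profile(username: str) -> str:
        # Hypothetical stand-in for the real DB read + templating step.
        return f"<html><body><h1>{escape(username)}</h1></body></html>"

    @app.route("/u/<username>")
    def profile(username: str):
        resp = make_response(render_profile(username))
        # Browsers keep the page briefly; the CDN may keep it for 24h (s-maxage).
        resp.headers["Cache-Control"] = "public, max-age=300, s-maxage=86400"
        return resp

Invalidation then amounts to purging that one URL at the CDN whenever the profile changes.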
Even if rush hour has 10 times more traffic than off hours, that's still only 4 requests per second. Any unoptimized Postgres on a 2/2 node should be able to handle these loads just fine.
Of course, hits are not homogeneously distributed. But an average of less than one hit a second leaves more than enough leeway for any reasonable clustering you may come with. Any small computer can handle thousands of times more than that.
There are some questions about bandwidth costs, which can vary wildly.
For most bad actors, a small amount of work is all that's needed to block them from a new platform like this.
1. Use email verification / captchas
2. Block mailinator/throwaway account addresses (a minimal check is sketched after this list)
3. Use something like Cloudflare/Akamai to protect against egregious bad actors
4. Add some other level of social validation like a major OAuth provider
5. Simple pattern matching - as the author noted, these spammers are rarely sophisticated, and once you put even a small barrier up they usually taper off. You can go further and limit the features of your product specifically to hurt their use case, or actively shadow-ban the users yourself so they don't know their links aren't working.
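To illustrate point 2, blocking throwaway domains can start as a plain set lookup; the list below is tiny and illustrative - in practice you'd pull one of the maintained disposable-domain lists:

    DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com", "10minutemail.com"}

    def is_throwaway(email: str) -> bool:
        # Reject signups whose address belongs to a known throwaway provider.
        return email.rsplit("@", 1)[-1].lower() in DISPOSABLE_DOMAINS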
I think it's somewhat of a start for a business if you have the stomach for it. You just need to figure out how to make money off the freeriders and get around the social media censorship for the paid users.
Oh, yes. It assigns a small number of 'moderation points' to random users, and a larger number to trusted users. Essentially their algorithm picks out random users to moderate just a little, which lets the community bump up the visibility of what they like. In addition to up- and downvotes, the mods label a post "+1 Informative", "+1 Funny", or "-1 Flamebait". Users can assign more visibility to something that has been voted Informative twice than to something voted once Informative and once Funny. There is no "-1 I disagree". The purpose of the votes is obvious.
There is meta-moderation on top of this which has users check the moderation of other users. I assume people who repeatedly abuse their privilege are given sub-naive priority for mod point assignment.
This scales with traffic and requires the minimum of admin oversight.
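As a rough sketch of the visibility idea described above - the weights are illustrative guesses, not the real algorithm:

    # Default label weights; each reader can override them, e.g. to value
    # "Funny" less than "Informative".
    DEFAULT_WEIGHTS = {"Informative": 1.0, "Funny": 0.5, "Flamebait": -1.0}

    def visibility(labels, reader_weights=None):
        # Score a post from its moderation labels, with optional per-reader weights.
        weights = {**DEFAULT_WEIGHTS, **(reader_weights or {})}
        return sum(weights.get(label, 0.0) for label in labels)

    # Twice "Informative" outranks "Informative" + "Funny" for this reader:
    assert visibility(["Informative", "Informative"]) > visibility(["Informative", "Funny"])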
Some can; Selenium or other browser-automation tools are sometimes used in bots. Really, any headless browser can be paired with an automated QA toolkit and turned into a rather effective bot. All that has to happen is for the bot to load the page in headless mode, then issue keystroke events to the specified elements.
From my experience most don't, as it costs more money (resources). For me, a hidden input with some math solved by JavaScript took spam contact requests from about 10 per day down to actually 0.
Of course, the larger a page gets, the more dedicated the bots get.
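The hidden-input trick looks roughly like this: a field that only gets the right value if JavaScript actually runs in a browser. The field names and the "math" here are made up for illustration:

    from flask import Flask, abort, render_template_string, request

    app = Flask(__name__)

    FORM = """
    <form method="post" action="/contact">
      <textarea name="message"></textarea>
      <input type="hidden" name="js_check" id="js_check" value="">
      <button type="submit">Send</button>
    </form>
    <script>document.getElementById('js_check').value = String(6 * 7);</script>
    """

    @app.route("/contact", methods=["GET", "POST"])
    def contact():
        if request.method == "GET":
            return render_template_string(FORM)
        if request.form.get("js_check") != "42":
            abort(400)  # almost certainly a bot that never executed the JS
        return "Thanks!"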
In my local jurisdiction it would be illegal for an employer to have any say in what you do outside work hours. They might try some restrictions via their employment contract, but in general it would have no legal effect whatsoever.
I've also worked at some start-ups that don't put this into their contract (or remove it when asked) so as not to scare the talent away.
I think you're right for the larger corporates though. It's often not enforced, but you run the risk of them claiming "it was done on their time"-style issues later if it's successful.
As long as you are not using company resources, including company time, or producing something that competes with the company (effectively using inside knowledge, another company resource), in most legal jurisdictions they can't stop you and any contract clause that tries is not legally enforceable.
Of course that doesn't mean many don't try to block such things…
It depends a bit on the jurisdiction (I'm not really sure if laws or local culture came first, but they tend to go hand-in-hand for this issue).
E.g., in CA there are only a few whitelisted situations where your employer could own your side project, and in all other cases it belongs to you. As a matter of practice, most companies in the area allow side projects, and many "require" that you ask permission -- since they'll just rubber stamp approval onto whatever you're doing anyway, that typically works out in your favor; now you have written proof that your employer doesn't want to exercise any potential claims to your project even if it nominally seems to infringe on their business.
My employer would only complain if the side project would reasonably make me less efficient within my contracted hours - say, if I were doing 40h (5 full workdays here) every week and putting in another 40h/w on the side project as well.
The other gotcha is that side projects shouldn't compete directly with my employer in any way. If you work in product development, that is pretty easy to isolate. Agencies, on the other hand, could basically make anything for a client, so that's a bit trickier.
No idea which of my contractual limitations are actually enforceable, though. And they've never complained about side projects, because my own projects aren't anywhere near something they would accept as a client.
As long as I'm not releasing something that is a direct competitor for my job, it's all fine. Main thing I learned is that being open about it at work takes away the fear and worries.
It doesn’t really matter whether they accept it or not; you’re doing it on your own time. The US may be different, but here your employer has zero say in what you do in your spare time. Don’t use company-issued equipment or software licenses - that changes things, may count as misuse of company property, and can get you fired.
Most companies I’ve worked for have a provision that states they can buy my projects, by compensating me for my work.
No, it's a consequence of not being on top of abuse on your platform. And because those big social media platforms are working really hard to avoid being dragged down through no particular fault of their own, they shut stuff like this down at the first sign of trouble.
HN does the same with spam, which is the only reason we can have this conversation in the first place. I take it that you do not consider that censorship?
If big platforms weren't so obsessed with trying to censor porn, there would be no need for people to hijack other platforms simply so they could create links to share porn. There's this bizarre obsession with "protecting" people from nudity (especially female nipples). Advertisers will stay away from your platform if there's nudity. Credit card companies will refuse to process your payments (or charge much, much steeper fees). And for what? So people can pretend that naked bodies don't exist? That sex isn't a thing? Seems odd to me. If I want to private message my friends dirty pictures, who is FB (or whatever platform) to tell me I can't?
As for spam, it's a shame that after 20+ years of the stuff, we still don't have a good answer for it. Some subset of people is spending money on whatever spammers are advertising, or they wouldn't be doing it.
If someone wants to set up a service where all that is within the TOS they are free to do so, just as there are other people who have decided otherwise.
None of what you said is limited by the freedoms that you already have, including starting your own platform.
But I'm sure that when you do, you too will find some things on it that you will want to block.
Broadly I agree with you - especially regarding payment processors - but the problem is that this
> If I want to private message my friends dirty pictures, who is FB (or whatever platform) to tell me I can't?
VERY quickly becomes "unsolicited dick pics." That "links to share porn" quickly becomes "links to share child porn", etc. You're not accounting for all the genuinely abusive users.
Sure. But the genuinely abusive users are always a minority - although at the scale of FB and the like, that can still be a large absolute number. Perhaps if we, as a society, stopped obsessing so much about nudity, unsolicited dick pics might become a thing of the past. One can dream, I guess.
Why do social media sites treat "links to X" as equivalent to posting X directly on their site? Surely the former is not against their terms of service? (Here I am talking about legal adult content, obviously)...
These overarching powers to censor anything for any reason are disturbing, to say the least.
Which inevitably means an environment normal people hate - because they end up with some of the quirkier denizens of the internet on their global timelines if they use a public instance.
I suppose if most of the links posted for a site are spam the social media giants would rather just ban the site entirely than do "micro-management" for the site.
You're pointing out a problem there. The tools and processes around moderation (censorship might be too strong a word) are pretty lacking.
Let me consider two cases.
One is the issue of a company making the rules, e.g. Facebook imposing prudery worldwide, even in places where things like a bare breast are not considered indecent (yet?!). Why should someone half a world away physically, and maybe even further away morally, have a say in what people can or cannot post in, say, Europe?
Another is bans/ignores/mutes by users. The tools for these are absolutely lacking. Mutes or bans are usually permanent and there is no way to do anything about them; it's like solving every problem with a sledgehammer. (If I wanted to, I could mute or ban anyone I like on e.g. Twitter, and they would have absolutely no means to explain themselves, appeal the decision, or have any expectation that their sentence will be over one day.)
Maybe social media should instead be treated like public infrastructure and be required to provide service to all, while leaving moderation decisions to other entities (and just having to execute those decisions instead of both deciding and executing).