I have written this reply and started over 3 times -- hopefully this one sticks.
I really feel like the author boxed himself into a solution by faulty reasoning.
1) Cron jobs are not hard to set up. Being able to control these things on one or more servers is just part of proper server deployment.
2) If it is not part of the application but it deals with information about users who trialed the application, then maybe it should be part of the application? If you are deploying an application and you need to know which users' trials are about to end, I would bet that belongs in the application and should be a trigger coded into the application itself.
3) I am shocked that you did not already have a central place for running scheduled code and maintenance on your systems. At a minimum there should have been backups and reports generated on your MongoDB instances. If those functions were already built into the application, then the same generic task system you used to run them is probably where you wanted to run this new functionality.
tl;dr Not knowing much about the application or its architecture I feel like the author painted themselves into a corner to justify using some AWS tools.
Having just written a bunch of scripts to populate a Google Sheet with some KPI data for the business folks, I _completely_ agree.
At first I wanted to put my scripts in Lambda, put an API Gateway in front, and have them be callable by the Google Sheet on a schedule. This seemed "pure" because I didn't really _need_ independent resources to do the task. The sheet could just update itself using an on-demand server!
Then I realized there was a 30-second max execution time. My scripts, run in serial, couldn't come in under that. I'd need to rewrite them all to run in parallel. Also, they were a bunch of bash/jq/curl I'd just thrown together; I'd have to write a Python wrapper to shell out and handle streams (ugh).
At this point, I gave up and fell back to the advice of an (ex-Heroku) coworker: why not just run them in a cron job, put the data in a Heroku DB, and use Heroku Dataclips to update the Google Sheet?
Sure I have to "have" an EC2 instance and a Heroku PG box, but they're both so small they're either free or essentially so. After cloning my script repos, everything worked right away.
Sometimes it's better just to do the "inelegant" thing and move on.
I believe it may have been a limitation of the API Gateway + Lambda thing. Just checked the docs again and it says there is an "Integration Timeout" as follows:
> 30 seconds for all integration types, including Lambda, Lambda proxy, HTTP, HTTP proxy, and AWS integrations.
So not Lambda alone, but in conjunction with the API Gateway.
Embarrassingly, I completed practically the exact same project but stuck it through with Lambda/API Gateway and just have it pull the precomputed data from a database. I wish I had known about Heroku Dataclips earlier! The build/test/deploy cycle for a Google Sheets add-on is a total nightmare.
On the other side of the coin, a few months ago I got annoyed by a Redshift alert during maintenance one too many times, so I spent a couple of hours on a Saturday writing a ~20 line Lambda project that took the notifications from Redshift and posted them in a Slack channel.
I didn't have to worry about setting up automation for launching an EC2 instance, monitoring it, paying for it, or any of the complex trouble for what ended up being a single Python function that costs about $0.10 per month to run.
I don't really take issue with running this on AWS Lambda. Seems as good a place as any.
You're right that there are other straightforward options, but they do have downsides. The worst case is a mismatch of workloads (e.g. a rogue cron job affecting another service), or you end up with resources that are largely idle.
That said, I agree with your unease on the application side. If you had a dozen (or many dozen) of these, you'd have bits and pieces of logic scattered all over the place. I think some architecture/style changes could mitigate that; a shared library might be one.
> The cron job that we needed to write was unrelated to our application code, so whilst we could have put the functionality in there it seemed like the wrong place.
He never looked back at the existing architecture and invented a challenge for himself.
I believe Amazon added scheduled execution because of high demand from customers who never understood why Lambdas exist. I also saw many people complaining about Lambda's lack of SQS support. It is entirely expected that Lambda does not support SQS, because SQS is a pull model while Lambda is designed differently. Very probably AWS will add SQS support too, but it will mean that they preferred nonsensical user complaints over their own design.
Lambda functions are there to process a single message and produce output. A Lambda is a piece of code that should be invoked in response to events (a new file on an S3 bucket, a new record in DynamoDB, an HTTP request coming from API Gateway, etc.). If you are designing a system where you execute a Lambda function on its own, without any meaningful event, and have it pull data from an external source upon execution, you are designing your system wrong. Let us assume that this "cronjob" runs every day at 10 pm, and that day there are more than 5,000 users whose trials end. Lambda has an execution limit of 5 minutes. In 5 minutes you are likely to fail to process 5,000 users, and the Lambda execution will be interrupted. And what then? You are not designing a scalable system.
The correct approach would be to book a future Lambda execution (per user) when you register the user. A single Lambda function per user would then execute exactly when the trial ends for that user. This Lambda function would also receive all the data it needs for its operation, so it would not need to connect to MongoDB to fetch user information. This can probably be done with SWF.
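If you did stay with a daily run, a fan-out dispatcher is one way to stay under the 5 minute limit. This is only a rough Python/boto3 sketch of that alternative, not the per-user SWF booking described above; the worker function name and the query helper are made up:

    import json
    import boto3

    lam = boto3.client("lambda")

    def fetch_expiring_trials():
        # Stub: replace with whatever query returns today's expiring-trial users.
        return []

    def handler(event, context):
        for user in fetch_expiring_trials():
            # One asynchronous invocation per user; each worker receives everything
            # it needs in the payload, so no single execution approaches 5 minutes.
            lam.invoke(
                FunctionName="trial-end-worker",   # hypothetical worker function
                InvocationType="Event",            # fire-and-forget (async)
                Payload=json.dumps({"user": user}),
            )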
Unless you designed Lambda, this is an arrogant tone.
Sure, some of the things you mention are "better" or at least otherwise accomplished using event-triggered Lambdas. But there's plenty of reasons why you might want to run a job that takes under 5 minutes on a schedule.
For instance, daily or weekly "digest" emails. Or a nightly stats job, where expensive-to-calculate yet not-immediately-critical queries are performed and the results stored or exported. These are things where you want them to happen at a certain time. Cron does that.
It's possible that customers didn't understand why AWS thought Lambdas were there, but that's irrelevant. Suddenly, Lambdas were there, and people thought of more use cases (SQS triggering, connection to API Gateway, etc) than AWS had implemented from the start. That's not the customers' fault for being creative.
Yep, I agree. If you rely on cron/scheduled tasks, then you end up needing to run a server to run them. If Lambda did scheduled tasks, I'd have 1-2 fewer boxes in each environment ...
Not sure based on your wording, but you -can- execute scheduled tasks with Lambda. It's a little tricky, as the actual configuration takes place in CloudWatch (CloudWatch Events). You will also want to set up a dead letter queue and have it broadcast to an SNS topic that emails you, so that if the Lambda fails, you'll be alerted.
You can just set up "here's my cron string, run this function", and it will run that function when applicable. No external services, no SQS. Just a CloudWatch event that executes your Lambda on a set schedule.
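As a rough illustration, the wiring looks something like this with boto3 (the function name and ARN are made up; the same thing can be clicked together in the console or done with CloudFormation):

    import boto3

    events = boto3.client("events")
    lam = boto3.client("lambda")

    # Hypothetical function; "cron(0 22 * * ? *)" means 22:00 UTC every day.
    function_arn = "arn:aws:lambda:us-east-1:123456789012:function:nightly-job"

    rule = events.put_rule(
        Name="nightly-job-schedule",
        ScheduleExpression="cron(0 22 * * ? *)",
        State="ENABLED",
    )

    # Allow CloudWatch Events to invoke the function...
    lam.add_permission(
        FunctionName="nightly-job",
        StatementId="allow-cloudwatch-events",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # ...and point the rule at it.
    events.put_targets(
        Rule="nightly-job-schedule",
        Targets=[{"Id": "nightly-job", "Arn": function_arn}],
    )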
You can still have regular boxes but turn them on when you need them and turn them off when your job is over. Let's say you turn on an instance every day at 10 pm and tell your job to turn off the instance it runs on when the job ends. It is possible to do this via the AWS API.
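A minimal sketch of the "turn yourself off when the job is done" half, assuming the job runs on an EC2 instance whose role allows ec2:StopInstances (the start side could be a tiny scheduled script or Lambda calling start_instances):

    import urllib.request
    import boto3

    # Ask the instance metadata service which instance this is running on.
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).read().decode()

    # ... run the nightly job here ...

    # Stop (not terminate) this instance so it can be started again tomorrow.
    boto3.client("ec2").stop_instances(InstanceIds=[instance_id])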
Lambda was designed to be functions that run in response to an event. An item being put on an SQS queue is an event. The clock ticking over to the next second is an event. There is no reason why a lambda should not be triggered by these events.
Right, but SQS is a pull model, so it doesn't generate events. You can use SNS events to trigger a Lambda and tell it that something was added to an SQS queue or S3 or Dynamo or whatever.
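For reference, subscribing a Lambda to an SNS topic is just a couple of API calls; here's a hedged boto3 sketch with made-up ARNs:

    import boto3

    topic_arn = "arn:aws:sns:us-east-1:123456789012:queue-activity"             # hypothetical
    function_arn = "arn:aws:lambda:us-east-1:123456789012:function:on-message"  # hypothetical

    # Let SNS invoke the function...
    boto3.client("lambda").add_permission(
        FunctionName="on-message",
        StatementId="allow-sns",
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )

    # ...then subscribe it to the topic; every publish now triggers it.
    boto3.client("sns").subscribe(
        TopicArn=topic_arn, Protocol="lambda", Endpoint=function_arn
    )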
Perhaps this is a naming issue? I wonder if they had called Lambda "AWS Triggers", they would have had customers operating that way. Still, this won't be the last time that a system adjusts to meet customer requirements.
I'm confused. Is this a microservice as a cron replacement or is this just running a script with AWS lambda? If it's the latter, it's a bit of a stretch to call this a microservice, or a cron job for that matter.
It seems like it might be a nice post about using serverless, but the whole cron + microservice bit is muddying the water for me :/
Sure, cron is easy to use, but it also involves maintaining a server to some degree. I use a Lambda "cronjob" to run health checks on various other servers/services, so I don't have to worry about healthchecking my healthchecking machine.
Cron is hard. I think it's better to just build in good locks and assume it will fail occasionally, and on failure repeat.
This approach works on a small scale, but it seems like the logic should be at the application layer. Secrets, API keys, ACL, etc all need to be duplicated on Lambda. Developers can't easily run it locally. Testing (eg integration testing on database migrations) is separate from the core app. Build promotion / rollbacks are different than the core application. Error tracking / logging may be different. Monitoring will be separate.
Seems like a cool demo, but I think that running the code will have more overhead than writing it.
So, I'm not a very good engineer, so maybe this is a stupid question... but I am going to ask it anyway.
Cron is hard. Not in the sense that it's challenging to set up, but it's a challenge to maintain and have visibility into. It seems like you need a whole set of infrastructure just to feel confident that things are running as intended.
This has always led me down the path of building a task-manager type of microservice running on an independent server to manage everything. But by the time I'm done, it feels like overkill. I just don't know what to do differently. Am I missing something?
Sounds like you might be better off using Jenkins than rolling your own. I use Jenkins over cron when I want easier auditing & authentication. I got the idea to use Jenkins as a one-size-fits-all hammer from https://www.cloudbees.com/blog/drop-cron-use-jenkins-instead...
Great to see I'm not the only one using Jenkins to look after cron jobs!
However, to me it's about way more than simply auditing & auth.
The main advantages I found with Jenkins are:
- you get a central place for all the cron jobs, which comes with many advantages, the most important being that nobody has to remember which server that freaking script lives on
- making an update is easy
- running the task can even be done by non-technical people
- easy backup
- when a script becomes deprecated you can easily remove it, so you're not likely to leave it running for nothing forever
- easy documentation for your tasks
There's probably more but Jenkins is definitely awesome for cron jobs
I've done this many times in the past. Add an authenticated "cron" endpoint, and have a Jenkins job curl it every X seconds. It's not perfect, but it's an easy way to see logs, alert PagerDuty, etc.
I've been burned by bad cron monitoring many times in the past. Short answer: check out https://cronitor.io/. Basically, finish every cron job by telling their server that the job finished. It's good enough monitoring for most cases.
Something like http://www.easybatch.org/ may provide quite a bit of the infrastructure you're looking for. I'm sure there are equivalents for other languages.
It specializes in plugging Chalice into all the various event sources that can trigger Lambda, including the AWS Events API's "cron" functionality highlighted here.
Was actually just working on a Lambda function myself when I read this, though for something a good bit more complex.
My reaction to this particular case is more like - yeah, I guess you could do that, but it seems a little odd. I could see it making sense depending on what your infrastructure is like. Might be handy if you're heavily into AWS and need to coordinate AWS actions with appropriate IAM roles and VPC access and such things. The auto Cloudwatch logging can be handy if you don't have your own logging set up already. But on the other hand, who's running something like a business that doesn't have better logging already set up, and better infrastructure for running infrequent tasks, or at least a few servers lying around somewhere that can run some modest Cron jobs in addition to whatever they're already doing?
I have been thinking about how to approach this in several projects, and while scheduled Lambda calls do the job, besides being AWS-specific they do not integrate well with the "old way": launching shell commands.
The beauty of shell commands to me is that you can launch them both with cron and manually. Arguably you could do the same with Lambda, but here we go, another service to set up (API Gateway).
I built http://croningen.io - a hosted version of cron that schedules jobs on clusters of servers, with central error reporting. It is, in my opinion, as easy to use as cron, with most of the annoyances removed. Early days, but feedback welcome!
> Cron jobs are easy to write, but difficult to setup
Please explain to me, how is cron hard to set up?
(Slightly off topic: if you didn't learn cron, don't worry. It's just a matter of time before it is replaced with some systemd service that only works half the time.)
I'm not the author and didn't write that line, but I can speculate about the kinds of concerns he's imagining, looking at the situation from a DevOps/automation-focused point of view:
How do you set up cron in such a way that your cron job runs on a machine somewhere, and will continue to do so for a long period of time?
Sure, you can log into a machine and edit the crontab manually, but what will happen if that machine fails? Ka-blam, it suffered a hardware failure and is gone. Do you repeat your manual edits on another machine? (If someone else did this three years ago and is no longer with the organization, does anyone know what edits to reproduce?)
OK so we need to build a mechanism that can ensure that at least one machine is running, and has this crontab installed on it. If the currently active machine fails, it needs to replace the machine and reinstall the crontab and software that the crontab runs. You need to monitor to detect when this happens, to kick that off, and you need to test the stack to be adequately sure it's going to work when it really does happen. There's infrastructure you can use to do that, of course, but it's all complexity to be mastered.
With the approach the author's describing, one only needs to define configuration: what code to run on what schedule. It's declarative, and the infrastructure handles actually executing it on a machine somewhere, completely encapsulating the concerns related to "getting some hardware running" and "installing the code".
Cron itself is a system service and is not something I'd hope you would need to run under supervisord.
Some of the challenges that I mentioned, though, are not just arranging the configuration that you want to have on the machine (whether cron or supervisord configuration), but getting the configuration onto the machine and ensuring it stays up and working long term. When looking at the problem with a long time horizon and larger system context, one expects any machine to fail and need to be replaced, so handling that is important. With this perspective, it doesn't matter how reliable your process supervisor is: the code can be written perfectly and the application will still fail to do its job if the machine it's running on halts! I don't want to have to care if my machines suffer a hardware failure; I just want a replacement to be brought online seamlessly.
Sometimes in cases like this, one needs a whole lot of infrastructure to solve what feels like it ought to be a simple problem. In larger system contexts it can be quite inconvenient to invoke the full complexity needed for reliable servers just to run a job periodically.
Once you start to tie the satisfaction of any business requirement to the successful completion of a cron job, lots of problems crop up. Even after you set that configuration up, there are still tricky issues to consider like: what if the host running the cron job fails at just the wrong time, like at 11:59 when the job starts at 12:00? Is it possible the job might not run at all today even if the host is replaced promptly? Is that OK, and if not what do we do about it?
If you have many different cron jobs with slightly different security contexts and permissions, do you launch a separate machine for each one? Do you set them up to share the same server? How will you know if, a few years from now, that server doesn't have enough power to run all the jobs fast enough? Will you notice if it's gradually slowing down? These are some of the issues I'd consider if I were tackling a business problem that required doing something on a regular schedule.
Serverless-style infrastructure takes away a lot of these concerns. There's nothing to host or maintain yourself; no servers needed at all: just the definition of the code you want to run, how frequently you want to run it, and what resources (if any) it needs access to. A serverless task that runs once today will continue to run indefinitely, and it doesn't cost more to set up than the naive crontab approach.
"I'm too elite to use cron and bash scripts. If it doesn't require npm to install and an AWS account, I don't bother."
Seriously, it takes all of 10 minutes of reading the cron and related man pages to figure this out. (And your Lambda code needed the exact same syntax for cron!)
And you already have a MongoDB server running, so it is not really any additional server config time to do it.
Yeah, I find that hilarious. Cron is probably one of the simplest unix-y things you will ever encounter.
I am also getting tired of this "npm all things" mentality. I don't want to install npm and a ton of npm packages to do something that has nothing to do with web development and JavaScript.
This seems fine for a single job but if you have dozens to manage it would become unwieldy. I'm excited to see the ins and outs of AWS Glue[0]. Hopefully that will make projects like this more manageable.