Slow SMTP connections to Google cause memory overload (controlling Exim timeouts)

Problem :

Our organization has several servers that generate and send out lots of emails (we’re not spammers, it’s legit email to users who have asked for it). Exim on these machines is configured to try to deliver each mail only once and, if that fails, hand the email off to an overflow hub server that will keep trying. I just added a new server to this herd, added it to our SPF record and configured it exactly like our other servers. However, we must be on a blacklist or, more likely, a greylist with Google, probably because the netblock of this new server has a bad reputation: emails to Gmail accounts always fail with a disconnect error on the first try. I’ve tried sending to Gmail addresses by hand, and if more than one try is allowed this works fine: I can see the first try fail and the second try succeed. But, as I said, our system doesn’t do second tries.

The problem is that I end up with a bunch of Exim processes sitting around, apparently waiting for Google to disconnect. This ends up maxing out the memory on the machine (it’s a virtual machine with only 512M of RAM), and then the kernel’s OOM killer starts killing processes and we lose emails that are supposed to go out.

I don’t expect to get off of Google’s greylist any time soon, so the real solution I can see is to figure out how to get the Exim processes to give up sooner so I don’t run out of memory. Is there a way in the Exim config to do that? Or to simply limit how many total Exim processes run on the system? Note that these are separate Exim procs, fired off by our custom mail generation software, they’re not children of the main Exim MTA. I don’t see anything in the Exim docs that would seem to address either of these.

Note: I’ve already tried using hubbed_hosts to make Exim not even attempt delivery to Google destinations, but this isn’t working. Maybe I have the formatting of my hubbed_hosts file wrong?
This is it (with domain name obfuscated):

.*google.com:  bulkflow.mydomain.org
.*gmail.com:  bulkflow.mydomain.org

If I have this wrong and I can get it fixed, maybe that’s the answer.

Solution :

Remove the .* from the entries in your hubbed_hosts file; the plain domain name (gmail.com, etc.) is correct.
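As a sketch, the file from the question with only the leading .* removed would look like this (the hub name is the obfuscated one from the question, and the spacing after the colon is kept as posted):

```
google.com:  bulkflow.mydomain.org
gmail.com:  bulkflow.mydomain.org
```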

Furthermore, check the excellent documentation and look for options such as queue_run_max, queue_load_max, queue_only and smtp_receive_timeout.
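A rough sketch of how some of these might be set follows. All values are illustrative assumptions, not recommendations, and the transport name remote_smtp is hypothetical (use whatever your configuration calls its smtp transport). Note that smtp_receive_timeout applies to incoming SMTP sessions; the connect_timeout, command_timeout, data_timeout and final_timeout options of the smtp transport are not mentioned above, but they are the ones that bound how long an outgoing delivery waits on the remote host, so they are included here as a likely fit for the problem described.

```
# Main configuration section (values are illustrative assumptions)
queue_only = true            # queue messages on submission instead of delivering immediately
queue_run_max = 2            # cap on simultaneous queue runner processes
smtp_receive_timeout = 2m    # timeout for INCOMING SMTP sessions only

# Transports section ("remote_smtp" is a hypothetical transport name)
remote_smtp:
  driver = smtp
  connect_timeout = 30s      # give up on the TCP connect after this long
  command_timeout = 1m       # wait for the response to each SMTP command
  data_timeout = 2m          # wait while sending message data
  final_timeout = 2m         # wait for the response after the final "."
```

With queue_only set, messages handed to Exim by the mail generation software are only queued, so a daemon started with a queue run interval (for example -q1m) is needed to deliver them, and queue_run_max then limits how many of those queue runners exist at once. On packaged setups that generate the Exim configuration from templates, where these lines go will differ.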

Note that Exim is very good at handling messages correctly: even if a process that is handling a message is killed, the message remains in the queue, so losing messages should not be a concern.

I have used Exim to deliver hundreds of thousands of emails per day, and it offers plenty of tuning parameters to keep things running smoothly.
