(Also see the follow-up post about some progress)
Today I was (again) facing a log file from a machine on which, for some reason, a temporary MySQL daemon had failed to start during the night. That daemon was needed to prepare a streaming MySQL slave installation. The second daemon had created its new ibdata files, but right after that it aborted the startup process with the following message:
Can't start server: Bind on TCP/IP port: No such file or directory
071001 23:09:55 [ERROR] Do you already have another mysqld server running on port: 3310 ?
071001 23:09:55 [ERROR] Aborting
071001 23:09:55 [Note] mysql\bin\mysqld.exe: Shutdown complete
As you can see, the port is not the default MySQL port, so I could be sure there was no conflict with the primary instance. Even more curiously, the same process had been working flawlessly on this and other machines for some time. I did remember having seen this message once before, but back then I did not have the time to look into it any further. We simply restarted the streaming slave setup process and it went right through.
This time, however, restarting the process didn't help. It just aborted with the same message again. I especially wondered about the error message: "Bind on TCP/IP port: No such file or directory". What the hell is that supposed to mean? A colleague and I even had a look at the MySQL source code, but the "No such file or directory" text does not appear anywhere in conjunction with the bind error message. I searched the web, but could not find any explanation of where it comes from either.
Because the process kept failing, I had a look at port 3310:
netstat -an | findstr "3310"
TCP 10.123.234.12:3310 10.123.239.11:1433 ESTABLISHED
You can tell this is a Windows machine: findstr is a poor man's grep for Windows. What's more interesting is that port 3310 is in use. However, this is not because someone had already bound that port for some sort of service; instead, it is the local endpoint of an outgoing connection to a SQL Server instance (port 1433)!
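Being a poor man's grep, findstr matches the string anywhere on the line, so a filter on "3310" would also catch a remote port 3310, a port like 33100, or a matching part of an IP address. Below is a minimal Python sketch (not part of our tooling; the function name connections_on_local_port is made up for illustration) that only reports lines whose local endpoint uses the given port. It assumes the column layout of Windows' netstat -an output: protocol, local address, foreign address, state.

```python
from typing import List


def connections_on_local_port(netstat_output: str, port: int) -> List[str]:
    """Return netstat lines whose *local* endpoint uses the given port.

    Unlike findstr "3310", this ignores matches in the foreign address,
    in longer port numbers such as 33100, or inside IP addresses.
    """
    matches = []
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip headers and blank lines: we need protocol, local and foreign address.
        if len(parts) < 3 or parts[0].upper() not in ("TCP", "UDP"):
            continue
        local_address = parts[1]
        # rpartition also handles IPv6 endpoints such as [::]:3310
        _, sep, local_port = local_address.rpartition(":")
        if sep and local_port.isdigit() and int(local_port) == port:
            matches.append(line.strip())
    return matches
```

On the machine itself, the input could come from subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout.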
It turned out to be the same problem we had some time ago with JBoss not being able to bind port 1099 on Windows 2003. We had not explicitly reserved port 3310 for private use by a user-controlled application, so Windows was free to assign it as the local endpoint to any process opening an outgoing socket connection. This is due to the range of so-called ephemeral ports: the ports the operating system picks as the "local" end every time an application connects to a remote service. On Windows 2003 this range defaults to ports 1025 through 5000. As 3310 is right in the middle of this range, the chances of getting bitten by this more than once in a lifetime are apparently quite real.
We have now added port 3310 (and 3306 for that matter, which was also missing) to the list of ports excluded from dynamic assignment. Probably 3306 has never been a problem because the primary MySQL instance is configured as an autostart service. I guess it starts early enough during boot that the chances of anything else having claimed that port are very low. However, there was most probably some luck involved, too...
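The effect is easy to reproduce. The Python sketch below (helper names are hypothetical; a local listener stands in for the remote SQL Server) opens an outgoing connection, reads the ephemeral local port the OS picked for it, and then tries to bind a fresh server-style socket to that very port, which is what mysqld attempts on 3310. I would expect the bind to be refused on both Linux and Windows, but the exact error code differs per platform, so the sketch treats any bind failure as "port not available".

```python
import socket


def port_is_bindable(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind a fresh TCP socket to host:port, as a server would."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        # EADDRINUSE on Unix, WSAEADDRINUSE (10048) on Windows; any failure counts.
        return False
    finally:
        probe.close()


def ephemeral_port_collision(host: str = "127.0.0.1") -> tuple:
    """Open an outgoing connection and check whether its local port is still bindable.

    Returns (local_port, bindable).
    """
    # Stand-in for the remote SQL Server; port 0 lets the OS pick any free port.
    remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    remote.bind((host, 0))
    remote.listen(1)
    # The outgoing connection gets its local port from the ephemeral range.
    client = socket.create_connection(remote.getsockname())
    try:
        local_port = client.getsockname()[1]
        return local_port, port_is_bindable(local_port, host)
    finally:
        client.close()
        remote.close()


if __name__ == "__main__":
    port, bindable = ephemeral_port_collision()
    print(f"outgoing connection uses local port {port}; bindable by a server: {bindable}")
```

Note that nothing is listening on the client's port, which is why a plain "is anyone listening there?" check would not reveal the conflict; only netstat's ESTABLISHED line gives it away.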
This can also become a problem on a machine that opens a lot of outgoing connections, because every connection occupies a port from this range, possibly even for a while after it has been closed, while it sits in the TIME-WAIT state. To configure the range on Windows Server 2003, have a look at Microsoft's documentation on TCP/IP stack implementation details, specifically the section on the "TCP TIME-WAIT delay" in the "Core Protocol Stack Components and the TDI Interface" chapter (for Windows 2000 see this page). Both are linked from Knowledge Base article #908472.
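To check a machine's configuration against the ports its services need, a small script can help. The sketch below is illustrative and rests on assumptions: that outgoing connections use ports 1025 up to MaxUserPort (default 5000) as on pre-Vista Windows, and that exclusions are listed as "start-end" strings, as in the ReservedPorts value under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. Please verify both against Microsoft's documentation for your Windows version. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# ASSUMPTION: pre-Vista Windows (e.g. Windows Server 2003) hands out local ports
# for outgoing connections from 1025 up to MaxUserPort, which defaults to 5000.
FIRST_DYNAMIC_PORT = 1025
DEFAULT_MAX_USER_PORT = 5000


def parse_reserved_ports(entries: Iterable[str]) -> List[Tuple[int, int]]:
    """Turn 'start-end' strings (assumed ReservedPorts format) into (low, high) tuples."""
    ranges = []
    for entry in entries:
        start, _, end = entry.strip().partition("-")
        low = int(start)
        high = int(end) if end else low  # tolerate a single port without "-"
        ranges.append((min(low, high), max(low, high)))
    return ranges


def unprotected_ports(
    ports: Iterable[int],
    reserved_entries: Iterable[str],
    max_user_port: int = DEFAULT_MAX_USER_PORT,
) -> List[int]:
    """Return the ports that fall into the dynamic range but are not reserved."""
    reserved = parse_reserved_ports(reserved_entries)
    at_risk = []
    for port in ports:
        in_dynamic_range = FIRST_DYNAMIC_PORT <= port <= max_user_port
        covered = any(low <= port <= high for low, high in reserved)
        if in_dynamic_range and not covered:
            at_risk.append(port)
    return at_risk


# Before our fix: neither MySQL port was reserved.
print(unprotected_ports([3306, 3310], []))  # prints [3306, 3310]
```

Reading the actual MaxUserPort and ReservedPorts values from the registry (for example with Python's winreg module) is left out so that the sketch runs on any platform.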
We will monitor the situation and see if we run into any more trouble with this.
However, I still do not get the "No such file or directory" part of the message...
To round things off, here's another link on the topic of ephemeral ports and their relevance to network security: www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf.