Although calling this a PHP issue is kinda moot.
I was hired to write a quick import script to pull third party information into a database. I was given some test data, access to the server I'm importing to, and a good luck message. In short order, I managed to figure out the schema and get a PHP script working that could import to it properly. This information updates every day: some of the records (one per line, tab-delimited) are updated, some are removed, and some new ones are added. Because the third party simply publishes all of its information in one large file (no diffs or anything that would make my life easy), I was asked to simply delete all of this third party's information from the database before importing the new data, so that it would always be accurate. Not my ideal way of operating, but fine. Their money spends as well as anyone else's.
Well, today I finally got to deploy the script with real data. I received information on how to automatically pull the data from the third party's FTP server. I went and FTPd manually to their server, grabbed the data file, and took a look at it locally. It was a 75MB zip file. It expanded out to a 760MB text file.
The fact that I couldn't even open it in any of the text editors I had handy should've been clue number one. Well, actually, it was, but I should've trusted myself more. Finally, thanks to Linux and the less command, I was able to take a look and confirm the data looked the way I expected it to. Then I tried running the script.
30 minutes later, I called to have the process killed.
For grins, and perhaps because I should've done this to begin with, I ran wc -l on the file to count the lines: 908,556. One of those is a header, so that leaves 908,555 records. In 30 minutes, we managed to import about 41,200. And that was without the delete needing to be run first. This clearly has not been my day.
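To put a number on how bad that is, here is a rough back-of-the-envelope estimate. It uses only the figures above and assumes the import rate stays flat for the whole file, which it may well not (it could just as easily get slower as the tables grow). It's in Python only because that's quick to jot down; nothing here is specific to the real script.

```python
# Rough estimate of a full import at the observed rate.
# Assumption: throughput stays constant for the entire file.
total_lines = 908_556
records = total_lines - 1        # drop the header line
imported = 41_200                # approximate rows imported before the kill
elapsed_minutes = 30

rate_per_minute = imported / elapsed_minutes
estimated_minutes = records / rate_per_minute
estimated_hours = estimated_minutes / 60

print(f"{rate_per_minute:.0f} rows/min, "
      f"about {estimated_hours:.1f} hours for a full import")
```

So at that pace a single full import would take roughly 11 hours, before counting the delete, for a file that changes every day.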
The server is a dedicated managed server of unknown spec, running MySQL 5, PHP 5, and Linux of unknown flavor. I'm not the world's greatest PHP coder or admin, but I have a feeling it will never be able to handle the kind of load we're asking, no matter what I do. And this is where I turn to you, the ever-so-smarter-than-me WTF crowd, to see if there is anything I can do to make this work.
Obviously, simply deleting every entry and recreating it is out (to be fair to myself, I balked at this suggestion in the first place, but as I said, it's their money). It's going to be hard enough to finally get that first import done. After that, if I can't get diffs from the third party, I'll have to try to make my own, which will take up a lot of disk space, and possibly still not work.
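For what it's worth, here is a minimal sketch of what I mean by making my own diffs. It rests on assumptions I haven't verified against the real feed: that the first tab-delimited column is a unique key for each record, and that a key-to-digest map for roughly 900,000 records fits in memory. The function names line_digests and diff_snapshots are made up for illustration. The idea is to keep only a hash per record from yesterday's file rather than the whole file, then compare it with today's.

```python
import hashlib


def line_digests(lines, key_column=0):
    """Map each record's key to a digest of its full line.

    Assumes the first line is a header and that key_column holds
    a unique identifier (an unverified assumption about the feed).
    """
    digests = {}
    header_skipped = False
    for raw in lines:
        if not header_skipped:
            header_skipped = True
            continue
        line = raw.rstrip("\r\n")
        if not line:
            continue
        key = line.split("\t")[key_column]
        digests[key] = hashlib.sha256(line.encode("utf-8")).digest()
    return digests


def diff_snapshots(old, new):
    """Return (added, removed, changed) lists of keys between two digest maps."""
    added = [k for k in new if k not in old]
    removed = [k for k in old if k not in new]
    changed = [k for k in new if k in old and new[k] != old[k]]
    return added, removed, changed


# Tiny made-up example standing in for yesterday's and today's files.
yesterday = ["id\tname\n", "1\ta\n", "2\tb\n", "3\tc\n"]
today = ["id\tname\n", "1\ta\n", "2\tB\n", "4\td\n"]

added, removed, changed = diff_snapshots(line_digests(yesterday),
                                         line_digests(today))
print(added, removed, changed)
```

With real files, the lists would be replaced by open file handles, which are read one line at a time, so the 760MB file never has to be loaded whole. Only the added, removed, and changed keys would then need to touch the database.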
In short, any ideas? Is this doomed to failure? Thank you for your help, and for letting me post this story for my first ever post.