Pimping System.Net
As most of my readers know, I'm a huge Phil Hendrie fan. I've actually personally thanked Herb Sewell and Walter Bellhaven in the acknowledgements of most of my books. Now you might be thinking "WTF, they are both criminal psychopaths, and violent ones at that." Well, Herb's been reformed and Walter agreed to give me some botany lessons so I'm ok with that.
Anyway, Phil has been adding content to his site from years back, and there are a lot of Herb and Walter bits. Each hour of his show is a separate mp3 file, and there are a few years' worth of them now. I started downloading them manually, but it sucked. Then I got clever and hacked my registry so I could download more than 2 at a time. Same problem though: I still had to pull the stuff down manually, which sucked.
I decided to automate the process to make life easier. Problem is that while the file naming conventions are consistent, you can't count on files being there for any given day. I learned this the hard way. Another problem is that the URI you see on the page for downloading isn't where the file actually lives.
So I followed the approach KC used to beat the MSMVPS captcha control and just sniffed it. I found the URL it was coming from and I was off to the races.
Pass one used HttpWebRequest and HttpWebResponse, passing in credentials (it's a pay site) and all sorts of other funky stuff. It was lame and complex but it worked. Then, when I was screwing around, I found WebClient.DownloadFileAsync. Wow, this made it easier. Now, I have a pretty complex file structure and a NAS device which stores my media files (I've got hundreds of shows, for instance, and most run around 20-30 megs), so I had a lot of detecting to do. But you have to walk before you can run, so the basic download is what I got working first.
I replaced about 300 lines of code with this (most of the exception and logging code is stripped out b/c it would make it unreadable). I left one exception block in there. As much as I rant about the evils of lame exception handling, I actually did some of it intentionally. Since the days weren't consistent, and the number of days, weekends, repeats etc. varied each month, I first decided to just use a 30 day loop and try for each file. If I got it, great. If not, it'd throw an exception, I'd skip it and move on.
Each day there was a show, there were 3 hours: 1.mp3, 2.mp3, 3.mp3. So I have an inner loop that handles each hour and an outer loop that handles the days. (I got sick of tweaking the file strings, but those need to be changed b/c the URL and saveAsFile both have redundant text and should be formatted. I know using Format for 1/2 the stuff is lame. It's late, I'm tired, but you get the idea.)
So the first pass came out looking roughly like this:
for (Int32 x = startingIndex; x < lastDayOfPeriod; x++)   // outer loop: days
{
    for (Int32 i = startingIndex; i <= numberOfHours; i++)   // inner loop: hours
    {
        // {0:00} zero-pads the day, so day 10 comes out as "10" rather than "010"
        String url = String.Format("http://www.philhendrieshow.com/download/2000/02/Phil Hendrie - Feb {0:00} 2000 - Hour {1}.mp3", x, i);
        Uri uri = new Uri(url);
        String saveAsFile = String.Format(tbDownloadDirectory.Text + @"\" + "Phil Hendrie - Feb {0:00} 2000 - Hour {1}.mp3", x, i);
        System.Net.WebClient Client = new WebClient();
        try
        {
            Client.DownloadFileAsync(uri, saveAsFile);
        }
        catch (WebException ex)
        {
            Debug.Assert(false, ex.ToString());
        }
    }
}
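A quick aside on those file strings. Here's a minimal sketch of how the date-dependent parts could be pulled out of the hard-coded text, so that only a DateTime changes from month to month. ShowFiles and its three methods are hypothetical names (nothing like them is in the code above), and the sketch assumes every month follows the same download/yyyy/MM/ layout and three-letter month abbreviation that the February 2000 files use, which is a guess rather than something confirmed against the site.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper: the class and method names are illustrative only.
public static class ShowFiles
{
    // Builds e.g. "Phil Hendrie - Feb 01 2000 - Hour 1.mp3".
    // InvariantCulture keeps the month abbreviation in English on any machine.
    public static string FileName(DateTime day, int hour)
    {
        return String.Format(CultureInfo.InvariantCulture,
            "Phil Hendrie - {0:MMM dd yyyy} - Hour {1}.mp3", day, hour);
    }

    // Assumption: every month uses the download/yyyy/MM/ layout seen for Feb 2000.
    public static Uri DownloadUri(DateTime day, int hour)
    {
        string url = String.Format(CultureInfo.InvariantCulture,
            "http://www.philhendrieshow.com/download/{0:yyyy}/{0:MM}/{1}",
            day, FileName(day, hour));
        return new Uri(url);
    }

    // Path.Combine takes care of the directory separator.
    public static string SavePath(string directory, DateTime day, int hour)
    {
        return Path.Combine(directory, FileName(day, hour));
    }
}
```

With something like that in place, the outer loop could run from day 1 to `DateTime.DaysInMonth(year, month)` and call `ShowFiles.DownloadUri(day, hour)` and `ShowFiles.SavePath(tbDownloadDirectory.Text, day, hour)` instead of having the strings edited by hand for each month.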
Ignoring some of the pieces I used to make it a little more robust, that first pass just sucked. I had to change dates for each month, and it made a bunch of requests that were never going to be completed. Moreover, when a file wasn't there, you'd get the text from the Yellow Screen of Death saved in its place. So the core logic of using an exception to get around missing files wouldn't work. You'd end up with a bunch of .mp3 files that weren't really mp3s, but you didn't necessarily catch that up front. So I removed the exception block and did a check for file size, deleting the file if it was smaller than the number of bytes returned when a Yellow Screen of Death was encountered. This left me with this...
Client.DownloadFileAsync(uri, saveAsFile);
// Caveat: DownloadFileAsync returns before the download has finished,
// so this size check is only reliable once the download has completed.
FileInfo fi = new FileInfo(saveAsFile);
if (fi.Length < errorPageByteSize)
{
    PublishFailure(string.Format("Unable to process : {0}", saveAsFile));
    fi.Delete(); // FileInfo.Delete() takes no arguments. Removed exception handling for this
    PublishInfo(String.Format("Deleted : {0}", saveAsFile));
}
else
{
    PublishInfo(String.Format("Successfully downloaded : {0}", saveAsFile));
}
This got the job done, but it was still weak. That's where Regular Expressions and Generics came in to save the day. I'm still tweaking it, but I'll try to post the result tomorrow. Needless to say, I cut the code down so much that, well, I feel like a dumb a55 for writing it the long way in the first place. I feel like an even bigger dumb a55 for downloading as many files as I did manually. Hopefully when I wake up tomorrow, it'll all be done.
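One footnote on the file size check: DownloadFileAsync hands control back as soon as the download starts, so the check is only trustworthy once WebClient raises DownloadFileCompleted. Below is a rough sketch of hooking that event. It isn't the Regular Expressions and Generics version mentioned above, just one way the check could be deferred. CheckedDownload, KeepIfValid, Start and the report callback are hypothetical names, and errorPageByteSize is assumed to hold the size of the error page as in the snippet earlier.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Net;

// Hypothetical helper: the class and method names are illustrative only.
public static class CheckedDownload
{
    // Returns true if the file looks like a real download. Otherwise deletes
    // whatever was written (for example a saved error page) and returns false.
    public static bool KeepIfValid(string path, bool failed, long errorPageByteSize)
    {
        FileInfo fi = new FileInfo(path);
        if (!failed && fi.Exists && fi.Length >= errorPageByteSize)
        {
            return true;
        }
        if (fi.Exists)
        {
            fi.Delete();
        }
        return false;
    }

    // Starts the download and runs the check only after WebClient reports completion.
    public static void Start(Uri uri, string saveAsFile, long errorPageByteSize,
                             Action<string, bool> report)
    {
        WebClient client = new WebClient();
        client.DownloadFileCompleted += delegate(object sender, AsyncCompletedEventArgs e)
        {
            // Errors and cancellations surface here, not as exceptions at the call site.
            bool failed = e.Cancelled || e.Error != null;
            bool ok = KeepIfValid(saveAsFile, failed, errorPageByteSize);
            client.Dispose();
            report(saveAsFile, ok);
        };
        client.DownloadFileAsync(uri, saveAsFile);
    }
}
```

The report callback is where calls like PublishInfo and PublishFailure would go, since by that point the file is either complete or already cleaned up.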