Grabbing Page Titles with F#
From experience, my favourite language on the .NET Framework is probably F#. True, it doesn’t fit every situation; I wouldn’t fancy writing a website with it, for example. But for scripts and utilities, there’s nothing better.
You can run code interactively using F# Interactive, or you can compile it and use it as a normal executable. It’s super-fast to prototype with, which is great when you’re just throwing ideas around the office, as we often do at Branded3.
I thought I’d start simple and share a little script which I’ve used to grab the titles from websites and output the results to the screen:
open System
open System.IO
open System.Net
open System.Text.RegularExpressions

// Download a page's HTML, returning an empty string if the URL
// is malformed or the request fails.
let http (url:string) =
    try
        let req = WebRequest.Create(url)
        use resp = req.GetResponse()
        use stream = resp.GetResponseStream()
        use reader = new StreamReader(stream)
        reader.ReadToEnd()
    with
    | :? UriFormatException -> String.Empty
    | :? WebException -> String.Empty

// Return the text between the first <title> and </title> tags.
// IgnoreCase also matches <TITLE>, Singleline lets a title span several
// lines, and the lazy (.*?) stops at the first closing tag.
let title (html:string) =
    let r = Regex(@"<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase ||| RegexOptions.Singleline)
    match r.Match(html) with
    | m when m.Success -> m.Groups.[1].Value.Trim()
    | _ -> String.Empty

let websites =
    [ "http://www.branded3.com/"
      "http://www.twitition.com/" ]

websites |> List.iter (fun u -> printfn "%s" (title (http u)))
Now, you can’t tell me that doesn’t look pretty! You can see that we’ve got a function called http, which downloads a page’s HTML (or returns an empty string if the URL is bad or the request fails); a function called title, which pulls out the text between the title tags; and a list of strings called websites. Functions and values are both bound with the let keyword, and the lines that make up each body are indented to show which definition they belong to. No curly braces needed!
The last line is my favourite part. The websites value is piped into List.iter, which iterates over each URL in the list and calls the supplied function on it. In this case, the supplied function is an anonymous function that takes the URL, fetches the page with http and prints out its title.
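If the let-and-indentation idea is new to you, here’s a tiny sketch that isn’t part of the original script (the names siteName and describe are made up for illustration). It binds a value and a function with let, and relies purely on indentation to mark the function body, including a local value that only exists inside it:

```fsharp
// A value: 'let' binds a name to an immutable value.
let siteName = "Branded3"

// A function: also bound with 'let', with its parameter after the name.
// The indented lines below form the body; no braces are needed.
let describe (name: string) =
    let length = name.Length   // local value, only visible inside describe
    sprintf "%s has %d characters" name length

printfn "%s" (describe siteName)
```

Running it prints "Branded3 has 8 characters".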
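To see what the pipe operator is doing, here’s a small offline sketch using the same two URLs (the hosts value is just for illustration). The |> operator passes the value on its left as the last argument to the function on its right, so both iter lines below do exactly the same thing, and pipelines chain naturally into other list functions such as List.map:

```fsharp
open System

let websites = [ "http://www.branded3.com/"; "http://www.twitition.com/" ]

// These two lines are equivalent: |> feeds websites in as the last argument.
websites |> List.iter (fun u -> printfn "%s" u)
List.iter (fun u -> printfn "%s" u) websites

// An anonymous function passed to List.map, pulling the host out of each URL.
let hosts = websites |> List.map (fun u -> Uri(u).Host)
```

Because nothing here touches the network, you can paste it straight into F# Interactive and check that hosts comes out as ["www.branded3.com"; "www.twitition.com"].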
The final output is a printed list of titles:
Branded3 is a leading SEO, Web Design & Development Agency
twitition - sign petitions using twitter
Naturally, you could extend this with functions that read the URLs from a text file or write the output to one. Or even new functions that take a URL and get its HTTP status code or site structure, then write each result as a line in a CSV file. But I’ll save those for future posts…
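As a starting point for the text-file and CSV ideas, here’s a minimal sketch, not the follow-up code itself. It assumes a plain text file with one URL per line, and the helper names readUrls, csvField, toCsv and writeCsv are hypothetical. Rather than calling http and title directly, toCsv takes the lookup function as a parameter, so you can pass in (fun u -> title (http u)) from the script above, or a stub while testing:

```fsharp
open System.IO

// Hypothetical helper: read one URL per line, trimming and skipping blanks.
let readUrls (path: string) =
    File.ReadAllLines(path)
    |> Array.map (fun line -> line.Trim())
    |> Array.filter (fun line -> line <> "")
    |> List.ofArray

// Hypothetical helper: wrap a field in quotes, doubling any embedded quotes (RFC 4180 style).
let csvField (s: string) = "\"" + s.Replace("\"", "\"\"") + "\""

// Hypothetical helper: build one "url","title" line per URL using the supplied lookup.
let toCsv (lookup: string -> string) (urls: string list) =
    urls |> List.map (fun u -> csvField u + "," + csvField (lookup u))

// Hypothetical helper: write the lines out to a file.
let writeCsv (path: string) (lines: string list) =
    File.WriteAllLines(path, Array.ofList lines)

// Usage with the script above (file names are hypothetical):
// readUrls "urls.txt" |> toCsv (fun u -> title (http u)) |> writeCsv "titles.csv"
```

Passing the lookup in keeps the CSV code free of any network calls, which makes it easy to try out against a fake title function first.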
