
Grabbing Page Titles with F#

From experience, my favourite language on the .NET Framework is probably F#. True, it doesn’t fit every situation; I don’t fancy writing a website with it, for example. But for scripts and utilities, I’ve found nothing better.

You can run code interactively in F# Interactive, or you can compile it and use it as a normal executable. It’s super-fast to prototype with, which is great when you’re just throwing ideas around the office like we often do at Branded3.

I thought I’d start simple and share a little script which I’ve used to grab the titles from websites and output the results to the screen:

open System
open System.IO
open System.Net
open System.Text.RegularExpressions

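// Download the page at the given URL and return its HTML.
// Returns an empty string if the URL is malformed or the request fails.
let http (url:string) =
    try
        let req    = WebRequest.Create(url)
        use resp   = req.GetResponse()
        use stream = resp.GetResponseStream()
        use reader = new StreamReader(stream)
        reader.ReadToEnd()
    with
        | :? UriFormatException -> String.Empty   // malformed URL
        | :? WebException       -> String.Empty   // network or HTTP error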

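// Return the text of the first <title> element, or an empty string if there is none.
let title (html:string) =
    // Lazy match that ignores case and lets the title span several lines.
    let options = RegexOptions.IgnoreCase ||| RegexOptions.Singleline
    let m = Regex.Match(html, "<title[^>]*>(.*?)</title>", options)
    if m.Success then m.Groups.[1].Value.Trim() else String.Empty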

let websites = [ "http://www.branded3.com/"
                 "http://www.twitition.com/" ]

websites |> List.iter (fun u -> printfn "%s" (title (http u)))

Now, you can’t tell me that doesn’t look pretty. We’ve got a function called http, a function called title, and a list of strings called websites. Both functions and values are defined with the let keyword, and the lines that follow are indented to show what they belong to. No extra curly braces here!
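
To make the point about let and indentation concrete, here is a minimal sketch that is separate from the script. The names greeting and shout are made up for illustration.

```fsharp
// Hypothetical names, for illustration only.
let greeting = "hello"          // let binds a value...

let shout (s: string) =         // ...and also defines a function
    let upper = s.ToUpper()     // indented, so it belongs to shout
    upper + "!"                 // the last expression is the result

printfn "%s" (shout greeting)   // prints HELLO!
```

The indentation alone tells the compiler where shout ends, which is why no braces or end keyword are needed.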

The last line is my favourite part. The websites value is piped with the |> operator into List.iter, which iterates through each URL in the list and runs the supplied function on it. In this case, the supplied function is an anonymous function that takes the URL, downloads the page and prints out its title.
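
If the pipe is new to you, this small sketch shows the same pattern without any network calls. The words list and the describe function are hypothetical stand-ins for websites and the title lookup.

```fsharp
// Hypothetical data and helper, standing in for the URLs and the title lookup.
let words = [ "alpha"; "beta"; "gamma" ]

let describe (w: string) = sprintf "%s has %d characters" w w.Length

// x |> f is just another way of writing f x,
// so these two lines do exactly the same thing.
words |> List.iter (fun w -> printfn "%s" (describe w))
List.iter (fun w -> printfn "%s" (describe w)) words
```

The piped form reads left to right: take the list, then do something with each item. That is why I prefer it for scripts like this one.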

The final output is a printed list of titles:

Branded3 is a leading SEO, Web Design & Development Agency
twitition  - sign petitions using twitter

Naturally, you can expand this by adding functions that read the URLs from a text file or write the output to one. You could even add functions that take a URL and fetch the HTTP status code or the website structure, then output each result as a line in a CSV file. But I’ll save those functions for future posts…
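
As a rough idea of the file-based part only, here is one possible sketch. It is not the code from any later post: readUrls, csvField and writeCsv are hypothetical helpers, and it assumes a text file with one URL per line.

```fsharp
open System
open System.IO

// Hypothetical helper: read one URL per line, skipping blank lines.
let readUrls (path: string) =
    File.ReadAllLines(path)
    |> Array.map (fun line -> line.Trim())
    |> Array.filter (fun line -> line <> String.Empty)
    |> Array.toList

// Hypothetical helper: wrap a field in quotes for CSV, doubling embedded quotes.
let csvField (value: string) =
    "\"" + value.Replace("\"", "\"\"") + "\""

// Hypothetical helper: write one "url,title" row per result.
let writeCsv (path: string) (rows: (string * string) list) =
    let lines =
        rows
        |> List.map (fun (url, title) -> csvField url + "," + csvField title)
        |> List.toArray
    File.WriteAllLines(path, lines)
```

Combined with the http and title functions from the script, usage might look like readUrls "urls.txt" |> List.map (fun u -> u, title (http u)) |> writeCsv "titles.csv", where both file names are placeholders.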

Posted at 4:28PM on Monday, 11 Apr 2011
