Friday, January 9, 2015

VB.NET - Regular Expression to Get All HREFs out of a String

This function accepts a string and returns an ArrayList of the distinct HREF values found in it:

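    ''' <summary>
    ''' Uses a regular expression to extract the distinct HREF values from a string.
    ''' </summary>
    ''' <param name="inputString">The HTML or text to search.</param>
    ''' <returns>An ArrayList of the unique HREF values, in order of first appearance.</returns>
    ''' <remarks>Requires Imports System.Collections and System.Text.RegularExpressions.</remarks>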
    Public Shared Function GetHtmlTags(inputString As String) As ArrayList
        Dim m As Match
        'VB.NET strings have no backslash escapes, so \s is written once; "" is an escaped quote
        Dim HRefPattern As String = "<a\b[^>]*\bhref\s*=\s*['""]([^'""]+)['""][^>]*>"
        Dim arrlist As New ArrayList
        Try
            'Use the regular expression to retrieve all HREFs from the input string
            m = Regex.Match(inputString, HRefPattern, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
            Do While m.Success
                'check to see if the link already exists in the arraylist before adding it again
                If arrlist.Contains(m.Groups(1).ToString()) = False Then
                    arrlist.Add(m.Groups(1).ToString())
                End If

                m = m.NextMatch()
            Loop

            Return arrlist

        Catch ex As Exception
            'Rethrow with a bare Throw so the original stack trace is preserved
            Throw
        End Try
    End Function
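
For comparison, here is a minimal, self-contained sketch of the same idea as a console program. It is an illustration under stated assumptions, not a drop-in replacement: the names `HrefExtractorDemo` and `GetHrefs` and the sample HTML are hypothetical, and it swaps in `Regex.Matches` and a generic `List(Of String)` for the `NextMatch` loop and `ArrayList` used above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HrefExtractorDemo
    ' Hypothetical helper: returns each distinct href value found in an <a> tag.
    Function GetHrefs(inputString As String) As List(Of String)
        ' A backslash is literal in a VB.NET string, so \s is not doubled;
        ' "" is how a double quote is written inside the literal.
        Dim pattern As String = "<a\b[^>]*?\bhref\s*=\s*['""]([^'""]+)['""]"
        Dim results As New List(Of String)

        For Each m As Match In Regex.Matches(inputString, pattern, RegexOptions.IgnoreCase)
            Dim href As String = m.Groups(1).Value
            ' Skip values that were already collected
            If Not results.Contains(href) Then results.Add(href)
        Next

        Return results
    End Function

    Sub Main()
        ' Made-up sample input with one duplicate link
        Dim html As String = "<p><a href=""http://example.com/a"">A</a> " &
                             "<a class='x' href='http://example.com/b'>B</a> " &
                             "<a href=""http://example.com/a"">A again</a></p>"

        For Each href As String In GetHrefs(html)
            Console.WriteLine(href)
        Next
    End Sub
End Module
```

With the sample input above, this should print the two distinct URLs once each. A regular expression is fine for quick extraction like this, but it will miss unquoted attribute values and can be confused by unusual markup, so an HTML parser is the safer choice for arbitrary pages.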


