Wednesday, January 05, 2005

regular expression

http://www.devx.com/vb2themax/Tip/18638


Extract words with the RegExp object
The following routine extracts all the words from a source string and returns a collection. Optionally, the result contains only unique words.

This code is considerably simpler than an equivalent "pure" VB solution because it takes advantage of the RegExp object in the Microsoft VBScript Regular Expressions type library.
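If adding a project reference is not convenient, the same RegExp object can probably be created late-bound instead. The following is only a minimal sketch: the CountWords name is hypothetical, and it assumes the "VBScript.RegExp" ProgID is registered on the machine, which is normally the case when Windows Script is installed.

```vb
' Hypothetical helper (not part of the original tip): counts the words
' in a string using a late-bound RegExp, so no project reference is needed
Function CountWords(ByVal Text As String) As Long
    Dim re As Object

    ' assumption: the "VBScript.RegExp" ProgID is registered
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\b\w+\b"
    re.Global = True

    ' Execute returns a MatchCollection; Count is the number of matches
    CountWords = re.Execute(Text).Count
End Function
```

Late binding gives up compile-time checking and IntelliSense, so the early-bound declaration is usually the better choice when the reference is available.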



' Get a collection of all the words in a string
' If the second argument is True, only unique words are returned
' (Collection keys are not case-sensitive, so "The" and "the" count
' as the same word)
'
' NOTE: requires a reference to the
' Microsoft VBScript Regular Expressions type library

Function GetWords(ByVal Text As String, Optional DiscardDups As Boolean) As _
    Collection
    Dim re As New RegExp
    Dim ma As Match

    ' the following pattern means that we're looking for a word character (\w)
    ' repeated one or more times (the + suffix), and that occurs on a word
    ' boundary (leading and trailing \b sequences)
    re.Pattern = "\b\w+\b"
    ' search for *all* occurrences
    re.Global = True

    ' initialize the result
    Set GetWords = New Collection

    ' adding an existing key raises error 457; we need to ignore that error,
    ' but only if duplicates are to be discarded
    If DiscardDups Then On Error Resume Next

    ' the Execute method does the search and returns a MatchCollection object
    For Each ma In re.Execute(Text)
        If DiscardDups Then
            ' if duplicates are to be discarded, we use the word itself as
            ' the key of the collection item; the Add method fails on a
            ' repeated key, so the duplicate is skipped
            GetWords.Add ma.Value, ma.Value
        Else
            ' otherwise just add to the result
            GetWords.Add ma.Value
        End If
    Next

End Function
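
Each Match object returned by Execute carries more than the matched text. As a rough illustration, the sketch below prints the position and length of every word to the Immediate window; the DumpWords name is hypothetical, and the routine assumes the same type library reference as GetWords.

```vb
' Hypothetical helper (not part of the original tip): lists each word
' together with its position in the source string
Sub DumpWords(ByVal Text As String)
    Dim re As New RegExp
    Dim ma As Match

    re.Pattern = "\b\w+\b"
    re.Global = True

    For Each ma In re.Execute(Text)
        ' FirstIndex is the zero-based offset of the match in Text,
        ' Length is the number of characters matched
        Debug.Print ma.FirstIndex, ma.Length, ma.Value
    Next
End Sub
```
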
Here is an example of how you can use the GetWords routine:


' Count how many articles appear in a source string
' held in the txtSource textbox control
Dim v As Variant
Dim count As Long

For Each v In GetWords(txtSource.Text)
    Select Case LCase$(v)
        Case "the", "a", "an"
            count = count + 1
    End Select
Next
MsgBox "Found " & count & " articles."
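
The optional second argument can be exercised in the same way. The sketch below assumes the same txtSource textbox and lists each distinct word once. Because Collection keys are compared without regard to case, "The" and "the" are treated as one word and only the first spelling encountered is kept.

```vb
' List the distinct words of the source string in the Immediate window
Dim words As Collection
Dim w As Variant

' passing True asks GetWords to discard duplicates
Set words = GetWords(txtSource.Text, True)

For Each w In words
    Debug.Print w
Next
Debug.Print words.Count & " unique words"
```
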

