With a little side of applesauce...

Thursday, June 4, 2009

Coldfusion / Postgresql - prepare Norwegian characters with diacritics for case-insensitive search

Postgresql is unable to perform case-insensitive searches on letters with diacritics, so I created a Coldfusion function to search and replace these special characters, with their regular expression equivalent. ie:
Lårdal becomes L(Å|å)rdal


Here is the example SQL statement after running the function:
SELECT * from bygdeboker WHERE kommune ~* 'L(Å|å)rdal'


Then encoding became an issue, as our Postgresql database is set to UTF-8 encoding, but Coldfusion assumes iso-8859-1 on form variables. Once we figured that out, the function started to work:

    <cffunction name="searchNorwegianSpecialCharacterCaseInsensitive"
returnType="string"
access="public"
output="no"
hint="Returns a string containing a regular expression-ready search for handling
special characters in both upper and lowercase. ie:
Bokmål becomes Bokm(Å|å)l

which would be used in:
SELECT * from bygdeboker
WHERE kommune ~* 'Bokm(Å|å)l'

Access seems to handle special character searches out of the box,
but this type of search is needed for Postgresql.

This function only handles the following Norwegian characters:
Æ,æ,Å,å,Ø,ø,Ä,ä,Ë,ë,Ö,ö,Ü,ü,É,é

@Param myString string on which to search and replace

@return newString string which is SELECT ready">
<cfargument name="myString" type="any" required="true">
<!--- letter we are not using à --->
<cfset lEntities = "Æ,æ,Å,å,Ø,ø,Ä,ä,Ë,ë,Ö,ö,Ü,ü,É,é">
<cfset lEntitiesChars = "(Æ|æ),(Æ|æ),(Å|å),(Å|å),(Ø|ø),(Ø|ø),(Ä|ä),(Ä|ä),(Ë|ë),(Ë|ë),(Ö|ö),(Ö|ö),(Ü|ü),(Ü|ü),(É|é),(É|é)">

<cfset newString = ReplaceList(arguments.myString, lEntities, lEntitiesChars)>
<cfreturn newString>
</cffunction>


Thanks to Bob for his help on understanding UTF-8/iso-8859-1 encoding!

2 comments:

Dave Shuck said...

Currently that function is not thread safe! It is relatively unimportant to var scope the two entity list variables, but you could certainly end up with threading collisions on your newString variable. You could make it thread safe in two ways:

a) put this line just below your cfargument: <cfset var newString = "" />

OR

b) get rid of that variable altogether with:
<cfreturn ReplaceList(arguments.myString, lEntities, lEntitiesChars) />

Shannon Eric Peevey said...

Thanks, Dave! My Coldfusion programming sometimes reflects my Coldfusion 4 and 5 bones :P