With a little side of applesauce...

Thursday, October 14, 2010

Google Apps Email Migration API migration scripts written in BASH

Now that we have completed the migration to Google Apps and our migration scripts are gathering dust, I thought I would post them as a reference for anyone trying to handle large migrations through the Google Apps Email Migration API.

We started the migration process in early 2008, and ran test migrations and trial usage from May through August. Once we were satisfied with our migration processes and the Google Apps services, we gave students, faculty, and staff a one-year window to migrate their email from our legacy system (essentially Sendmail with UW-IMAP and 100MB of storage) to Google Apps.

At the time, we had the following needs:
1. Due to the small email storage, we had to migrate local email archives from Thunderbird (Mac/Windows), Microsoft Outlook (3 clients), and Mac Mail. (Our one Rmail user handled their own migration :) ).
2. We needed a cron job running on our GNU/Linux box to pick up additional students, faculty, and staff as they chose to 'Migrate My Email' via our migration web page.
3. We needed to convert folders into labels.
4. We needed to handle both mbox and maildir mailboxes.
5. We needed to handle migrations on mostly Windows XP, and Mac 10.3 - 10.5.

BASH may not seem like anyone's first choice for performing something as difficult as this, but I found that (mostly due to my existing BASH skills, and lack of JAVA chops) I could understand and prototype the process very quickly. Plus, BASH would work 'out-of-the-box' on Mac and GNU/Linux (though not as seamlessly as I thought in the beginning), and we would use an automated Cygwin installation to handle the migration on Windows.

Coolness:
1. I got to use recursion in BASH for the first time.
2. We learned how to create a modified Cygwin repository and installation scripts.
3. We learned that sed doesn't work the same on GNU/Linux and Mac, but it does take the place of tidy.
4. We still had to use PERL for one line of code :(
5. We got to meet a lot of great people out on-campus.
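
To make item 3 concrete: the difference that tends to matter in scripts like these is how a newline is written on the replacement side of an s/// command. GNU sed accepts \n there, while the sed that ships with Mac expects a backslash followed by a real line break. Below is a minimal sketch of a spelling that both should accept. The split_tags helper is hypothetical and is not part of the original scripts.

```shell
#!/bin/sh
# split_tags is a hypothetical helper, not part of the original scripts.
# It puts every XML tag on its own line. The replacement ends in a backslash
# followed by a real newline, which both GNU sed and the Mac (BSD) sed accept.
split_tags()
{
    sed 's|>|>\
|g'
}

# usage: the two tags come out on separate lines
printf '%s\n' "<a><b/>" | split_tags
```

The continuation line has to start in the first column, because any leading whitespace would become part of the replacement text.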

As I have written in the Contact migration entries, I split my 'complex' shell scripts into a 'front-end' (run.sh), which handles usage and command-line arguments, and a common library of functions (common.sh), which the front-end imports and which is the brains of the outfit. I have also included our windows_run.sh and mac_run.sh, the platform-specific run scripts that gathered the information needed to call run.sh from either Windows or Mac.

1. run.sh:
This script is very straightforward. It:
- initializes our variables
- creates our log file, (which is unique per account).
- calls the import functions. (Google Apps has never implemented an export functionality in their API).

#!/bin/bash

####################
#
# Copyright 2007 Shannon Eric Peevey <speeves@erikin.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
#####################
#
# Email Import/Export script
# - exports Email from the existing systems, and re-imports from the gmail systems.
#
####################

# import our common library
. ./common.sh

# we will need to grab a username and parse their ~/Mail/ for mailboxes, plus /var/spool/mail for their INBOX.
# notes: there may be nested folders in ~/Mail/. We will split the path up to those folders, and create labels.
ACTION=""
DOMAIN_USER=""
DOMAIN_PASSWORD=""
USERNAME=""
DOMAIN=""
MBOX=""
MAILBOX_TYPE=""
DEFAULTDOMAIN="my.edu"
ALL="false"
PATH_TO_INBOX="/var/spool/mail/"
PATH_TO_HOME_MAIL_SET=""
HOMEDIR_CONFIG_FILE="/etc/passwd"
EXEC_DIR=$(pwd)
TMP="/tmp/"
date=$(date +%Y-%d-%m)
# how long we are willing to wait before failing a message
PAUSE_MAX="120"
LOGDIR="${EXEC_DIR}/log/"
DEPLOYMENT_LOG="${LOGDIR}mailmigration.$(date +%Y%d%m).log"
EVENT_LOG="${LOGDIR}mailmigration.$(date +%Y%d%m).event.log"
MyOS=$(uname)

unset MESSAGEFILE
unset TMPFOLDER

usage="
---------------------------
Usage: ${0} [-hadbuPmpMA]

Help Options
-h This message

Actions
-a Action to perform
(
i|import - import email into gmail
e|export - export gmail back
)

Variables
-d Gmail admin user
-b Gmail admin password
-u user@my.edu mailbox to migrate
-P user account password
-m single mailbox
-p path up to mailbox
-M mailbox is in Maildir format

-A migrate all users with an INBOX in /var/spool/mail/ (optional) (not implemented)

Example Usage: ${0} -a i -d gadmin_user -b gadmin_user_pw -u username
"

while getopts "a:d:b:u:p:m:P:MAh" OPT
do
case $OPT in
a )
ACTION=$OPTARG
;;
d )
DOMAIN_USER=$OPTARG
;;
b )
DOMAIN_PASSWORD=$OPTARG
;;
u )
# split user@domain; fall back to the default domain when none is given
USERNAME=${OPTARG%%@*}
if [ "${USERNAME}" != "${OPTARG}" ]; then
DOMAIN=${OPTARG#*@}
else
DOMAIN=$DEFAULTDOMAIN
fi
;;
p )
PATH_TO_HOME_MAIL=$OPTARG
PATH_TO_HOME_MAIL_SET="true"
;;
m )
MAILBOX=$OPTARG
;;
M )
MAILBOX_TYPE="maildir"
;;
A )
ALL="true"
;;
P )
USER_PASSWORD=$OPTARG
;;
h )
echo "$usage"
exit 1
;;
\?)
echo "$usage"
exit 1
;;
esac
done
# remove the flags from $@
shift $((${OPTIND} - 1))

# set mailbox type
if [ "${MAILBOX_TYPE}" = "" ]; then
MAILBOX_TYPE="mbox"
fi

# Check the existence of the log directory
logdircreate
logfilecreate

if [ "${ACTION}" = "" ]; then
echo "$usage"
exit 1;
else
# grab the action that we need to perform
case $ACTION in
i|import )
if [ "${ALL}" = "true" ]; then
echo "find_all_mbox_mailboxes is not implemented"
else
migrate_single_user
fi
;;
e|export )
if [ "${ALL}" = "true" ]; then
echo "grab_all_gmail_user_mailboxes is not implemented"
else
echo "export_from_gmail is not implemented"
fi
;;
esac
fi


2. common.sh:
The common library implements the following functions:

catbody()
- base64-encodes the message body, so that special characters do not foul up our import.

changenewlines()
- change \n to \r\n.
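
For anyone adapting this step: sed reads its input one line at a time with the trailing newline already removed, so a plain substitution that searches for a newline character has nothing to match. Below is a minimal sketch of one way to get CRLF line endings on both GNU/Linux and Mac, using awk instead. The to_crlf helper is hypothetical and is not part of the original common.sh.

```shell
#!/bin/sh
# to_crlf is a hypothetical helper, not part of the original common.sh.
# It rewrites a file in place so that every line ends in CRLF. Any carriage
# return already at the end of a line is dropped first, so running it twice
# does not double them up.
to_crlf()
{
    # $1 = file to convert in place
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$1" > "$1.new" && mv "$1.new" "$1"
}
```

Called as to_crlf "${MESSAGEFILE}", it would convert the generated import file before it is posted.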

checktmpfolder()
- see if a conflicting tmp folder exists. If it exists, delete it, and create a new one. (useful if you need to re-run a migration).

cleantmpdir()
- removes the tmp directory after migration is complete.

createproperties()
- grab mail-related properties (i.e., Drafts folder, Sent folder, MOZ_STATUS_READ) and convert them to their Google Apps equivalents (i.e., IS_DRAFT, IS_SENT, IS_UNREAD). This handles Thunderbird-related statuses as well.
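
The Thunderbird part comes down to reading the X-Mozilla-Status header as a hexadecimal bit field. The arithmetic in createproperties tests the 0x0001 bit for "read" and the 0x0004 bit for "flagged". Below is a minimal sketch of that decoding on its own. The moz_flags helper and the words it prints are hypothetical.

```shell
#!/bin/sh
# moz_flags is a hypothetical helper: it decodes the hex value of an
# X-Mozilla-Status header. Bit 0x0001 means read and bit 0x0004 means
# flagged, the same bits that createproperties tests.
moz_flags()
{
    # $1 = header value such as "0005"; empty or non-hex input counts as 0
    case "$1" in
        ''|*[!0-9A-Fa-f]* ) status=0 ;;
        * ) status=$((0x$1)) ;;
    esac
    [ $(( status & 1 )) -ne 0 ] && echo "read"
    [ $(( status & 4 )) -ne 0 ] && echo "starred"
    return 0
}
```

A message that prints no "read" line would map to IS_UNREAD, and one that prints "starred" would map to IS_STARRED.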

errorchecking()
- check the response for each message insertion. Anything greater than 300 is an error.

eventlogentry()
- logs the google response to a log file for debugging purposes.

feedfooter()
- adds the footer XML to our import file.

feedheader()
- adds the header XML to our import file.

getauthtoken()
- use curl to authenticate our application to Google, who returns an authtoken, which is included with each import request from this point forward.

googlemigrate_main()
- moves into an email directory, splits our mail messages, (if mbox), creates a list of messages, then loops through each message and generates an XML import file for each message. Once the import file is created, we send the import file to Google, and loop until we are out of messages. Then we step out of our directory and clean up after ourselves.

handlemaildir()
- We loop through the maildir files in the current directory, and remove the first line of each file, (which must be maildir-related information, most likely the byte count that starts each Mac Mail .emlx file, but I can't remember anymore :P ).

incrementmsgfile()
- increase our iterator by 1.

logdircreate()
- create our log directory.

logfilecreate()
- create our log file.

migrate_single_user()
- calls getauthtoken, checks whether we are migrating the INBOX plus additional folders or a single mailbox, then calls googlemigrate_main.

postentry()
- uses CURL to send the XML import file to Google.

postlogentry()
- adds a log entry for each message sent to Google, (success/failure).

postloop()
- Our recursive method! Calls postentry, then checks the errorcode returned by Google. If the errorcode is greater than 300 and less than 500, the message is bad, and Google won't accept it. (Often a missing header). If the errorcode is greater than 500, then Google is throttling us, so we pause, double the pause for next time, and call ourselves again, giving up once the pause exceeds PAUSE_MAX. Google was throttling us at about 1 message per second per account, (which is another reason BASH worked well. It was fairly slow!).
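
The same retry idea can also be written as a loop rather than recursion. What follows is a sketch under stated assumptions, not the code we ran: post_with_backoff and SLEEP are hypothetical names, the argument stands in for "send one message and print the status code", and the cut-offs are simplified (below 300 is success, 300 to 499 is a rejected message, 500 and up is retried).

```shell
#!/bin/sh
# Hypothetical sketch of the retry logic, written as a loop.
# SLEEP can be overridden (for example SLEEP=true) to skip the real waiting.
SLEEP=${SLEEP:-sleep}
PAUSE_MAX=${PAUSE_MAX:-120}

post_with_backoff()
{
    # $1 = name of a command or function that prints a status code
    pause=5
    while :
    do
        code=$("$1")
        case "$code" in
            ''|*[!0-9]* ) echo "failed"; return 1 ;;  # no usable status
        esac
        if [ "$code" -lt 300 ]; then
            echo "ok"; return 0
        elif [ "$code" -lt 500 ]; then
            echo "failed"; return 1                   # rejected: do not retry
        elif [ "$pause" -gt "$PAUSE_MAX" ]; then
            echo "failed"; return 1                   # throttled for too long
        fi
        "$SLEEP" "$pause"
        pause=$(( pause * 2 ))                        # 5, 10, 20, 40, 80 ...
    done
}
```

With the defaults, a message that is throttled every time is attempted six times (waiting 5, 10, 20, 40 and 80 seconds in between) before it is given up on.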

printendofmessage()
- add the message end XML to our import file.

printlabels()
- We assumed that a folder was the equivalent of a label. Also, we decided to split any sub-folders into individual labels, (as opposed to one long label). Therefore, 'Mail/folder1/folder2/' was split into the following labels: 'folder1' and 'folder2'. printlabels then added the XML for each label to our import file, so that they were included in the message on the other side.
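
The folder-to-label split can be sketched on its own as follows. The path_to_labels helper is hypothetical; it prints one label per path component and drops the .sbd and .mbox suffixes that the real function also removes.

```shell
#!/bin/sh
# path_to_labels is a hypothetical helper: it prints one label per component
# of a mailbox path, dropping Thunderbird's ".sbd" and Mac Mail's ".mbox"
# suffixes and skipping empty components (for example from a trailing slash).
path_to_labels()
{
    # $1 = mailbox path such as "folder1.sbd/folder2"
    printf '%s\n' "$1" | tr '/' '\n' | sed -e 's/\.sbd$//' -e 's/\.mbox$//' | grep -v '^$'
}

# usage: prints Projects, 2008 and Budget on separate lines
path_to_labels "Projects.sbd/2008.sbd/Budget"
```

Each printed line would then be wrapped in its own apps:label element.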

printproperties()
- take the values from createproperties and add them to the import file.

splitmbox()
- Our one PERL statement. Essentially, split our mbox file into single messages files, (a la maildir). I tried and tried to do this in BASH, but PERL did it so much more easily.

#!/bin/sh

migrate_single_user()
{
# set tmpfolder
export TMPFOLDER="${TMP}${USERNAME}/"
MESSAGEFILE="${TMP}${USERNAME}.xml"
LOG_DATE=$(date +%d/%b/%Y:%X\ %z)

# let's wrap our migration steps here to simplify the function above
authtoken=$(getauthtoken)
if [ -z "${authtoken}" ] || echo "${authtoken}" | grep -q Error; then
echo "Username and/or password are incorrect, please re-run this program by downloading it again."
exit 1
fi
## we need to check if this is a single mbox, or if we are migrating all folders
if [ "${MAILBOX}" = "" ]; then
## Ok, let's test to see if the user has an INBOX
if [ ! -e "${PATH_TO_INBOX}${USERNAME}" ]; then
echo "Sorry, this user (${USERNAME}) does not have an INBOX... Aborting Migration"
exit 1
else
IS_INBOX="true"
# first, we upload their INBOX
## need to think on this one
CURRENTMAILBOXPATH=${PATH_TO_INBOX}
MAILBOX=${USERNAME}
googlemigrate_main
unset IS_INBOX
fi

if [ "${PATH_TO_HOME_MAIL_SET}" != "true" ]; then
if [ "$PATH_TO_HOME" = "" ]; then
PATH_TO_HOME=$( grep "^${USERNAME}:" "${HOMEDIR_CONFIG_FILE}" | cut -d: -f6 )
fi
PATH_TO_HOME_MAIL="${PATH_TO_HOME}/Mail/"
fi

if [ ! -e "${PATH_TO_HOME_MAIL}" ]; then
echo "User: (${USERNAME}) doesn't have any non-INBOX folders"
NONINBOXFOLDERS="false"
else
NONINBOXFOLDERS="true"
fi

# next, we upload their non-INBOX mail folders
if [ "$NONINBOXFOLDERS" = "true" ]; then
CURRENTMAILBOXPATH=${PATH_TO_HOME_MAIL}
# Now, we run the script with the mail boxes that we have found
cd "${CURRENTMAILBOXPATH}"
while read MAILBOX
do
# don't migrate Trash and Junk
IS_TRASH=$(echo "${MAILBOX}" | grep -i "trash" )
IS_JUNK=$(echo "${MAILBOX}" | grep -i "junk")
if [ "$IS_TRASH" != "" ] || [ "$IS_JUNK" != "" ]; then
echo "We don't push Junk or Trash folders"
else
googlemigrate_main
fi
unset IS_TRASH
unset IS_JUNK
done < "${CURRENTMAILBOXPATH}/.mailboxlist"
fi
else
if [ "${PATH_TO_HOME_MAIL_SET}" != "true" ]; then
PATH_TO_HOME_MAIL=$(dirname "${MAILBOX}")
MAILBOX=$(basename "${MAILBOX}")
fi
CURRENTMAILBOXPATH=${PATH_TO_HOME_MAIL}
googlemigrate_main
fi

cd -
}

googlemigrate_main()
{
cd "${CURRENTMAILBOXPATH}"

#handle funky folders
#oldname="${MBOX}"
#MBOX=$(echo $oldname | sed -e 's/[~`!$%&*()\|<>\\\/]//g')
#mv "${oldname}" "${MBOX}"

# mbox or maildir
if [ "${MAILBOX_TYPE}" = "mbox" ]; then
echo "Splitting the mbox"
splitmbox
else
echo "copying maildir messages"
handlemaildir
fi

# create the xml file message.xml
ls ${TMPFOLDER} > ${TMP}${USERNAME}.msglist
while read file
do
#handle funky folders
#oldname="$file"
#file=$(echo $oldname | sed -e 's/[~`!$%&*()\|<>\\\/]/_/g')

# set our pause time
pause=5

# set mailItemProperties
createproperties

# create the xml
feedheader > ${MESSAGEFILE}
catbody >> ${MESSAGEFILE}
printendofmessage >> ${MESSAGEFILE}
printproperties >> ${MESSAGEFILE}
printlabels >> ${MESSAGEFILE}
feedfooter >> ${MESSAGEFILE}

echo "Now we post"
changenewlines
# enter the post and errorchecking phase
postloop
done < ${TMP}${USERNAME}.msglist
echo "Now we clean up our mail messages"
cleantmpdir
rm ${TMP}${USERNAME}.msglist
}

postloop()
{
response=$(postentry)

# let's handle failures
errorcode=$(errorchecking)
postlogentry
if [ -z "${errorcode}" ] || { [ "${errorcode}" -gt 300 ] && [ "${errorcode}" -lt 500 ]; }; then
# the message is bad, so we just log an error and move on to the next message
errorcode="400 Bad Request"
response="UPLOAD FAILURE"
postlogentry
return
elif [ "${errorcode}" -gt 500 ]; then
echo "We have a failure. Pausing to upload it again"
if [ "${pause}" -gt "${PAUSE_MAX}" ]; then
# we have waited long enough, so give up on this message
errorcode="503 Service Unavailable"
response="UPLOAD FAILURE"
postlogentry
return
else
sleep "${pause}"
# double the pause before the next attempt
pause=$(( pause * 2 ))
postloop
fi
fi
}

postlogentry()
{
# push log entry into the deployment log (${DEPLOYMENT_LOG})
echo "${USERNAME}|${LOG_DATE}|${errorcode}|${file}|${response}|${CURRENTMAILBOXPATH}|${MAILBOX}" >> ${DEPLOYMENT_LOG}
}

eventlogentry()
{
# push log entry into the event log (${EVENT_LOG}) (TODO: stderr is not redirected, though an entry is added to event.log)
echo "${USERNAME}|${LOG_DATE}|${errorcode}|${CURRENTMAILBOXPATH}|${MAILBOX}|${file}" >> ${EVENT_LOG}
}

getauthtoken()
{
# use curl to get auth token
if [ "$DOMAIN_USER" = "" ] || [ "$DOMAIN_PASSWORD" = "" ]; then
if [ "$USERNAME" = "" ] || [ "$USER_PASSWORD" = "" ]; then
echo "$usage"
exit 1
else
curl -s https://www.google.com/accounts/ClientLogin --data-urlencode "Email=${USERNAME}@${DOMAIN}" --data-urlencode "Passwd=${USER_PASSWORD}" -d accountType=HOSTED -d "source=Google-cURL-Example-${USERNAME}" -d service=apps | grep Auth
fi
else
curl -s https://www.google.com/accounts/ClientLogin --data-urlencode "Email=${DOMAIN_USER}" --data-urlencode "Passwd=${DOMAIN_PASSWORD}" -d accountType=HOSTED -d "source=Google-cURL-Example-${USERNAME}" -d service=apps | grep Auth
fi
}

postentry()
{
# use curl to post the xml batch feed that was created
curl -s --url "https://apps-apis.google.com/a/feeds/migration/2.0/${DOMAIN}/${USERNAME}/mail/batch" --header "Authorization: GoogleLogin ${authtoken}" --data "@${MESSAGEFILE}" --header "Content-Type: application/atom+xml"
}

splitmbox()
{
checktmpfolder

cd "${CURRENTMAILBOXPATH}"
export TMP_MAILBOX_NAME=$( echo "${MAILBOX}" | tr / \~)
perl -pe 'BEGIN { $n=1 } open STDOUT, ">$ENV{TMPFOLDER}$ENV{TMP_MAILBOX_NAME}.$n" and $n++ if /^From /' "${MAILBOX}"
}

handlemaildir()
{
checktmpfolder

cd "${CURRENTMAILBOXPATH}"

MESSAGES_FOLDER="${MAILBOX}/Messages"

for f in $(ls "${MESSAGES_FOLDER}")
do
cp "${MESSAGES_FOLDER}/$f" "${TMPFOLDER}${f}"
# drop the first line of the copy
sed -e '1d' "${TMPFOLDER}${f}" > "${TMPFOLDER}${f}.new" ; mv "${TMPFOLDER}${f}.new" "${TMPFOLDER}${f}"
done
}


checktmpfolder()
{
# create tmpdir and split the mbox into individual messages for cat'ing
if [ -e "${TMPFOLDER}" ]; then
rm -rf "${TMPFOLDER}"
fi
mkdir "${TMPFOLDER}"
}

incrementmsgfile()
{
export x=$((x+1))
}

cleantmpdir()
{
# remove tmpdir
rm -Rf "${TMPFOLDER}"
}

feedheader()
{
# text for the start of the mail message
echo '<?xml version="1.0" encoding="utf-8"?>'
echo '<feed xmlns="http://www.w3.org/2005/Atom" xmlns:batch="http://schemas.google.com/gdata/batch" xmlns:gd="http://schemas.google.com/g/2005">'
echo '<entry>'
echo '<category term="http://schemas.google.com/apps/2006#mailItem" scheme="http://schemas.google.com/g/2005#kind" />'
echo '<apps:rfc822Msg xmlns:apps="http://schemas.google.com/apps/2006" encoding="base64">'
}

catbody()
{
# cat the body of the message
#cat "${TMPFOLDER}/${file}" | base64
if [ ${MACMAIL} ]; then
cat "${TMPFOLDER}/${file}" | sed -e "s/enriched/plain/" > "${TMPFOLDER}/${file}.bak" ; mv "${TMPFOLDER}/${file}.bak" "${TMPFOLDER}/${file}"
fi
openssl enc -base64 -in "${TMPFOLDER}/${file}"
}

printendofmessage()
{
# print the xml close for the message
echo '</apps:rfc822Msg>'
}

printproperties()
{
# print the xml for the mailItemProperties
# create our mailItemProperties
if [ "$IS_DRAFT" = "true" ]; then
echo '<apps:mailItemProperty value="IS_DRAFT" xmlns:apps="http://schemas.google.com/apps/2006" />'
elif [ "$IS_SENT" = "true" ]; then
echo '<apps:mailItemProperty value="IS_SENT" xmlns:apps="http://schemas.google.com/apps/2006" />'
elif [ "$IS_INBOX" = "true" ]; then
echo '<apps:mailItemProperty value="IS_INBOX" xmlns:apps="http://schemas.google.com/apps/2006" />'
elif [ "$IS_STARRED" = "true" ]; then
echo '<apps:mailItemProperty value="IS_STARRED" xmlns:apps="http://schemas.google.com/apps/2006" />'
elif [ "$IS_UNREAD" = "true" ]; then
echo '<apps:mailItemProperty value="IS_UNREAD" xmlns:apps="http://schemas.google.com/apps/2006" />'
fi
}

printlabels()
{

# print the xml for the labels
echo '<apps:label labelName="migrated" xmlns:apps="http://schemas.google.com/apps/2006" />'

IS_NESTED=$(echo "${MAILBOX}" | grep "\/")
if [ "${IS_NESTED}" != "" ]; then
IFS="\/"
for l in ${MAILBOX}
do
l=$(echo $l | sed 's?.sbd??' | sed 's?.mbox??')
echo "<apps:label labelName=\"${l}\" xmlns:apps=\"http://schemas.google.com/apps/2006\" />"
done
unset IFS
else
if [ "$IS_INBOX" != "true" ] && [ "$MAILBOX" != "${USERNAME}" ]; then
l=$(echo ${MAILBOX} | sed 's?.sbd??' | sed 's?.mbox??')
if [ ${MACMAIL} ]; then
echo "<apps:label labelName=\"${MACMAIL_MAILBOX}\" xmlns:apps=\"http://schemas.google.com/apps/2006\" />"
else
echo "<apps:label labelName=\"${l}\" xmlns:apps=\"http://schemas.google.com/apps/2006\" />"
fi
fi
fi
}



feedfooter()
{
# text for the end of the mail message
echo '<batch:id>0</batch:id>'
echo '</entry>'
echo '</feed>'
}


createproperties()
{
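# create our mailItemProperties
## TODO: handle the emlx flags correctly
# start clean so that flags from the previous message do not carry over
unset IS_DRAFT IS_SENT IS_STARRED IS_UNREAD MOZ_IS_FLAGGED MOZ_IS_READ
# the special folders are recognized by the name of the mailbox being migrated
case "${MAILBOX}" in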
Drafts )
IS_DRAFT="true"
;;
Sent|Sent\ Items|Sent\ Messages* )
IS_SENT="true"
;;
esac

# now we handle IS_STARRED and IS_UNREAD
# flagged
MOZ_STATUS_FLAGGED=$(grep "^X-Mozilla-Status: " "${TMPFOLDER}${file}" | cut -d: -f2 | sed "s/\\r//" | sed 's/^[ \t]*//' | sed 's/[ \t]*$//')
tmpMOZ_STATUS_FLAGGED=yyy${MOZ_STATUS_FLAGGED}
if [ "$tmpMOZ_STATUS_FLAGGED" != "yyy" ]; then
MOZ_IS_FLAGGED=$(( (16#$MOZ_STATUS_FLAGGED / 4) % 2))
fi
MBOX_IS_FLAGGED=$(grep -r -h ^X-Status "${TMPFOLDER}${file}" | grep F)
if [ "$MBOX_IS_FLAGGED" != "" ] || [ "$MOZ_IS_FLAGGED" = "1" ] ; then
IS_STARRED="true"
fi

# for mbox, check the status headers to see if the message was read; otherwise check the emlx flags
if [ "${MAILBOX_TYPE}" = "mbox" ]; then
# is read
MOZ_STATUS_READ=$(grep "^X-Mozilla-Status:" "${TMPFOLDER}${file}" | cut -d: -f2 | sed "s/\\r//" | sed 's/^[ \t]*//' | sed 's/[ \t]*$//')
tmpMOZ_STATUS_READ=yyy${MOZ_STATUS_READ}
if [ "$tmpMOZ_STATUS_READ" != "yyy" ]; then
MOZ_IS_READ=$((16#$MOZ_STATUS_READ % 2))
fi
MBOX_IS_READ=$(grep -h "^Status: " "${TMPFOLDER}${file}" | cut -d: -f2 | grep -v '[a-zA-MP-QS-Z0-9]')
if [ "$MBOX_IS_READ" != "" ] || [ "$MOZ_IS_READ" = "1" ] ; then
echo "This message was read"
else
IS_UNREAD="true"
fi
else
EMLX_READ=$(grep '<integer>' "${TMPFOLDER}${file}" | cut -d\> -f2 | cut -d\< -f1)
EMLX_READ_MOD=$(($EMLX_READ % 2))
if [ "${EMLX_READ_MOD}" = "1" ]; then
echo "this message was read"
else
IS_UNREAD="true"
fi
fi
}

errorchecking()
{
# grab anything > 300
if [ "$MyOS" = "Darwin" ]; then
echo ${response} | sed 's|/*>|&\n\
|g' | grep status | awk '{print $2}' | cut -d= -f2 | cut -d\' -f2
else
echo ${response} | sed 's|/*>|&\n|g' | grep status | awk '{print $2}' | cut -d= -f2 | cut -d\' -f2
fi
}

changenewlines()
{
# change \n to \r\n
sed "s/\\n/\\r\\n/" "${MESSAGEFILE}" > "${MESSAGEFILE}.new"
cp "${MESSAGEFILE}.new" "${MESSAGEFILE}"
}

logdircreate()
{
# check to see if logdir exists... if not, create it
if [ ! -e "${LOGDIR}" ]; then
mkdir "${LOGDIR}"
fi
}

logfilecreate()
{
# check to see if the log file exists... if not, create it
if [ ! -e "${DEPLOYMENT_LOG}" ]; then
touch "${DEPLOYMENT_LOG}"
chmod 666 "${DEPLOYMENT_LOG}"
fi
}



windows_run.sh:
The Windows process included installing Cygwin, which was automated with a DOS batch file. Once Cygwin was installed, we tricked Cygwin into calling this script by adding a line to the .bashrc, which ran as soon as the shell started. (It was clunky, but it worked every time :) ). This script essentially asks the user for their username/password and the location of the mail directories for Thunderbird. (Thunderbird was our campus-supported client at the time). I found that there was no programmatic way to know which profile we were migrating, so it became simpler to grab it from the GUI. I handled one Outlook client by importing all of their email into Thunderbird and then running this script. The other two clients were done with the Google Email migration desktop client, which worked well for them.

#!/bin/sh

VERIFYUSERPASSWORD="changme"
userprompt()
{
echo "Please enter your email address and password on the following prompts"
echo
echo -n "email address: "
read USEREMAIL
echo -n "Password: "
read USERPASSWORD
echo -n "Verify Password: "
read VERIFYUSERPASSWORD
}

userprompttest()
{
while [ "${USERPASSWORD}" != "${VERIFYUSERPASSWORD}" ]
do
userprompt
done
}

if [ -z "$USEREMAIL" ]; then
userprompttest
fi

# set up our environment
#cd ../home/speeves
#DEFAULTPROFILE=$(ls -tr "/cygdrive/c/Documents and Settings/${USERNAME}/Application Data/Thunderbird/Profiles/"| tail -n 1)
#PROFILEPATH="/cygdrive/c/Documents and Settings/${USERNAME}/Application Data/Thunderbird/Profiles/${DEFAULTPROFILE}"
#ln -s "${PROFILEPATH}/Mail/Local Folders" localmail
ln -s "${LOCALFOLDERLOCATION}" localmail
cd localmail
find . -type f | grep -v '\.msf$' | cut -d/ -f2- > /tmp/t
cd ..

while read line
do
./run.sh -a i -u "${USEREMAIL}" -P "${USERPASSWORD}" -p "localmail/" -m "$line"
done < /tmp/t


# clean up our mess
clear
echo
echo "########################################################################"
echo "#"
echo "# We have completed migrating your local email."
echo "# Please, contact help@my.edu if you have any questions"
echo "#"
echo "########################################################################"
echo
echo
echo "We will now remove ourselves from your computer..."
rm -rf ../../../cygwin


mac_run.sh:
We tried to use AppleScript for our Mac clients, but it failed on various versions of 10.x. It was simpler to just open up Terminal and start this script by hand. It handled both Mac Mail and Thunderbird, and asked different questions accordingly.

#!/bin/sh

####################
#
# Copyright 2007 Shannon Eric Peevey <speeves@erikin.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
#####################
#
# Email Import/Export script
# - exports Email from the existing systems, and re-imports from the gmail systems.
#
####################

#USERNAME=$1
#PASSWORD=$2
USERPASSWORD=""
VERIFYUSERPASSWORD="changeme"
MACMAIL=3

userprompt()
{
echo "Please enter your email address and password on the following prompts"
echo
echo -n "email address: "
read USEREMAIL
echo -n "Password: "
read USERPASSWORD
echo -n "Verify Password: "
read VERIFYUSERPASSWORD
}

userprompttest()
{
while [ "${USERPASSWORD}" != "${VERIFYUSERPASSWORD}" ]
do
userprompt
done
}

macmailuserprompttest()
{
while [ "${MACMAIL}" != 1 ] && [ "${MACMAIL}" != 2 ]
do
echo "Please enter the number 1 or 2 for the following: "
echo -n "Thunderbird or MacMail (Thunderbird = 1 | Mac Mail = 2) ?"
read MACMAIL
done
}

if [ "$#" -eq 2 ]; then
USEREMAIL=$1
USERPASSWORD=$2
else
userprompttest
#echo "usage: $0 [email] [passwd]"
#exit 1
fi

MyOS=$(uname)

# don't continue if the username and password are empty

case $MyOS in
CYGWIN* )
PROFILE_HOME="/cygdrive/c/Documents and Settings/${USERNAME}/Application Data/Thunderbird/Profiles/"
;;
GNU/Linux | Linux )
PROFILE_HOME="/home/${USERNAME}/.mozilla-thunderbird/"
;;
Darwin )

macmailuserprompttest

if [ "${MACMAIL}" -eq 1 ]; then
echo -n "Preparing to migrate Thunderbird Folders"
else
echo -n "Preparing to migrate Mac Mail Folders"
fi

if [ "${MACMAIL}" -eq 1 ]; then
PROFILE_HOME="/Users/${USER}/Library/Thunderbird/Profiles/"
else
PROFILE_HOME="/Users/${USER}/Library/Mail/"
fi
;;
* )
echo "I don't know this operating system"
exit 1
;;
esac

if [ "${MACMAIL}" -ne 2 ]; then
NEWEST_PROFILE=$(ls -tr "${PROFILE_HOME}" | tail -n 1)
else
NEWEST_PROFILE="Mailboxes"
fi

MAILBOX_FILE="/tmp/mailboxfile"

# check that the returned value is a directory
PROFILE="${PROFILE_HOME}${NEWEST_PROFILE}/"

if [ "${MACMAIL}" -ne 2 ]; then
LOCAL_FOLDERS="${PROFILE}/Mail/Local Folders/"
else
LOCAL_FOLDERS="${PROFILE}"
fi

ln -s "${LOCAL_FOLDERS}" localmail
cd localmail
pwd
# cat the .mailbox file
if [ "${MACMAIL}" -ne 2 ]; then
find . | grep -v msf | cut -d/ -f2- | grep -v '\.' > "${MAILBOX_FILE}"
else
ls -d */ | grep -v '#' | cut -d/ -f1 > "${MAILBOX_FILE}"
fi
cd ..

if [ "${MACMAIL}" -ne 2 ]; then
while read line
do
bash -x ./run.sh -a i -u "${USEREMAIL}" -P "${USERPASSWORD}" -p "localmail/" -m "$line"
done < "${MAILBOX_FILE}"

else
echo -n "Preparing to push Mailboxes"
while read line
do
newline=$(echo $line | tr \( " " | tr \) " " | sed -e 's/\.mbox//')
export MACMAIL
export MACMAIL_MAILBOX="${line}"
bash -x ./run.sh -a i -u "${USEREMAIL}" -P "${USERPASSWORD}" -p "localmail/" -m "${line}" -M
unset MACMAIL_MAILBOX
done < "${MAILBOX_FILE}"
fi

clear
echo
echo "########################################################################"
echo "#"
echo "# We have completed migrating your local email."
echo "# Please, contact help@my.edu if you have any questions"
echo "#"
echo "########################################################################"
echo
echo


The process was often performed in two steps:
1. The automated migrations from the server, (which also used these scripts).
2. Touching the machine to fire-off these migration scripts, and then importing their contacts, (via csv), so that their user lookup was seeded right off the bat.

Try as I might (and I performed hundreds of these migrations), I could not find a better tool for migrating a large number of users with local mail folders, nor could I automate the contact migration (which is now much more robust). We weren't early adopters of Google Apps (though my erikin.com account had been one, back when it was by invitation only in 2006), but in 2008 the APIs were neither well documented nor fully implemented. The existing JAVA libraries had problems batch-loading emails (compounded by my lack of JAVA chops), and the Google Email Uploader (which I see is now deprecated) would crash often and, in some situations, would only handle one mailbox at a time (if it didn't automatically find it).

I'm sure the situation has changed considerably, but this was a fun project, and it is fun to look back at the issues that we ran into at the time. Let me know if you have any questions about the application, and how we implemented it. (I can probably also dig up our Cygwin installation kit and repository package listing as well).

Peace!

2 comments:

CaliVW78 said...

Your lack of Java foo migrates to most every code base for myself :) Between Google, copying, pasting, and hacking, I can usually get something together. But if something goes awry, well crap. Anyways, I'm running into issues with common.sh that include the following details.

Command=
./run.sh -a i -d admin@domain.com -b password -u user@domain.com

Errors donated=
Splitting the mbox
This message was read
Now we post
./common.sh: line 134: [: : integer expression expected
./common.sh: line 134: [: =: unary operator expected
./common.sh: line 140: [: : integer expression expected

./common.sh: line 134=
if [ "${errorcode}" -gt "300" ] && [ "${errorcode}" -lt "500" ] || ["${errorcode}" = "" ]; then

./common.sh: line 140=
elif [ "${errorcode}" -gt "500" ]; then


Any thoughts? And thanks in advance for making this publicly available!

Shannon Eric Peevey said...

Hi CaliVW78,

Thanks for giving this a go! It has been about three years since I have run this, so I am running through the memory Rolodex :P Does it keep scrolling and posting the emails? My only record shows me telling the 'customer' not to worry about the error. We ran the whole migration with that error running by, so it shouldn't be a problem, but if you want to help me chase it down, I can post an update to the code :)

My guess is that I am handling the XML response from google, and that BASH is having a hard time determining the appropriate type for the ${errorcode} variable. (This theory is reinforced by this BASH gotchas page: http://tldp.org/LDP/abs/html/gotchas.html )
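
One way to keep the "integer expression expected" message from appearing is to check that the value is a plain number before it ever reaches an integer test such as -gt. Below is a minimal sketch, assuming the status has already been pulled out of the response into a single string. The classify_status helper and the words it prints are hypothetical.

```shell
#!/bin/sh
# classify_status is a hypothetical helper: it sorts a status value into a
# category without ever handing an empty or non-numeric string to an
# integer comparison.
classify_status()
{
    case "$1" in
        ''|*[!0-9]* )
            echo "unknown"       # empty, or not a plain number
            ;;
        * )
            if [ "$1" -ge 500 ]; then
                echo "retry"
            elif [ "$1" -ge 300 ]; then
                echo "rejected"
            else
                echo "ok"
            fi
            ;;
    esac
}
```

An empty ${errorcode} then comes back as "unknown" rather than tripping the numeric comparison.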

Let me know if there are additional issues.

Take care!