I'm going to go ahead and answer my own question with the solution I ended up with.
As you'll read above in the question the last update I added is pretty much the end solution.
When reading data from the CSV I use the is_non_utf8 check on the string and if true I run the following logic: * If the string contains one of the following bytes then assume MacRoman: 0x8E, 0x8F, 0x9A, 0xA1, 0xA5, 0xA8, 0xD0, 0xD1, 0xD5, 0xE1 * If the string contains one of the following bytes then assume Windows-1252: 0x92, 0x95, 0x96, 0x97, 0xAE, 0xB0, 0xB7, 0xE8, 0xE9, 0xF6
If one of the above is true/assumed then i use iconv to convert the string to UTF-8. If not then I do nothing to the string and continue as usual.
So using the contains_non_utf8 function to detect when a string first has character(s) in it that's not utf-8 encoded, then a function to detect whether it's MacRoman or Windows-1252. After that I can simply run iconv('MACROMAN', 'UTF-8', $str) or iconv('Windows-1252', 'UTF-8', $str) to receive UTF-8 valid string to go forward with.
On inserting I've wrapped the insert query into a try/catch statement, in the catch I look for the error code of "1366" and if that's true I update the row of data to exclude the data but mark the row as an error record with an error message as well. Although this never allows me to provide an export back to the user with the data I'm unable to import it does provide them with the line number so they can look at back at the upload file they used to determine the record that failed to be imported.
So there you have it. This is how I achieved the ability for a user to download a template CSV, open it in Excel (Mac or Windows), add data that includes umlaut (or another foreign character that's available in UTF-8), click save, pick file in file html input, submit, and imports successfully/correctly. It's going into use within the next month so if anything else comes up I'll be sure to update this ticket with those details.
Here's the functions I'm using:
function isMacRomanEncoded($str) {
$testBytes = array(0x8E, 0x8F, 0x9A, 0xA1, 0xA5, 0xA8, 0xD0, 0xD1, 0xD5, 0xE1);
foreach ($testBytes as $testByte) {
if (mb_strpos($str, chr($testByte)) !== false) {
return true;
}
}
return false;
}
function isWindows1252Encoded($str)
{
$testBytes = array(0x92, 0x95, 0x96, 0x97, 0xAE, 0xB0, 0xB7, 0xE8, 0xE9, 0xF6);
foreach ($testBytes as $testByte) {
if (mb_strpos($str, chr($testByte)) !== false) {
return true;
}
}
return false;
}
Here's an example of the catch statement mentioned:
try {
return $this->saveImportRow($array)
} catch (Exception $e) {
if ($e->getCode() === 1366) {
$array['dataColumn'] = null;
$array['status'] = '2'; // 2 = Error
$array['msg'] = 'Row contained invalid characters';
return $this->saveImportRow($array)
}
throw $e;
}
If you have any questions (or further input) let me know.
Thanks!