dongpo7467 2010-04-30 01:05
浏览 31
已采纳

当htmlentities设置了“UTF-8”时,图像无法上传

I have a form that, among other things, accepts an image for upload and sticks it in the database. Previously I had a function filtering the POSTed data that was basically:

function processInput($stuff) {
    $formdata = $stuff;
    $formdata = htmlentities($formdata, ENT_QUOTES);
    return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}

When, in an effort to fix some weird entities that weren't getting converted properly I changed the function to (all that has changed is I added that 'UTF-8' bit in htmlentities):

function processInput($stuff) {
        $formdata = $stuff;
        $formdata = htmlentities($formdata, ENT_QUOTES, 'UTF-8'); //added UTF-8
        return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
    }

And now images will not upload.

What would be causing this? Simply removing the 'UTF-8' bit allows images to upload properly but then some of the MS Word entities that users put into the system show up as gibberish. What is going on?

**EDIT: Since I cannot do much to change the code on this beast I was able to slap a bandaid on by using htmlspecialchars() rather than htmlentities() and that seems to at least leave the image data untouched while converting things like quotes, angle brackets, etc. bobince's advice is excellent but in this case I cannot now spend the time needed to fix the messy legacy code in this project. Most stuff I deal with is object oriented and framework based but now I see first hand what people mean when they talk about "spaghetti code" in PHP.

  • 写回答

1条回答 默认 最新

  • doujiao3998 2010-04-30 01:47
    关注
    function processInput($stuff) {
        $formdata = $stuff;
        $formdata = htmlentities($formdata, ENT_QUOTES);
        return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
    }
    

    This function represents a basic misunderstanding of string processing, one common to PHP programmers.

    SQL-escaping, HTML-escaping and input validation are three separate functions, to be used at different stages of your script. It makes no sense to try to do them all in one go; it will only result in characters that are ‘special’ to any one of the processes getting mangled when used in the other parts of the script. You can try to tinker with this function to try to fix mangling in one part of the app, but you'll break something else.

    Why are images being mangled? Well, it's not immediately clear via what path image data is going from a $_FILES temporary upload file to the database. If this function is involved at any point though, it's going to completely ruin the binary content of an image file. Backslashes removed and HTML-escaped... no image could survive that.

    1. mysql_real_escape_string is for escaping some text for inclusion in a MySQL string literal. It should be used always-and-only when making an SQL string literal with inserted text, and not globally applied to input. Because some things that come in in the input aren't going immediately or solely to the database. For example, if you echo one of the input values to the HTML page, you'll find you get a bunch of unwanted backslashes in it when it contains characters like '. This is how you end up with pages full of runaway backslashes.

      (Even then, parameterised queries are generally preferable to manual string hacking and mysql_real_escape_string. They hide the details of string escaping from you so you don't get confused by them.)

    2. htmlentities is for escaping text for inclusion in an HTML page. It should be used always-and-only in the output templating bit of your PHP. It is inappropriate to run it globally over all your input because not everything is going to end up in an HTML page or solely in an HTML page, and most probably it's going to go to the database first where you absolutely don't want a load of < and & rubbish making your text fail to search or substring reliably.

      (Even then, htmlspecialchars is generally preferable to htmlentities as it only encodes the characters that really need it. htmlentities will add needless escaping, and unless you tell it the right encoding it'll also totally mess up all your non-ASCII characters. htmlentities should almost never be used.)

    3. As for stripslashes... well, you sometimes need to apply that to input, but only when the idiotic magic_quotes_gpc option is turned on. You certainly shouldn't apply it all the time, only when you detect magic_quotes_gpc is on. It is long deprecated and thankfully dying out, so it's probably just as good to bomb out with an error message if you detect it being turned on. Then you could chuck the whole processInput thing away.

    To summarise:

    • At start time, do no global input processing. You can do application-specific validation here if you want, like checking a phone number is just numbers, or removing control characters from text or something, but there should be no escaping happening here.

    • When making an SQL query with a string literal in it, use SQL-escaping on the value as it goes into the string: $query= "SELECT * FROM t WHERE name='".mysql_real_escape_string($name)."'";. You can define a function with a shorter name to do the escaping to save some typing. Or, more readably, parameterisation.

    • When making HTML output with strings from the input or the database or elsewhere, use HTML-escaping, eg.: <p>Hello, <?php echo htmlspecialchars($name); ?>!</p>. Again, you can define a function with a short name to do echo htmlspecialchars to save on typing.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 如何让企业微信机器人实现消息汇总整合
  • ¥50 关于#ui#的问题:做yolov8的ui界面出现的问题
  • ¥15 如何用Python爬取各高校教师公开的教育和工作经历
  • ¥15 TLE9879QXA40 电机驱动
  • ¥20 对于工程问题的非线性数学模型进行线性化
  • ¥15 Mirare PLUS 进行密钥认证?(详解)
  • ¥15 物体双站RCS和其组成阵列后的双站RCS关系验证
  • ¥20 想用ollama做一个自己的AI数据库
  • ¥15 关于qualoth编辑及缝合服装领子的问题解决方案探寻
  • ¥15 请问怎么才能复现这样的图呀