When the browser sends data in the body of a POST request (i.e. the name=value pairs from form elements), how does PHP determine the character encoding so that it can properly decode the byte stream into characters for its own internal use?
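To make the problem concrete, here is a small Python sketch (not PHP) of what an application/x-www-form-urlencoded body actually contains. It assumes, for illustration only, that the user typed "café" into a field called `name`. The percent-escapes only recover bytes, and the same bytes come out as different characters depending on which charset the receiver assumes.

```python
from urllib.parse import parse_qs

# Raw body of an application/x-www-form-urlencoded POST.
# Assumption: the user typed "café" into a field named "name" on a UTF-8 page.
body = "name=caf%C3%A9"

# The escapes %C3 %A9 only give two bytes; turning them into characters
# requires choosing a charset, which the body itself does not carry.
as_utf8 = parse_qs(body, encoding="utf-8")
as_latin1 = parse_qs(body, encoding="latin-1")

print(as_utf8["name"][0])    # café
print(as_latin1["name"][0])  # cafÃ©
```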
I can understand that for some tasks PHP won't need to decode anything; e.g. for SQL INSERT queries, it may simply pass the data/string along to the DBMS without any additional processing.
But for text processing and regex operations, I imagine PHP will need to decode the byte stream into characters before it can perform tests, pattern matches, etc. on them.
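As an analogy only (Python rather than PHP), the sketch below shows why the bytes-versus-characters distinction matters for pattern matching: the same regex gives different results on raw UTF-8 bytes and on decoded text. This is roughly the difference PHP's `preg_*` functions make with and without the `u` pattern modifier; the Python code is just an illustration of the idea.

```python
import re

text = "café"
raw = text.encode("utf-8")  # b'caf\xc3\xa9': 5 bytes for 4 characters

# On decoded text, "." matches the single character "é".
char_match = re.fullmatch(r"caf.", text)

# On raw bytes, "." matches one byte, so "caf." cannot cover the two-byte "é".
byte_match = re.fullmatch(rb"caf.", raw)

print(len(text), len(raw))     # 4 5
print(char_match is not None)  # True
print(byte_match is not None)  # False
```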
Also, because the encoding is chosen by the browser, it seems PHP will need guidance from the browser about which charset it used to encode the POST data.
Expecting this guidance to be in the request headers, I set up a page with a text form and put
<meta charset="utf-8">
in the head of that page. After entering some values and submitting the form, I found that the request headers contain no obvious information about how the POST data was encoded:
POST /experiments/foo.php HTTP/1.1
Host: localhost
Connection: keep-alive
Content-Length: 57
Pragma: no-cache
Cache-Control: no-cache
Origin: http://localhost
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://localhost/experiments/how_does_php_encode_data_it_receives_from_browser.php
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
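One way to confirm that nothing in those headers names a charset is to parse the `Content-Type` value mechanically. The sketch below uses Python's standard `email.message.Message` purely as a convenient MIME header parser; `declared_charset` is a hypothetical helper name, not part of any PHP or browser API.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when missing.
    return msg.get_content_charset()


# The header from the captured request above carries no charset parameter...
print(declared_charset("application/x-www-form-urlencoded"))  # None
# ...whereas a header that declared one would expose it.
print(declared_charset("application/x-www-form-urlencoded; charset=UTF-8"))  # utf-8
```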
Or is there something else going on? For example, is the browser expected to encode characters according to some predetermined standard?
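As far as I understand the HTML form-submission rules, the browser picks the encoding from the form's `accept-charset` attribute if present, and otherwise uses the page's own character encoding (here, the one set by the meta tag). The Python sketch below only simulates that step; it assumes a UTF-8 page versus a Latin-1 page and is not how any particular browser is implemented.

```python
from urllib.parse import urlencode

fields = {"name": "café"}

# Simulate serialising the same form value under two page charsets.
# Assumption: UTF-8 page (as with <meta charset="utf-8">) vs. a Latin-1 page.
from_utf8_page = urlencode(fields, encoding="utf-8")
from_latin1_page = urlencode(fields, encoding="latin-1")

print(from_utf8_page)    # name=caf%C3%A9
print(from_latin1_page)  # name=caf%E9
```

The two bodies differ, yet neither says which charset produced it, which is exactly the ambiguity the question is about.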
How does PHP know how to decode the data it receives in POST requests from the browser?