douge7771 2012-03-09 09:10

Split a string into arrays?

From the given string $codes, I want to put every language into a language array, every code into a code array, and every family into a family array. How can I do this in PHP? I tried using DOM, but that does not seem to work. Any other approach would be appreciated. Thanks in advance.

<?php
 $codes = '<pre>
 LANGUAGE      CODE     LANGUAGE FAMILY

AFAR            AA     HAMITIC
ABKHAZIAN       AB     IBERO-CAUCASIAN
AFRIKAANS       AF     GERMANIC
AMHARIC         AM     SEMITIC
ARABIC          AR     SEMITIC
ASSAMESE        AS     INDIAN
AYMARA          AY     AMERINDIAN
AZERBAIJANI     AZ     TURKIC/ALTAIC
BASHKIR         BA     TURKIC/ALTAIC
BYELORUSSIAN    BE     SLAVIC
BULGARIAN       BG     SLAVIC
BIHARI          BH     INDIAN
BISLAMA         BI     [not given]
BENGALI;BANGLA  BN     INDIAN
TIBETAN         BO     ASIAN
BRETON          BR     CELTIC
CATALAN         CA     ROMANCE
CORSICAN        CO     ROMANCE
CZECH           CS     SLAVIC
WELSH           CY     CELTIC
DANISH          DA     GERMANIC
GERMAN          DE     GERMANIC
BHUTANI         DZ     ASIAN
GREEK           EL     LATIN/GREEK
ENGLISH         EN     GERMANIC
ESPERANTO       EO     INTERNATIONAL AUX.
SPANISH         ES     ROMANCE
ESTONIAN        ET     FINNO-UGRIC
BASQUE          EU     BASQUE
PERSIAN (farsi) FA     IRANIAN
FINNISH         FI     FINNO-UGRIC
FIJI            FJ     OCEANIC/INDONESIAN
FAROESE         FO     GERMANIC
FRENCH          FR     ROMANCE
FRISIAN         FY     GERMANIC
IRISH           GA     CELTIC
SCOTS GAELIC    GD     CELTIC
GALICIAN        GL     ROMANCE
GUARANI         GN     AMERINDIAN
GUJARATI        GU     INDIAN
HAUSA           HA     NEGRO-AFRICAN
HEBREW          HE     SEMITIC [*Changed 1989 from original ISO 639:1988, IW] 
HINDI           HI     INDIAN
CROATIAN        HR     SLAVIC
HUNGARIAN       HU     FINNO-UGRIC
ARMENIAN        HY     INDO-EUROPEAN (OTHER)
INTERLINGUA     IA     INTERNATIONAL AUX.
INTERLINGUE     IE     INTERNATIONAL AUX.
INUPIAK         IK     ESKIMO
INDONESIAN      ID     OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN] 
ICELANDIC       IS     GERMANIC
ITALIAN         IT     ROMANCE
INUKTITUT       IU     [        ]
JAPANESE        JA     ASIAN
JAVANESE        JV     OCEANIC/INDONESIAN
GEORGIAN        KA     IBERO-CAUCASIAN
KAZAKH          KK     TURKIC/ALTAIC
GREENLANDIC     KL     ESKIMO
CAMBODIAN       KM     ASIAN
KANNADA         KN     DRAVIDIAN
KOREAN          KO     ASIAN
KASHMIRI        KS     INDIAN
KURDISH         KU     IRANIAN
KIRGHIZ         KY     TURKIC/ALTAIC
LATIN           LA     LATIN/GREEK
LINGALA         LN     NEGRO-AFRICAN
LAOTHIAN        LO     ASIAN
LITHUANIAN      LT     BALTIC
LATVIAN;LETTISH LV     BALTIC
MALAGASY        MG     OCEANIC/INDONESIAN
MAORI           MI     OCEANIC/INDONESIAN
MACEDONIAN      MK     SLAVIC
MALAYALAM       ML     DRAVIDIAN
MONGOLIAN       MN     [not given]
MOLDAVIAN       MO     ROMANCE
MARATHI         MR     INDIAN
MALAY           MS     OCEANIC/INDONESIAN
MALTESE         MT     SEMITIC
BURMESE         MY     ASIAN
NAURU           NA     [not given]
NEPALI          NE     INDIAN
DUTCH           NL     GERMANIC
NORWEGIAN       NO     GERMANIC
OCCITAN         OC     ROMANCE
AFAN (OROMO)    OM     HAMITIC
ORIYA           OR     INDIAN
PUNJABI         PA     INDIAN
POLISH          PL     SLAVIC
PASHTO;PUSHTO   PS     IRANIAN
PORTUGUESE      PT     ROMANCE
QUECHUA         QU     AMERINDIAN
RHAETO-ROMANCE  RM     ROMANCE
KURUNDI         RN     NEGRO-AFRICAN
ROMANIAN        RO     ROMANCE
RUSSIAN         RU     SLAVIC
KINYARWANDA     RW     NEGRO-AFRICAN
SANSKRIT        SA     INDIAN
SINDHI          SD     INDIAN
SANGHO          SG     NEGRO-AFRICAN
SERBO-CROATIAN  SH     SLAVIC
SINGHALESE      SI     INDIAN
SLOVAK          SK     SLAVIC
SLOVENIAN       SL     SLAVIC
SAMOAN          SM     OCEANIC/INDONESIAN
SHONA           SN     NEGRO-AFRICAN
SOMALI          SO     HAMITIC
ALBANIAN        SQ     INDO-EUROPEAN (OTHER)
SERBIAN         SR     SLAVIC
SISWATI         SS     NEGRO-AFRICAN
SESOTHO         ST     NEGRO-AFRICAN
SUNDANESE       SU     OCEANIC/INDONESIAN
SWEDISH         SV     GERMANIC
SWAHILI         SW     NEGRO-AFRICAN
TAMIL           TA     DRAVIDIAN
TELUGU          TE     DRAVIDIAN
TAJIK           TG     IRANIAN
THAI            TH     ASIAN
TIGRINYA        TI     SEMITIC
TURKMEN         TK     TURKIC/ALTAIC
TAGALOG         TL     OCEANIC/INDONESIAN
SETSWANA        TN     NEGRO-AFRICAN
TONGA           TO     OCEANIC/INDONESIAN
TURKISH         TR     TURKIC/ALTAIC
TSONGA          TS     NEGRO-AFRICAN
TATAR           TT     TURKIC/ALTAIC
TWI             TW     NEGRO-AFRICAN
UIGUR           UG     [       ]
UKRAINIAN       UK     SLAVIC
URDU            UR     INDIAN
UZBEK           UZ     TURKIC/ALTAIC
VIETNAMESE      VI     ASIAN
VOLAPUK         VO     INTERNATIONAL AUX.
WOLOF           WO     NEGRO-AFRICAN
XHOSA           XH     NEGRO-AFRICAN
YIDDISH         YI     GERMANIC [*Changed 1989 from original ISO 639:1988, JI] 
YORUBA          YO     NEGRO-AFRICAN
ZHUANG          ZA     [       ]
CHINESE         ZH     ASIAN
ZULU            ZU     NEGRO-AFRICAN
</pre>';

$doc = new DOMDocument();
$doc->loadHTML($codes);

$xmlL = simplexml_import_dom($doc);
$pathL = $xmlL->xpath('//pre');
print_r($pathL);

?>

2 answers

  • dongxing6802 2012-03-09 09:35

    The list is obviously generated, so you'd have better luck fixing the generator. If you're stuck with this one list, though, the code below should parse it the way you want:

    $langs_ar = array();
    $codes_ar = array();
    $families_ar = array();
    
    foreach (preg_split('/[\r\n]+/', $codes) as $line)
    {
        // language (one or two whitespace-separated tokens), two-character code, then the family
        if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $matches))
        {
            $langs_ar[] = $matches[1];
            $codes_ar[] = $matches[2];
            $families_ar[] = $matches[3];
        }
    }
    

    Also, instead of three arrays, I'd recommend a single array of hashes (associative arrays) holding the three fields, or your own objects with the three properties lang, code, and family.
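
    The single-array idea could look roughly like the sketch below. It reuses the same regular expression on a shortened sample of $codes; the key names language, code, and family and the variable $rows are illustrative choices, not something the question prescribes.

```php
<?php
// Shortened sample in the same layout as the question's $codes string.
$codes = '<pre>
 LANGUAGE      CODE     LANGUAGE FAMILY

AFAR            AA     HAMITIC
SCOTS GAELIC    GD     CELTIC
BISLAMA         BI     [not given]
</pre>';

$rows = array();

foreach (preg_split('/[\r\n]+/', $codes) as $line) {
    if (preg_match('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/', $line, $m)) {
        // One associative array ("hash") per language row.
        $rows[] = array(
            'language' => $m[1],
            'code'     => $m[2],
            'family'   => $m[3],
        );
    }
}

print_r($rows);
```

    Each row then travels as one unit, so sorting or filtering cannot leave the three fields out of step with each other.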

    Edit: a much shorter way to do the same thing is this:

    preg_match_all('/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m', $codes, $matches, PREG_SET_ORDER);
    var_dump($matches);
    

    $matches is now an array with one entry per matched line. Each entry is itself an array whose indexes are:

    • 0 is the full line
    • 1 is the language
    • 2 is the code
    • 3 is the family

    Just iterate over that to do whatever you want.
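
    As a rough sketch of that iteration, again on a shortened sample of $codes: the loop walks the PREG_SET_ORDER result, and the second call uses the default PREG_PATTERN_ORDER, which groups the results by capture group and so yields the three separate arrays the question asks for. The names $pattern, $groups, $languages, $codeList, and $families are made up for this example.

```php
<?php
// Shortened sample in the same layout as the question's $codes string.
$codes = '<pre>
 LANGUAGE      CODE     LANGUAGE FAMILY

AFAR            AA     HAMITIC
SCOTS GAELIC    GD     CELTIC
BISLAMA         BI     [not given]
</pre>';

$pattern = '/^(\S+\s*\S+)\s+(\S{2})\s+(\S.*\S)\s*$/m';

// PREG_SET_ORDER: one entry per matched line.
preg_match_all($pattern, $codes, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    echo $m[2], ' => ', $m[1], ' (', $m[3], ")\n";
}

// PREG_PATTERN_ORDER (the default): one array per capture group.
preg_match_all($pattern, $codes, $groups);
$languages = $groups[1];
$codeList  = $groups[2];
$families  = $groups[3];
```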

    This answer was selected by the asker as the best answer.