dosc9472 2013-09-05 12:51
Accepted

Extracting data from a web page

Hello, I would like to create an HTML and PHP page that extracts the data from the table at this link: http://www.comuni-italiani.it/province.html

I would appreciate any tips. I was planning to use file_get_contents, but I do not know how to pull out the individual pieces of data afterwards.


2 answers

  • douxu5845 2013-09-05 13:16

    Can you explain more clearly exactly what you want to take from this page?

    Either way, you can fetch the page with file_get_contents and then, depending on what you need (I assume you want every <td> element inside the table), use PHP regular expressions (preg_match, preg_match_all) to extract the data.

    Example for your case:

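    // Fetch the raw HTML of the page.
    $page = file_get_contents("http://www.comuni-italiani.it/province.html");

    // Collect every <td ...>...</td> run. Without the /s flag, "." stops at
    // newlines, so each match stays within a single line of the source.
    $output = array();
    preg_match_all('/<td.*.<\/td>/', $page, $output);

    print_r($output);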
    

    This will output:

    Array ( [0] => Array ( [0] =>    [1] => [2] => Agrigento [3] => Alessandria [4] => Ancona [5] => Aosta [6] => Arezzo [7] => Ascoli Piceno [8] => Asti [9] => Avellino [10] => Bari [11] => Barletta-Andria-Trani [12] => Belluno [13] => Benevento [14] => Bergamo [15] => Biella [16] => Bologna [17] => Bolzano [18] => Brescia [19] => Brindisi [20] => Cagliari [21] => Caltanissetta [22] => Campobasso [23] => Carbonia-Iglesias [24] => Caserta [25] => Catania [26] => Catanzaro [27] => Chieti [28] => Como [29] => Cosenza [30] => Cremona [31] => Crotone [32] => Cuneo [33] => Enna [34] => Fermo [35] => Ferrara [36] => Firenze [37] => Foggia [38] => Forlì-Cesena [39] => Frosinone [40] => Genova [41] => Gorizia [42] => Grosseto [43] => Imperia [44] => Isernia [45] => La Spezia [46] => L'Aquila [47] => Latina [48] => Lecce [49] => Lecco [50] => Livorno [51] => Lodi [52] => Lucca [53] => Macerata [54] => Mantova [55] => Massa-Carrara [56] => Matera [57] => Messina [58] => Milano [59] => Modena [60] => Monza e della Brianza [61] => Napoli [62] => Novara [63] => Nuoro [64] => Olbia-Tempio [65] => Oristano [66] => Padova [67] => Palermo [68] => Parma [69] => Pavia [70] => Perugia [71] => Pesaro e Urbino [72] => Pescara [73] => Piacenza [74] => Pisa [75] => Pistoia [76] => Pordenone [77] => Potenza [78] => Prato [79] => Ragusa [80] => Ravenna [81] => Reggio Calabria [82] => Reggio Emilia [83] => Rieti [84] => Rimini [85] => Roma [86] => Rovigo [87] => Salerno [88] => Medio Campidano [89] => Sassari [90] => Savona [91] => Siena [92] => Siracusa [93] => Sondrio [94] => Taranto [95] => Teramo [96] => Terni [97] => Torino [98] => Ogliastra [99] => Trapani [100] => Trento [101] => Treviso [102] => Trieste [103] => Udine [104] => Varese [105] => Venezia [106] => Verbano-Cusio-Ossola [107] => Vercelli [108] => Verona [109] => Vibo Valentia [110] => Vicenza [111] => Viterbo [112] => CercaNel Sito e sul WebPagine UtiliElenco Province per PopolazionePrincipali Città ItalianeLista Alfabetica 
RegioniAmministrazioni LocaliScuole in Italia [113] =>   ) )
    

    This can, of course, be filtered.

    In your case, for example, by adding a small foreach loop:

    $page = file_get_contents("http://www.comuni-italiani.it/province.html");

    $output = array();
    preg_match_all('/<td.*.<\/td>/', $page, $output);

    $provinces = array();

    // $output[0] holds the full matches; in the output above, entries 2 to 111
    // are the province cells, so copy only that range.
    foreach ($output as $id => $list) {
        for ($i = 2; $i <= 111; $i++) {
            array_push($provinces, $list[$i]);
        }
    }

    print_r($provinces);
    

    This will give you:

    Array ( [0] => Agrigento [1] => Alessandria [2] => Ancona [3] => Aosta [4] => Arezzo [5] => Ascoli Piceno [6] => Asti [7] => Avellino [8] => Bari [9] => Barletta-Andria-Trani [10] => Belluno [11] => Benevento [12] => Bergamo [13] => Biella [14] => Bologna [15] => Bolzano [16] => Brescia [17] => Brindisi [18] => Cagliari [19] => Caltanissetta [20] => Campobasso [21] => Carbonia-Iglesias [22] => Caserta [23] => Catania [24] => Catanzaro [25] => Chieti [26] => Como [27] => Cosenza [28] => Cremona [29] => Crotone [30] => Cuneo [31] => Enna [32] => Fermo [33] => Ferrara [34] => Firenze [35] => Foggia [36] => Forlì-Cesena [37] => Frosinone [38] => Genova [39] => Gorizia [40] => Grosseto [41] => Imperia [42] => Isernia [43] => La Spezia [44] => L'Aquila [45] => Latina [46] => Lecce [47] => Lecco [48] => Livorno [49] => Lodi [50] => Lucca [51] => Macerata [52] => Mantova [53] => Massa-Carrara [54] => Matera [55] => Messina [56] => Milano [57] => Modena [58] => Monza e della Brianza [59] => Napoli [60] => Novara [61] => Nuoro [62] => Olbia-Tempio [63] => Oristano [64] => Padova [65] => Palermo [66] => Parma [67] => Pavia [68] => Perugia [69] => Pesaro e Urbino [70] => Pescara [71] => Piacenza [72] => Pisa [73] => Pistoia [74] => Pordenone [75] => Potenza [76] => Prato [77] => Ragusa [78] => Ravenna [79] => Reggio Calabria [80] => Reggio Emilia [81] => Rieti [82] => Rimini [83] => Roma [84] => Rovigo [85] => Salerno [86] => Medio Campidano [87] => Sassari [88] => Savona [89] => Siena [90] => Siracusa [91] => Sondrio [92] => Taranto [93] => Teramo [94] => Terni [95] => Torino [96] => Ogliastra [97] => Trapani [98] => Trento [99] => Treviso [100] => Trieste [101] => Udine [102] => Varese [103] => Venezia [104] => Verbano-Cusio-Ossola [105] => Vercelli [106] => Verona [107] => Vibo Valentia [108] => Vicenza [109] => Viterbo )
    

    (Sorry for the huge arrays).

    Note that the array entries still contain the links, even though the output above, as rendered by a browser, shows only the names. If you want only the values and NOT the anchors wrapped around them, feel free to use a different regular expression.
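As a rough sketch of getting rid of the anchors, the snippet below uses strip_tags instead of a second regular expression. That is a swapped-in alternative, not what the answer itself proposes. The $cells array is a hypothetical sample of what the matches might look like; the real markup on the page may differ.

```php
<?php
// Hypothetical sample of matched cells; the real page's markup may differ.
$cells = array(
    '<td><a href="agrigento/">Agrigento</a></td>',
    '<td><a href="alessandria/">Alessandria</a></td>',
    '<td><a href="forli-cesena/">Forl&igrave;-Cesena</a></td>',
);

$names = array();
foreach ($cells as $cell) {
    // strip_tags() drops the <td> and <a> markup and keeps only the text.
    $text = trim(strip_tags($cell));
    // Turn entities such as &igrave; back into real characters.
    $names[] = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

print_r($names);
```

In the loop from the answer, the same two calls could be applied to each $list[$i] before it is pushed onto $provinces.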

    Hope this helps.

    (Take this as an example only. Keep in mind that the hard-coded index range in this loop may stop working if the page changes; I posted it just to give you an idea of how you could solve this case.)
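If the hard-coded indices turn out to be too brittle, one possible alternative is to parse the HTML with PHP's DOM extension and select the links by their position in the markup. This is a different technique from the regex approach above, shown only as a minimal sketch. The $page string is a hypothetical stand-in for the fetched HTML, and the XPath expression //td/a is an assumption about the page's structure.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$page = '<html><body><table>'
      . '<tr><td><a href="agrigento/">Agrigento</a></td>'
      . '<td><a href="alessandria/">Alessandria</a></td></tr>'
      . '<tr><td>&nbsp;</td><td><a href="ancona/">Ancona</a></td></tr>'
      . '</table></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$provinces = array();
// Select every link that is a direct child of a table cell (assumed structure).
foreach ($xpath->query('//td/a') as $link) {
    $provinces[] = trim($link->textContent);
}

print_r($provinces);
```

On the real page, //td/a would probably also pick up navigation links that sit inside table cells, so a narrower expression may be needed once you have looked at the actual markup.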
