douping3860 2018-06-27 13:40

How do I convert this text into the required array format and export it as CSV?

I have text that was extracted from a PDF with the pdftotext tool.

The text has the following structure:

stage    title1    title2  title3  title4
I        value1    value2  value3  
II                         value5  value6

stage    Other1      Other2     Other3     Other4
I        otherval1   otherval2  otherval3  otherval4

I want to export this text as CSV with the appropriate columns and headers, or build an array like this:

[
  "category" => "title1",
  "score"    => "value1",
],
[
  "category" => "title2",
  "score"    => "value2",
],
[
  "category" => "title3",
  "score"    => "value3"
],
// unable to do this
[
  "category" => "title3",
  "score"    => "value5"
],
[
  "category" => "title4",
  "score"    => "value6",
],

.
.
// so on

The constraints are:

  • Column values in the stage I and stage II rows are optional, but for each column at least one of the two rows contains a value
  • The stage II row is optional; it may or may not exist
  • If the stage II row exists, it contains at least one column value

The problem I am facing is how to map

  • value5 to title3
  • value6 to title4

Here is my parser code (PHP)

$rows = explode("\n", $pdfExtractedText);
$rows = array_values(array_filter($rows));

$categories = array_values(array_filter(explode(" ", $rows[7])));
$stage1Scores = array_values(array_filter(explode(" ", $rows[8])));
$stage2Scores = array_values(array_filter(explode(" ", $rows[9])));
var_dump($categories);
var_dump($stage1Scores);
var_dump($stage2Scores);

OUTPUT:

// categories
array:13 [
  0 => "stage"
  1 => "title1"
  2 => "title2"
  3 => "title3"
  4 => "title4"
]

//values - Index preserved so that I can map with categories
array:14 [
  0 => "I"
  1 => "value1"
  2 => "value2"
  3 => "value3"
  4 => "value4"
]

// index not preserved :(
array:2 [
  0 => "II"
  1 => "value5",
  2 => "value6"
]
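The index is lost because explode(" ", ...) followed by array_filter() throws away the column positions. One possible way around this, sketched below, is to keep each token's character offset and assign every value to the header whose start column is nearest. This assumes the text was produced with pdftotext's -layout option (so columns stay aligned), that leading spaces in the rows have not been trimmed, and that the text is ASCII (the offsets are byte offsets). The helper names tokensWithOffsets and mapRowToHeaders are hypothetical, not part of the original parser.

```php
<?php
// Hypothetical helper: returns [startOffset => token] for every
// whitespace-separated token in a line.
function tokensWithOffsets(string $line): array
{
    preg_match_all('/\S+/', $line, $matches, PREG_OFFSET_CAPTURE);
    $tokens = [];
    foreach ($matches[0] as [$text, $offset]) {
        $tokens[$offset] = $text;
    }
    return $tokens;
}

// Hypothetical helper: pairs each value in a data row with the header
// whose start column is closest to the value's start column.
function mapRowToHeaders(string $headerLine, string $rowLine): array
{
    $headers = tokensWithOffsets($headerLine);
    $result = [];
    foreach (tokensWithOffsets($rowLine) as $offset => $value) {
        $best = null;
        $bestDistance = PHP_INT_MAX;
        foreach ($headers as $headerOffset => $title) {
            $distance = abs($offset - $headerOffset);
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = $title;
            }
        }
        $result[] = ['category' => $best, 'score' => $value];
    }
    return $result;
}

// Sample rows copied from the question; the spacing matters.
$header = 'stage    title1    title2  title3  title4';
$stage2 = 'II                         value5  value6';

$pairs = mapRowToHeaders($header, $stage2);
// Drop the first pair, which is the stage label itself ("stage" => "II").
array_shift($pairs);
print_r($pairs);
```

For the sample stage II row this pairs value5 with title3 and value6 with title4. Nearest-column matching tolerates small misalignments, but it can pick the wrong header if a value is much wider than its column, so the result is worth checking against real extractions.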

1 answer

  • duanbei3747 2018-06-27 13:50

    Then try this:

    $csv = "";
    
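    // PHP variable names are case-sensitive, so these must match
    // $stage1Scores and $stage2Scores from the question.
    $csv .= implode(",", $categories) . PHP_EOL;
    $csv .= implode(",", $stage1Scores) . PHP_EOL;
    $csv .= implode(",", $stage2Scores) . PHP_EOL;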
    

    Then write it to a file.
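
As a rough sketch of the file-writing step, PHP's built-in fputcsv() could be used instead of implode(), since it quotes fields that contain commas or double quotes. The helper name writePairsToCsv, the output path, and the two-column category/score layout are assumptions for illustration, based on the array format requested in the question.

```php
<?php
// Hypothetical helper: writes category/score pairs to a CSV file,
// preceded by a header row.
function writePairsToCsv(string $path, array $pairs): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    // Delimiter, enclosure and escape are passed explicitly so the
    // behaviour does not depend on version-specific defaults.
    fputcsv($handle, ['category', 'score'], ',', '"', '\\');
    foreach ($pairs as $pair) {
        fputcsv($handle, [$pair['category'], $pair['score']], ',', '"', '\\');
    }
    fclose($handle);
}

// Example data in the array shape requested in the question.
$pairs = [
    ['category' => 'title3', 'score' => 'value5'],
    ['category' => 'title4', 'score' => 'value6'],
];

// Assumed output location; any writable path works.
$path = sys_get_temp_dir() . '/scores.csv';
writePairsToCsv($path, $pairs);
```

This writes one pair per line under a category,score header, which keeps each value attached to its title even when a stage row has empty columns.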
