Gustab.M2 2021-12-29 15:23

Column names do not line up with the data after reading a file with pandas

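When testing with COSBench, the results are generated as CSV files. I planned to analyze them with pandas but got stuck at the very first step. I have worked around it for now by reading the file as plain text, but the pandas problem itself is still unsolved, so I am asking for help here.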

The file is named w108-8K-80%Read20%Write-160Thread.csv and has 7 lines in total, with the following content:

Stage,Op-Name,Op-Type,Op-Count,Byte-Count,Avg-ResTime,Avg-ProcTime,60%-ResTime,80%-ResTime,90%-ResTime,95%-ResTime,99%-ResTime,100%-ResTime,Throughput,Bandwidth,Succ-Ratio,Status,Detailed Status
s1-init,init-write,init,0,0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0,0,N/A,completed,waiting @ 2021-12-28 17:50:53,booting @ 2021-12-28 17:50:53,submitting @ 2021-12-28 17:50:53,authing @ 2021-12-28 17:50:53,launching @ 2021-12-28 17:50:53,running @ 2021-12-28 17:50:53,closing @ 2021-12-28 17:50:58,completed @ 2021-12-28 17:50:58
s2-prepare,prepare-write,prepare,100000,800000000,18.4,18.34,20,30,30,40,110,440,8870.62,70964947.96,100%,completed,waiting @ 2021-12-28 17:50:53,booting @ 2021-12-28 17:51:00,submitting @ 2021-12-28 17:50:59,authing @ 2021-12-28 17:50:59,launching @ 2021-12-28 17:50:59,running @ 2021-12-28 17:50:59,closing @ 2021-12-28 17:51:14,completed @ 2021-12-28 17:51:14
s3-main,read,read,15249196,121993568000,8.96,8.57,10,20,20,30,40,670,8471.9,67775177.22,100%,completed,waiting @ 2021-12-28 17:50:53,booting @ 2021-12-28 17:51:17,submitting @ 2021-12-28 17:51:17,authing @ 2021-12-28 17:51:17,launching @ 2021-12-28 17:51:17,running @ 2021-12-28 17:51:17,closing @ 2021-12-28 18:21:19,completed @ 2021-12-28 18:21:19
s3-main,write,write,3812554,30500432000,39.58,39.53,20,40,150,180,270,1430,2118.12,16944927.76,100%,completed,waiting @ 2021-12-28 17:50:53,booting @ 2021-12-28 17:51:17,submitting @ 2021-12-28 17:51:17,authing @ 2021-12-28 17:51:17,launching @ 2021-12-28 17:51:17,running @ 2021-12-28 17:51:17,closing @ 2021-12-28 18:21:19,completed @ 2021-12-28 18:21:19
s4-cleanup,cleanup-delete,cleanup,200000,0,31.32,31.32,20,40,70,170,260,830,5126.7,0,100%,completed,waiting @ 2021-12-28 17:50:53,booting @ 2021-12-28 18:21:20,submitting @ 2021-12-28 18:21:19,authing @ 2021-12-28 18:21:19,launching @ 2021-12-28 18:21:19,running @ 2021-12-28 18:21:19,closing @ 2021-12-28 18:22:04,completed @ 2021-12-28 18:22:05
s5-dispose,dispose-delete,dispose,0,0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0,0,N/A,completed,waiting @ 2021-12-28 17:50:53,booting @ 2021-12-28 18:22:07,submitting @ 2021-12-28 18:22:07,authing @ 2021-12-28 18:22:07,launching @ 2021-12-28 18:22:07,running @ 2021-12-28 18:22:07,closing @ 2021-12-28 18:22:17,completed @ 2021-12-28 18:22:17

Reading it with pandas:

import pandas as pd
import numpy as np
import os

fpath = r"E:\Python\AdvancedPython\data\w108-8K-80%Read20%Write-160Thread.csv"
w108 = pd.read_csv(fpath, sep=",", header=0)
print(w108.head())
columns = w108.columns  # get the column names
print("="*20)
print(columns)
print("="*20)
print(w108['Stage'])

The returned data looks strange:

E:\Python\AdvancedPython\venv\Scripts\python.exe E:/Python/AdvancedPython/test1.py
                                                                     Stage  ...                  Detailed Status
s1-init    init-write     init    0        0            NaN   NaN      NaN  ...  completed @ 2021-12-28 17:50:58
s2-prepare prepare-write  prepare 100000   800000000    18.40 18.34   20.0  ...  completed @ 2021-12-28 17:51:14
s3-main    read           read    15249196 121993568000 8.96  8.57    10.0  ...  completed @ 2021-12-28 18:21:19
           write          write   3812554  30500432000  39.58 39.53   20.0  ...  completed @ 2021-12-28 18:21:19
s4-cleanup cleanup-delete cleanup 200000   0            31.32 31.32   20.0  ...  completed @ 2021-12-28 18:22:05

[5 rows x 18 columns]
====================
Index(['Stage', 'Op-Name', 'Op-Type', 'Op-Count', 'Byte-Count', 'Avg-ResTime',
       'Avg-ProcTime', '60%-ResTime', '80%-ResTime', '90%-ResTime',
       '95%-ResTime', '99%-ResTime', '100%-ResTime', 'Throughput', 'Bandwidth',
       'Succ-Ratio', 'Status', 'Detailed Status'],
      dtype='object')
====================
s1-init     init-write      init     0         0             NaN    NaN       NaN
s2-prepare  prepare-write   prepare  100000    800000000     18.40  18.34    20.0
s3-main     read            read     15249196  121993568000  8.96   8.57     10.0
            write           write    3812554   30500432000   39.58  39.53    20.0
s4-cleanup  cleanup-delete  cleanup  200000    0             31.32  31.32    20.0
s5-dispose  dispose-delete  dispose  0         0             NaN    NaN       NaN
Name: Stage, dtype: float64

Process finished with exit code 0

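In principle, with the first line used as the column names, the data under Stage should be s1-init, s2-prepare, s3-main and so on, but what actually comes back is a jumble that I cannot make sense of. I hope someone who knows pandas can explain what is going on.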
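
One way to check that expectation is to count the comma-separated fields on each line. In the file above the header has 18 names, while each data row splits into 25 fields (17 values plus 8 timestamped status entries). The printed result is consistent with pandas treating the 7 leading surplus fields as a row index, so that the name Stage ends up on the 8th field. The sketch below uses a shortened, made-up sample with the same shape; `field_counts` is a hypothetical helper, not part of pandas.

```python
import csv
import io

# Shortened, made-up lines that mimic the shape of the real file:
# 3 header names, but the data row carries extra comma-separated fields.
sample = (
    "Stage,Status,Detailed Status\n"
    "s1-init,completed,waiting @ 17:50:53,running @ 17:50:53,completed @ 17:50:58\n"
)


def field_counts(text):
    """Return the number of comma-separated fields found on each line."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


counts = field_counts(sample)
print(counts)  # [3, 5]: the data row is wider than the header
```

If the counts for the header and the data rows differ, the column names cannot line up with the data.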

2 answers

  • bekote 2021-12-29 15:56

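    The columns are split on commas. I counted 18 names in the header, but the data rows have more than 18 fields, so one of the columns probably contains commas in its content (from the sample it looks like the last one, Detailed Status). You can preprocess the file first and wrap the comma-containing content in double quotes, or read it in and merge the extra fields afterwards.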
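
The "merge afterwards" option could be sketched as follows, assuming the surplus fields always belong to the last column and every data row has at least as many fields as the header. `read_with_overflow` is a hypothetical helper name, and the sample is a shortened, made-up stand-in for the real file.

```python
import csv
import io


def read_with_overflow(handle, joiner=","):
    """Parse CSV text whose last column may contain unquoted commas.

    Fields beyond the header width are joined back into the last column.
    Assumes each data row has at least as many fields as the header.
    """
    reader = csv.reader(handle)
    header = next(reader)
    width = len(header)
    rows = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        # Keep the first width-1 fields, glue everything else back together.
        rows.append(row[: width - 1] + [joiner.join(row[width - 1:])])
    return header, rows


# Made-up sample with the same problem: 4 header names, 5 fields per row.
sample = (
    "Stage,Op-Count,Status,Detailed Status\n"
    "s1-init,0,completed,waiting @ 17:50:53,completed @ 17:50:58\n"
    "s2-prepare,100000,completed,waiting @ 17:50:53,completed @ 17:51:14\n"
)

header, rows = read_with_overflow(io.StringIO(sample))
print(header)
print(rows[0])
```

The result can then be handed to pandas with `pd.DataFrame(rows, columns=header)`. All values arrive as strings, so numeric columns would still need a conversion such as `pd.to_numeric(..., errors="coerce")`, which also turns the N/A entries into NaN.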

    Selected by the asker as the best answer.

Question events

  • Closed by the system on January 6
  • Answer accepted on December 29
  • Question created on December 29
