Creating a matrix from a list of attributes

I have a CSV with a list of items, and each has a series of attributes attached:

"5","coffee|peaty|sweet|cereal|cream|barley|malt|creosote|sherry|sherry|manuka|honey|peaty|peppercorn|chipotle|chilli|salt|caramel|coffee|demerara|sugar|molasses|spicy|peaty"
"6","oil|lemon|apple|butter|toffee|treacle|sweet|cola|oak|cereal|cinnamon|salt|toffee"

"5" and "6" are both item IDs and unique in the file.

Ultimately, I want to create a matrix demonstrating how many times in the document each attribute was mentioned in the same row with every other attribute. E.g.:

        peaty    sweet    cereal    cream    barley ...
coffee    1       2         2         1        1
oil       0       1         0         0        0

Note that I'd prefer to reduce duplicates: i.e., "peaty" isn't both a column and a row.

The original database is essentially a key-value store (A table with columns "itemId" and "value") -- I can reformat the data if it helps.

Any idea how I'd do this with Python, PHP or Ruby (Whichever is easiest)? I get the feeling Python can probably do this the easiest of the bunch but I'm missing something fairly basic and/or crucial (I'm just starting to do data analysis with Python).

Thanks!

Edit: In response to the (somewhat unhelpful) "What have you tried" comment, here's what I'm currently working with (Don't laugh, my Python is terrible):

#!/usr/bin/python
import csv

matrix = {}

with open("field.csv", "rb") as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        attribs = row[1].split("|")
        for attrib in attribs:
            if attrib not in matrix:
                matrix[attrib] = {}
            for attrib2 in attribs:
                if attrib2 in matrix[attrib]:
                    matrix[attrib][attrib2] = matrix[attrib][attrib2] + 1 
                else:
                    matrix[attrib][attrib2] = 1
print matrix

The output is a big, unsorted dictionary of terms, likely with a lot of duplication between the rows and columns. If I use pandas and replace the "print matrix" line with the following...

from pandas import *
df = DataFrame(matrix).T.fillna(0)
print df

I get:

<class 'pandas.core.frame.DataFrame'>
Index: 195 entries, acacia to zesty
Columns: 195 entries, acacia to zesty
dtypes: float64(195)

...Which leads me to think I'm doing something rather wrong.
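For reference, the nested loop above pairs every attribute with itself and counts a repeated attribute (such as "peaty" in item 5) once per occurrence. A minimal sketch of one alternative counting rule follows, assuming each unordered pair should be counted once per row. It is written for Python 3, the function name `cooccurrence_counts` is hypothetical, and the two sample rows are shortened versions of the rows shown earlier.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_counts(rows):
    """Count, for each unordered attribute pair, the rows containing both."""
    counts = Counter()
    for _item_id, raw in rows:
        # set() drops repeats within a row; sorted() fixes the order of each
        # pair, so ("peaty", "sweet") is never also stored as ("sweet", "peaty").
        attribs = sorted(set(raw.split("|")))
        counts.update(combinations(attribs, 2))
    return counts


# Abbreviated sample rows (assumption: same two-column layout as field.csv).
sample = io.StringIO(
    '"5","coffee|peaty|sweet|cereal|peaty"\n'
    '"6","oil|sweet|cereal|salt"\n'
)
counts = cooccurrence_counts(csv.reader(sample))
print(counts[("cereal", "sweet")])
print(counts[("coffee", "peaty")])
```

Because each pair is stored under a single sorted key, the result corresponds to one triangle of the matrix, which is one way to avoid an attribute being counted against itself.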

duanfang2708 2013-05-28 16:43
I'd do this with an undirected graph, where the frequency is the edge weight. Then you can generate the matrix quite easily by looping through each vertex, where each edge weight represents how many times each element occurred with another.

Graph docs: http://networkx.github.io/documentation/latest/reference/classes.graph.html

Starter code:

import csv
import itertools
import networkx as nx

G = nx.Graph()

reader = csv.reader(open('field.csv', "rb"))
for row in reader:
    row_elements = row[1].split("|")
    combinations = itertools.combinations(row_elements, 2)
    for (a, b) in combinations:
        if G.has_edge(a, b):
            G[a][b]['weight'] += 1
        else:
            G.add_edge(a, b, weight=1)

print(G.edges(data=True))

Edit: woah see if this does everything for ya http://networkx.github.io/documentation/latest/reference/linalg.html#module-networkx.linalg.graphmatrix
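The "loop through each vertex" step could look roughly like the sketch below. It assumes the edges come as (a, b, {'weight': n}) triples, the shape that G.edges(data=True) yields, and it uses plain nested lists so it runs without networkx or NumPy. The function name `edges_to_matrix` and the sample edges are hypothetical, written by hand for illustration.

```python
def edges_to_matrix(edges):
    """Turn (a, b, {'weight': n}) triples into labels plus a symmetric matrix."""
    edges = list(edges)
    # Every vertex that appears on either end of an edge becomes a row/column.
    labels = sorted({node for a, b, _ in edges for node in (a, b)})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, b, data in edges:
        weight = data.get("weight", 1)
        matrix[index[a]][index[b]] = weight
        matrix[index[b]][index[a]] = weight  # undirected: mirror the entry
    return labels, matrix


# Hand-written sample in the shape G.edges(data=True) yields (assumption).
sample_edges = [
    ("coffee", "peaty", {"weight": 1}),
    ("cereal", "sweet", {"weight": 2}),
    ("coffee", "sweet", {"weight": 1}),
]
labels, matrix = edges_to_matrix(sample_edges)
for label, row in zip(labels, matrix):
    print(label.ljust(8), row)
```

Pairs that never share a row simply stay at 0, and sorting the labels gives the rows and columns a stable order.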