How do I get the sample names of the two branches at each node of a scipy dendrogram (phylogenetic tree, hierarchical clustering)?

# Load required modules
import pandas as pd 
import scipy.spatial
import scipy.cluster
import numpy as np
import json
import matplotlib.pyplot as plt
from functools import reduce

# Example data: gene expression
geneExp = {'genes': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'],
           'exp1': [-2.2, 5.6, 0.9, -0.23, -3, 0.1, 1.0, 3.0, 1.2, 1.3],
           'exp2': [5.4, -0.5, 2.33, 3.1, 4.1, -3.2, -1.0, -1.2, -1.3, -1.1]
          }
df = pd.DataFrame( geneExp )

# Determine distances (default is Euclidean)
dataMatrix = np.array( df[['exp1', 'exp2']] )
distMat = scipy.spatial.distance.pdist( dataMatrix )

# Cluster hierarchically using scipy (single linkage)
clusters = scipy.cluster.hierarchy.linkage(distMat, method='single')
T = scipy.cluster.hierarchy.to_tree( clusters , rd=False )

# Create dictionary for labeling nodes by their IDs
labels = list(df.genes)
id2name = dict(enumerate(labels))

# Draw dendrogram using matplotlib and save it to scipy-dendrogram.png
scipy.cluster.hierarchy.dendrogram(clusters, labels=labels, orientation='right')
plt.savefig("scipy-dendrogram.png")

# Create a nested dictionary from the ClusterNode's returned by SciPy
def add_node(node, parent ):
    # First create the new node and append it to its parent's children
    newNode = dict( node_id=node.id, children=[] )
    parent["children"].append( newNode )

    # Recursively add the current node's children
    if node.left: add_node( node.left, newNode )
    if node.right: add_node( node.right, newNode )

# Initialize nested dictionary for d3, then recursively iterate through tree
d3Dendro = dict(children=[], name="Root1")
add_node( T, d3Dendro )
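
Note that add_node as shown stores only node_id, while the dictionary printed further down carries 'name' fields, so a labeling pass appears to be missing from the posted code. The following is a minimal sketch of such a pass, not the asker's actual code: the function name label_tree, the comma separator, and the alphabetical sorting are assumptions inferred from the printed output, and the tiny tree used here is hand-built for illustration.

```python
def label_tree(node, id2name):
    """Replace each node's node_id with a 'name' built from its leaves.

    Assumption: names are the sorted leaf labels joined by commas,
    which is what the printed d3Dendro output suggests.
    Returns the list of leaf labels under this node.
    """
    if not node["children"]:
        # A leaf: its name is looked up directly from its id
        leaves = [id2name[node["node_id"]]]
    else:
        # An internal node: collect the leaves of all children
        leaves = []
        for child in node["children"]:
            leaves += label_tree(child, id2name)
    del node["node_id"]
    node["name"] = ",".join(sorted(leaves))
    return leaves


# Hand-built example in the shape that add_node produces (ids are illustrative)
id2name_demo = {0: "a", 1: "e", 2: "c"}
demo = dict(node_id=4, children=[
    dict(node_id=2, children=[]),
    dict(node_id=3, children=[
        dict(node_id=0, children=[]),
        dict(node_id=1, children=[]),
    ]),
])
label_tree(demo, id2name_demo)
print(demo["name"])
```

On the real data the call would presumably be label_tree(d3Dendro["children"][0], id2name) after add_node has run, which leaves the outer "Root1" entry untouched.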

Running the code above on the demo data produces the dendrogram and the dictionary d3Dendro, which holds the node information, as follows:

[Figure: dendrogram of the demo data]

>>> d3Dendro

{'children': [{'children': [{'children': [{'children': [], 'name': 'b'},
      {'children': [{'children': [], 'name': 'f'},
        {'children': [{'children': [], 'name': 'h'},
          {'children': [{'children': [], 'name': 'g'},
            {'children': [{'children': [], 'name': 'i'},
              {'children': [], 'name': 'j'}],
             'name': 'i,j'}],
           'name': 'g,i,j'}],
         'name': 'g,h,i,j'}],
       'name': 'f,g,h,i,j'}],
     'name': 'b,f,g,h,i,j'},
    {'children': [{'children': [{'children': [], 'name': 'c'},
        {'children': [], 'name': 'd'}],
       'name': 'c,d'},
      {'children': [{'children': [], 'name': 'a'},
        {'children': [], 'name': 'e'}],
       'name': 'a,e'}],
     'name': 'a,c,d,e'}],
   'name': 'a,b,c,d,e,f,g,h,i,j'}],
 'name': 'Root1'}

How can I write a Python script that automatically extracts, for every node of the tree, the list of sample names in each of its two branches?

For the demo data above, the desired result is:

[ 'a,c,d,e' , 'b,f,g,h,i,j' ], ['a,e', 'c,d'], ['a', 'e'], ['c', 'd'] ,['b', 'f,g,h,i,j' ], ['f', 'g,h,i,j' ], ['h', 'g,i,j' ], ['g', 'i,j'], ['i', 'j' ]

1 answer

爱晚乏客游 2021-05-13 23:25
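I wrote a simple version you can use as a reference. Note that the order in your expected result is the reverse of the actual left and right subtrees of T. My traversal reports the left and right subtrees as they appear in T. If you need your order instead, traverse the right subtree first, then the left, and swap the positions of str1 and str2.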

tree = []

def preorder(root):
    # For every internal node, record the leaf names of its two branches
    if not root.is_leaf():
        str1 = ",".join([geneExp["genes"][i] for i in root.left.pre_order()])
        str2 = ",".join([geneExp["genes"][i] for i in root.right.pre_order()])
        tree.append([str1, str2])
    # Visit the left subtree first, then the right subtree
    if root.get_left() is not None:
        preorder(root.get_left())
    if root.get_right() is not None:
        preorder(root.get_right())

preorder(T)
print(tree)
# output
# [['b,f,h,g,i,j', 'c,d,a,e'], ['b', 'f,h,g,i,j'], ['f', 'h,g,i,j'], ['h', 'g,i,j'], ['g', 'i,j'], ['i', 'j'], ['c,d', 'a,e'], ['c', 'd'], ['a', 'e']]
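
To reproduce the exact list given in the question, one possible reading of that list (an assumption inferred only from the demo output) is that names inside a group are sorted alphabetically and that the smaller group is listed and visited first, with ties broken alphabetically. Below is a minimal sketch under that assumption. Node is a hypothetical stand-in that exposes the same id, left, right and is_leaf() as scipy's ClusterNode, so the sketch runs without scipy; leaf_names and branch_pairs are illustrative names, and the tree is hand-built to match the left/right structure shown in the output above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in mirroring ClusterNode's id/left/right/is_leaf()."""
    id: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def leaf_names(node, names: List[str]) -> List[str]:
    """Return the labels of all leaves below node."""
    if node.is_leaf():
        return [names[node.id]]
    return leaf_names(node.left, names) + leaf_names(node.right, names)


def branch_pairs(node, names: List[str]) -> List[List[str]]:
    """Pre-order list of [group1, group2] for every internal node.

    Assumed ordering: smaller group first, ties broken alphabetically.
    """
    if node.is_leaf():
        return []
    branches = sorted(
        ((sorted(leaf_names(child, names)), child)
         for child in (node.left, node.right)),
        key=lambda gc: (len(gc[0]), gc[0]),
    )
    pairs = [[",".join(group) for group, _ in branches]]
    for _, child in branches:
        pairs += branch_pairs(child, names)
    return pairs


names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
# Hand-built tree; internal ids (10 and up) are arbitrary
ij = Node(10, Node(8), Node(9))
gij = Node(11, Node(6), ij)
hgij = Node(12, Node(7), gij)
fghij = Node(13, Node(5), hgij)
left_side = Node(14, Node(1), fghij)
cd = Node(15, Node(2), Node(3))
ae = Node(16, Node(0), Node(4))
right_side = Node(17, cd, ae)
root = Node(18, left_side, right_side)

pairs = branch_pairs(root, names)
print(pairs)
```

Because branch_pairs only touches id, left, right and is_leaf(), it should also accept the T returned by to_tree, called as branch_pairs(T, geneExp["genes"]).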
This answer was selected by the asker as the best answer.

Question posted: 2021-05-13 17:07 (tag: python)
