Needed before noon on 7.28 (Sunday): how can Matlab, Python, or another language be used to solve KNN and GMM problems in machine learning?

不在今天 2019-08-02 02:49
1 Data Preparation
pareto displays only the bars that make up the first 95% of the cumulative distribution, so some elements of y are not displayed.
The smallest number of principal components is 3.
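As a rough illustration of how a component count can be read off a Pareto chart, the Python sketch below picks the smallest number of components whose cumulative share of the variance reaches a threshold. The eigenvalues and the 0.95 threshold are made-up illustration values, not results from the bird image, and `min_components` is a hypothetical helper name.

```python
def min_components(latent, threshold=0.95):
    """Return the smallest k whose k largest eigenvalues explain >= threshold of the variance."""
    total = sum(latent)
    cumulative = 0.0
    for k, value in enumerate(sorted(latent, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(latent)

# Hypothetical eigenvalues for a 5-dimensional feature space (illustration only).
example_latent = [2.5, 1.5, 0.8, 0.15, 0.05]
print(min_components(example_latent))
```

In the MATLAB script the same input would be the `latent` vector returned by `pca`; the threshold is a judgment call rather than a fixed rule.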

2 Unsupervised Clustering on Colorbird Image with Mean Shift
How does your choice of bandw affect the final number of clusters?

An inappropriate bandwidth can cause modes to be merged. As the plot shows, the number of clusters decreases as the bandwidth increases.
Comment on the reproducibility of these results versus new random selection of vectors
Two runs were made, each with a new random selection of vectors:
Try 1 (number of clusters):
Try 2 (number of clusters):
As the table shows, the number of clusters differs slightly between runs, but the overall trend is the same.
The smallest number of principal components needed to produce a reasonable result is 2.
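The bandwidth effect described above can be reproduced on a toy data set. The following Python sketch is a simplified flat-kernel mean shift on hand-picked 1-D points, not the HGMeanShiftCluster routine used in the answer; `mean_shift_modes` and the sample points are assumptions made for illustration. Modes closer than half a bandwidth are merged, which mirrors the merge rule in the MATLAB code.

```python
def mean_shift_modes(points, bandwidth, tol=1e-6, max_iter=200):
    """Flat-kernel mean shift on 1-D points; returns the list of distinct modes."""
    modes = []
    for start in points:
        mean = start
        for _ in range(max_iter):
            # average all points that fall inside the window around the current mean
            window = [p for p in points if abs(p - mean) < bandwidth]
            new_mean = sum(window) / len(window)
            converged = abs(new_mean - mean) < tol
            mean = new_mean
            if converged:
                break
        # keep the mode only if no existing mode lies within half a bandwidth
        if not any(abs(mean - m) < bandwidth / 2 for m in modes):
            modes.append(mean)
    return modes

# Three well-separated groups of points (illustration only).
points = [0.0, 0.2, 0.4, 3.0, 3.2, 3.4, 6.0, 6.2, 6.4]
for bw in (0.5, 3.5, 10.0):
    print(bw, len(mean_shift_modes(points, bw)))
```

On this toy data the cluster count falls from 3 to 2 to 1 as the bandwidth grows, because larger windows pull neighbouring groups into the same mode.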
3 Colorbird Image Segmentation with Unsupervised Clustering: KNN

4 Colorbird Image Segmentation with Unsupervised Clustering: GMM
k-means segmentation is simpler, but its segmentation result is not as good as that of GMM.
This can be seen on the tree branch: if some pixels are given the white label, the surrounding pixels also end up with the same label.
GMM segmentation is probability-based, so the tree branch does not show the same effect.
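One way to see the difference between the two approaches is to compare a hard nearest-centroid label with the posterior probabilities of a mixture model for the same point. The Python sketch below uses a made-up 1-D mixture; the means, standard deviations, weights, and function names are all assumptions chosen for illustration and are not taken from the bird image.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def kmeans_label(x, centroids):
    """Hard assignment: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))

def gmm_responsibilities(x, weights, means, stds):
    """Soft assignment: posterior probability of each component given x."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# A narrow component at 0 and a wide component at 10 (illustration only).
means = [0.0, 10.0]
stds = [1.0, 5.0]
weights = [0.5, 0.5]
x = 4.0
print(kmeans_label(x, means))
print(gmm_responsibilities(x, weights, means, stds))
```

The point at 4.0 is nearer to the first centroid, so the hard rule labels it 0, while the mixture model gives almost all of the probability to the wider second component because it accounts for each component's spread.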
CODE
1:
% this script imports the color bird image
clear
data = imread('42049_colorBird.jpg');
figure(1), subplot(231), imshow(data), title('42049_colorBird.jpg')
% form the raw features: [row, column, R, G, B] for every pixel
nc = size(data,2); % column size of image
nr = size(data,1); % row size of image
featureSize = nc*nr; % number of feature vectors
%numSamples = round(featureSize*randSampFrac); % random sample size
raw_feature = zeros(featureSize,5);
idx = 0;
for rowcount = 1:nr
    for colcount = 1:nc
        idx = idx+1;
        raw_feature(idx,:) = [rowcount colcount double(data(rowcount,colcount,1)) double(data(rowcount,colcount,2)) double(data(rowcount,colcount,3))];
    end
end

% normalize the feature vectors
[feature,mu,sigma] = zscore(raw_feature);
% find the principal components of the features
[coeff,score,latent] = pca(feature);
% plot pareto
fig2 = figure(2);
pareto(latent)
Itransformed = feature*coeff;
% plot approximation images, one per principal component
for k = 1:5
    % the features were stored row by row, so reshape to nc x nr and transpose
    Ipc = reshape(Itransformed(:,k),nc,nr)';
    figure(1), subplot(2,3,k+1), imshow(Ipc), title(['pc' num2str(k)])
end
% randomly select 5% of the feature vectors
threeC = Itransformed(:,1:3);
rand_index = randperm(size(threeC,1));
random_part_index = rand_index(1:round(size(threeC,1)*0.05));
% HGMeanShiftCluster expects numDim x numPts, so transpose once, before the loop
random_part = transpose(threeC(random_part_index,:));

% mean shift: count the clusters found for each bandwidth
i = 1;
for bandwidth = 0.1:0.05:2
    x(i) = bandwidth;
    clustCent = HGMeanShiftCluster(random_part,bandwidth,'gaussian');
    Ncluster = size(clustCent,2);
    y(i) = Ncluster;
    i = i+1;
end
% plot number of clusters against bandwidth
fig3 = figure(3);
plot(x,y)
mean shift:
function [shiftedClusterCenter] = mean_shift(clusterCenter,bandw,weight)
% this computes one iteration of the mean shift algorithm
% using a Gaussian kernel with standard deviation given by 'bandw'
shiftedClusterCenter = clusterCenter;
% form averaging distributions, one per cluster center, in columns of 'distMatrix'
% (the distance is squared so that the weights follow a Gaussian kernel)
distMatrix = exp(-pdist2(clusterCenter,clusterCenter).^2/(2*bandw^2));
distMatrix = (weight*ones(1,size(distMatrix,2))).*distMatrix;
normalization = sum(distMatrix,1);
distMatrix = distMatrix ./ (ones(size(distMatrix,1),1)*normalization);
for count = 1:size(shiftedClusterCenter,1)
    shiftedClusterCenter(count,:) = sum(distMatrix(:,count)*ones(1,size(clusterCenter,2)).*clusterCenter, 1);
end
end
gauss function:
function out = gaussfun(x,d,bandWidth)
% approximate Gaussian kernel
% x - value
% d - location
% bandWidth - band width of the kernel
% out - contribution to the kernel mean
%
% Copyright 2015 Han Gong, University of East Anglia
ns = 1000; % resolution of Gaussian approximation
xs = linspace(0,bandWidth,ns+1); % approximate ticks
kfun = exp(-(xs.^2)/(2*bandWidth^2));
w = kfun(round(d/bandWidth*ns)+1);
w = w/sum(w); % normalise
out = sum( bsxfun(@times, x, w ), 2 );
end
HGmeanshiftcluster:
function [clustCent,data2cluster,cluster2dataCell] = HGMeanShiftCluster(dataPts,bandWidth,kernel,plotFlag)
%HGMEANSHIFTCLUSTER performs MeanShift Clustering of data using a chosen kernel
%
% ---INPUT---
% dataPts          - input data, (numDim x numPts)
% bandWidth        - is bandwidth parameter (scalar)
% kernel           - kernel type (flat or gaussian)
% plotFlag         - display output if 2 or 3 D (logical)
% ---OUTPUT---
% clustCent        - is locations of cluster centers (numDim x numClust)
% data2cluster     - for every data point which cluster it belongs to (numPts)
% cluster2dataCell - for every cluster which points are in it (numClust)
%
% Copyright 2015 Han Gong, University of East Anglia
% Copyright 2006 Bart Finkston
%
% MeanShift first appears in
% K. Fukunaga and L.D. Hostetler, "The Estimation of the Gradient of a
% Density Function, with Applications in Pattern Recognition"
if nargin < 2
    error('no bandwidth specified')
end
if nargin < 4
    plotFlag = false;
end
%**** Initialize stuff ***
[numDim,numPts] = size(dataPts);
numClust = 0;
bandSq = bandWidth^2;
initPtInds = 1:numPts;
maxPos = max(dataPts,[],2); % biggest size in each dimension
minPos = min(dataPts,[],2); % smallest size in each dimension
boundBox = maxPos-minPos; % bounding box size
sizeSpace = norm(boundBox); % indicator of size of data space
stopThresh = 1e-3*bandWidth; % when mean has converged
clustCent = []; % center of clust
beenVisited = false(1,numPts); % track if a point has been seen already
numInitPts = numPts; % number of points to possibly use as initialization points
clusterVotes = zeros(1,numPts,'uint16'); % used to resolve conflicts on cluster membership
clustMembsCell = [];
%*** mean function with the chosen kernel ****
switch kernel
    case 'flat' % flat kernel
        kmean = @(x,dis) mean(x,2);
    case 'gaussian' % approximated gaussian kernel
        kmean = @(x,d) gaussfun(x,d,bandWidth);
    otherwise
        error('unknown kernel type');
end
while numInitPts
    tempInd = ceil( (numInitPts-1e-6)*rand); % pick a random seed point
    stInd = initPtInds(tempInd); % use this point as start of mean
    myMean = dataPts(:,stInd); % initialize mean to this point's location
    myMembers = []; % points that will get added to this cluster
    thisClusterVotes = zeros(1,numPts,'uint16'); % used to resolve conflicts on cluster membership
    while true % loop until convergence
        sqDistToAll = sum(bsxfun(@minus,myMean,dataPts).^2); % dist squared from mean to all points still active
        inInds = find(sqDistToAll < bandSq); % points within bandWidth
        thisClusterVotes(inInds) = thisClusterVotes(inInds)+1; % add a vote for all the in points belonging to this cluster
        myOldMean = myMean; % save the old mean
        myMean = kmean(dataPts(:,inInds),sqrt(sqDistToAll(inInds))); % compute the new mean
        myMembers = [myMembers inInds]; % add any point within bandWidth to the cluster
        beenVisited(myMembers) = true; % mark that these points have been visited
        %*** plot stuff ****
        if plotFlag
            figure(12345),clf,hold on
            if numDim == 2
                plot(dataPts(1,:),dataPts(2,:),'.')
                plot(dataPts(1,myMembers),dataPts(2,myMembers),'ys')
                plot(myMean(1),myMean(2),'go')
                plot(myOldMean(1),myOldMean(2),'rd')
                pause(0.1);
            end
        end
        %**** if mean doesn't move much stop this cluster ***
        if norm(myMean-myOldMean) < stopThresh
            % check for merge possibilities
            mergeWith = 0;
            for cN = 1:numClust
                distToOther = norm(myMean-clustCent(:,cN)); % distance to old clust max
                if distToOther < bandWidth/2 % if it's within bandwidth/2 merge new and old
                    mergeWith = cN;
                    break;
                end
            end
            if mergeWith > 0 % something to merge
                nc = numel(myMembers); % current cluster's member number
                no = numel(clustMembsCell{mergeWith}); % old cluster's member number
                nw = [nc;no]/(nc+no); % weights for merging mean
                clustMembsCell{mergeWith} = unique([clustMembsCell{mergeWith},myMembers]); % record which points inside
                clustCent(:,mergeWith) = myMean*nw(1) + myOldMean*nw(2);
                clusterVotes(mergeWith,:) = clusterVotes(mergeWith,:) + thisClusterVotes; % add these votes to the merged cluster
            else % it's a new cluster
                numClust = numClust+1; % increment clusters
                clustCent(:,numClust) = myMean; % record the mean
                clustMembsCell{numClust} = myMembers; % store my members
                clusterVotes(numClust,:) = thisClusterVotes; % creates a new vote
            end
            break;
        end
    end

    initPtInds = find(~beenVisited); % we can initialize with any of the points not yet visited
    numInitPts = length(initPtInds); % number of active points in set
end
[~,data2cluster] = max(clusterVotes,[],1); % a point belongs to the cluster with the most votes
%*** If they want the cluster2data cell find it for them
if nargout > 2
    cluster2dataCell = cell(numClust,1);
    for cN = 1:numClust
        myMembers = find(data2cluster == cN);
        cluster2dataCell{cN} = myMembers;
    end
end
end
KNN:
I = double(imread('42049_colorBird.jpg'));
%lab_I = rgb2lab(I);
ab = I(:,:,1:3);
ab = im2single(ab);
%K=2,3,4,5
K=2;
% repeat the clustering 3 times to avoid local minima
pixel_labels = imsegkmeans(ab,K,'NumAttempts',3);
imshow(pixel_labels,[])
title('Image Labeled by Cluster 2');

GMM:
clear; clc;
[X1,X2]=generateData();
K=3;
% data set X1: GMM from random initial means, then K-means, then GMM from the K-means centroids
[a_init,mu_init,sigma_init]=initPara(X1);
[a_GMM,mu_GMM,sigma_GMM]=GMM(X1,a_init,mu_init,sigma_init);
[centroids,~] = Kmeans( X1,mu_init,K);
disp('kmeans1')
disp(centroids);
disp('mean-gaussian1');
disp('nmean');
disp(mu_GMM);
disp('matri')
disp(sigma_GMM);
disp(['mixn',num2str(a_GMM)]);
[a_Kea_GMM,mu_Kea_GMM,sigma_Kea_GMM]=GMM(X1,a_init,centroids,sigma_init);
disp('mean-gau2');
disp('meani');
disp(mu_Kea_GMM);
disp('nmatr')
disp(sigma_Kea_GMM);
disp(['mixn',num2str(a_Kea_GMM)]);
% data set X2: same sequence; the GMM results are stored in the variables displayed below
[a_init,mu_init,sigma_init]=initPara(X2);
[a_GMM,mu_GMM,sigma_GMM]=GMM(X2,a_init,mu_init,sigma_init);
[centroids,~] = Kmeans( X2,mu_init,K);

disp('meanx21')
disp(centroids);
disp('GMMg2');
disp('mean2');
disp(mu_GMM);
disp('mat')
disp(sigma_GMM);
disp(['matn2',num2str(a_GMM)]);
[a_Kea_GMM,mu_Kea_GMM,sigma_Kea_GMM]=GMM(X2,a_init,centroids,sigma_init);
disp('kmeansco2');
disp('mean2a');
disp(mu_Kea_GMM);
disp('matco2')
disp(sigma_Kea_GMM);
disp(['matn2',num2str(a_Kea_GMM)]);
function [X1,X2]=generateData()
mu=[1,1;4,4;8,1];
sigma=[2,0;0,2];
a1=[1/3,1/3,1/3];
a2=[0.6,0.3,0.1];
N=1000;
rand1=randsrc(N,1,[[1,2,3];a1]);
rand2=randsrc(N,1,[[1,2,3];a2]);
X1=[];
X2=[];
mean1=[];
for i=1:size(a1,2)
    X1_temp=mvnrnd(mu(i,:),sigma,length(find(rand1==i)));
    X1=[X1;X1_temp];
    subplot(1,2,1);
    plot(X1_temp(:,1),X1_temp(:,2),'+');
    title('X1');
    legend('m1','m2','m3');
    xlabel('x');ylabel('y');
    hold on;
    mean1=[mean1;mean(X1_temp)];
    cov1(:,:,i)=cov(X1_temp);
    X2_temp=mvnrnd(mu(i,:),sigma,length(find(rand2==i)));
    X2=[X2;X2_temp];
    subplot(1,2,2);
    plot(X2_temp(:,1),X2_temp(:,2),'*');
    title('X2');
    legend('m1','m2','m3');
    xlabel('x');ylabel('y');
    hold on;
end
disp(['mean-ar1']);
disp(mean1);
disp(['mean-ar2']);
disp(cov1);
end
function [ a,mu,sigma ] = initPara(X)
[m,n]=size(X);
r=randperm(m);
mu=X(r(1:3),:);
a=[1/3,1/3,1/3];
sigma=[1,0;0,1];
end

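function [a_fin,mu_fin,sigma_fin]=GMM(X1,a_init,mu_init,sigma_init)
K=size(a_init,2);
[M,N]=size(X1);
a=a_init;
mu=mu_init;
% keep one covariance matrix per component; a single N x N matrix is replicated K times
if size(sigma_init,3)==1
    sigma=repmat(sigma_init,[1,1,K]);
else
    sigma=sigma_init;
end
px=zeros(M,K);
thre=1e-5;
LLD_pro=-inf; % -inf so that the first iteration cannot satisfy the stop test
while true
    % E-step: component densities and responsibilities
    for i=1:K
        px(:,i)=mvnpdf(X1,mu(i,:),sigma(:,:,i));
    end
    pGramm=repmat(a,M,1).*px;
    pGramm=pGramm./repmat(sum(pGramm,2),1,K);
    % M-step: update means, covariances and mixing weights
    Nk=sum(pGramm,1); % 1*K
    mu=diag(1./Nk)*pGramm'*X1;
    for kk=1:K
        Xshift=X1-repmat(mu(kk,:),M,1);
        sigma(:,:,kk)=(Xshift'*diag(pGramm(:,kk))*Xshift)/Nk(:,kk);
    end
    a=Nk/M;
    LLD=sum(log(a*px'));
    % stop once the log-likelihood no longer changes
    if abs(LLD-LLD_pro)<thre
        break;
    end
    LLD_pro=LLD;

end
a_fin=a;
mu_fin=mu;
sigma_fin=sigma;
end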
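Since the question also allows Python, the EM loop can be sketched in that language as well. The version below is a 1-D analogue written with the standard library only; it is not a port of the MATLAB GMM function above, and the function name `em_gmm_1d`, the sample data, and the starting values are assumptions chosen for illustration. It alternates an E-step (responsibilities) with an M-step (weights, means, standard deviations) until the log-likelihood stops changing.

```python
import math

def em_gmm_1d(data, means, stds, weights, tol=1e-8, max_iter=500):
    """EM for a 1-D Gaussian mixture; returns (weights, means, stds, log_likelihood)."""
    means, stds, weights = list(means), list(stds), list(weights)  # do not modify the inputs
    k = len(means)
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # E-step: responsibilities and log-likelihood under the current parameters
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[j] * math.exp(-0.5 * ((x - means[j]) / stds[j]) ** 2)
                     / (stds[j] * math.sqrt(2 * math.pi)) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: re-estimate each component from its weighted points
        for j in range(k):
            nk = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nk
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nk
            stds[j] = math.sqrt(max(var, 1e-12))  # guard against a collapsed component
            weights[j] = nk / len(data)
    return weights, means, stds, ll

# Two hand-picked groups centred near 0 and 5 (illustration only).
data = [-0.5, -0.2, 0.0, 0.1, 0.6, 4.4, 4.9, 5.0, 5.3, 5.4]
weights, means, stds, ll = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(weights, means, stds, ll)
```

Starting the previous log-likelihood at negative infinity means the stop test cannot trigger before at least one full update has been made.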
This answer was selected by the asker as the best answer.
