乱世＠小熊 2012-04-16 04:23 Acceptance rate: 25%

Image Processing: Algorithm Improvement for Coca-Cola Can Recognition

One of the most interesting projects I've worked on in the past couple of years was a project about image processing. The goal was to develop a system to be able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans', you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.

[Image: Template matching]

Some constraints on the project:

The background could be very noisy.
The can could have any scale or rotation or even orientation (within reasonable limits).
The image could have some degree of fuzziness (contours might not be entirely straight).
There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
The can could be partly hidden on the sides or the middle and possibly partly hidden behind a bottle.
There could be no can at all in the image, in which case you had to find nothing and write a message saying so.

So you could end up with tricky things like this (which in this case had my algorithm totally fail):

[Image: Total fail]

I did this project a while ago, and had a lot of fun doing it, and I had a decent implementation. Here are some details about my implementation:

Language: Done in C++ using OpenCV library.

Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used three steps:

Changing color domain from RGB to HSV and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels would represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.
Noise filtering using median filtering (taking the median pixel value of all neighbors and replace the pixel by this value) to reduce noise.
Using the Canny Edge Detection Filter to get the contours of all items after the two preceding steps.
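The first two pre-processing steps can be sketched in plain C++ without OpenCV, just to show what they compute. This is a minimal sketch, not the original implementation: the names `rgbToHsv`, `isCanRed` and `median3x3` are hypothetical, and the thresholds (hue within 15 degrees of red, saturation at least 0.5, value at least 0.3) are assumed values chosen for illustration. In OpenCV the same steps would normally go through `cv::cvtColor`, `cv::inRange` and `cv::medianBlur`, followed by `cv::Canny`, which is not reproduced here.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

// Hue in degrees [0, 360), saturation and value in [0, 1].
struct Hsv { double h, s, v; };

// Standard RGB -> HSV conversion for 8-bit channels.
Hsv rgbToHsv(std::uint8_t r8, std::uint8_t g8, std::uint8_t b8) {
    const double r = r8 / 255.0, g = g8 / 255.0, b = b8 / 255.0;
    const double maxc = std::max({r, g, b});
    const double minc = std::min({r, g, b});
    const double delta = maxc - minc;
    double h = 0.0;
    if (delta > 0.0) {
        if (maxc == r)      h = 60.0 * std::fmod((g - b) / delta, 6.0);
        else if (maxc == g) h = 60.0 * ((b - r) / delta + 2.0);
        else                h = 60.0 * ((r - g) / delta + 4.0);
        if (h < 0.0) h += 360.0;  // red hues just below 0 wrap around to ~360
    }
    const double s = (maxc > 0.0) ? delta / maxc : 0.0;
    return {h, s, maxc};
}

// Red hue, saturated enough, and not too dark.
// All three thresholds are assumptions for illustration only.
bool isCanRed(const Hsv& p, double hueTol = 15.0,
              double minSat = 0.5, double minVal = 0.3) {
    const bool redHue = p.h <= hueTol || p.h >= 360.0 - hueTol;
    return redHue && p.s >= minSat && p.v >= minVal;
}

// 3x3 median filter on a row-major single-channel image.
// Border pixels reuse the nearest valid neighbor (clamped coordinates).
std::vector<std::uint8_t> median3x3(const std::vector<std::uint8_t>& img,
                                    int w, int h) {
    std::vector<std::uint8_t> out(img.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            std::array<std::uint8_t, 9> win{};
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), h - 1);
                    const int xx = std::min(std::max(x + dx, 0), w - 1);
                    win[k++] = img[yy * w + xx];
                }
            }
            // The 5th smallest of 9 values is the median.
            std::nth_element(win.begin(), win.begin() + 4, win.end());
            out[y * w + x] = win[4];
        }
    }
    return out;
}
```

Applying `isCanRed` to every pixel gives the binary mask described above (255 for a match, 0 otherwise), and `median3x3` then removes isolated white or black pixels from that mask before edge detection.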

Algorithm: The algorithm I chose for this task is the Generalized Hough Transform, taken from this awesome book on feature extraction (it is pretty different from the regular Hough Transform). It basically says a few things:

You can describe an object in space without knowing its analytical equation (which is the case here).
It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
It uses a base model (a template) that the algorithm will "learn".
Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center (in terms of gravity) of your object, based on what it learned from the model.

In the end, you end up with a heat map of the votes. For example, here all the pixels of the can's contour vote for its center of gravity, so a lot of votes land on the same pixel corresponding to the center, and you see a peak in the heat map as below:

[Image: GHT]

Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (final scale and rotation factor will obviously be relative to your original template). In theory at least...
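The voting and peak search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation from the question: it skips the gradient-direction lookup table that the full Generalized Hough Transform uses, so every edge pixel votes with every learned offset, and the names `Point`, `Detection`, `learnOffsets` and `detect` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { int x, y; };

// Best pose found by the voting step.
struct Detection { int cx, cy; double scale, angle; int votes; };

// "Learning" the template: store, for every template edge point,
// the offset from that point to the template's center of gravity.
std::vector<Point> learnOffsets(const std::vector<Point>& templateEdges) {
    std::vector<Point> offsets;
    if (templateEdges.empty()) return offsets;
    long sx = 0, sy = 0;
    for (const Point& p : templateEdges) { sx += p.x; sy += p.y; }
    const double n = static_cast<double>(templateEdges.size());
    const int cx = static_cast<int>(std::lround(sx / n));
    const int cy = static_cast<int>(std::lround(sy / n));
    for (const Point& p : templateEdges) offsets.push_back({cx - p.x, cy - p.y});
    return offsets;
}

// Brute-force search: one accumulator (heat map) per (scale, angle) pair.
// Angles are in radians. The strongest accumulator cell overall wins.
Detection detect(const std::vector<Point>& imageEdges,
                 const std::vector<Point>& offsets,
                 int width, int height,
                 const std::vector<double>& scales,
                 const std::vector<double>& angles) {
    Detection best{0, 0, 1.0, 0.0, 0};
    for (double s : scales) {
        for (double a : angles) {
            std::vector<int> acc(static_cast<std::size_t>(width) * height, 0);
            const double c = std::cos(a), sn = std::sin(a);
            for (const Point& e : imageEdges) {
                for (const Point& o : offsets) {
                    // Rotate and scale the learned offset, then vote for
                    // the center position it implies for this edge pixel.
                    const int cx = e.x + static_cast<int>(
                        std::lround(s * (c * o.x - sn * o.y)));
                    const int cy = e.y + static_cast<int>(
                        std::lround(s * (sn * o.x + c * o.y)));
                    if (cx >= 0 && cx < width && cy >= 0 && cy < height)
                        ++acc[static_cast<std::size_t>(cy) * width + cx];
                }
            }
            // Keep the highest peak seen so far across all poses.
            for (int y = 0; y < height; ++y) {
                for (int x = 0; x < width; ++x) {
                    const int v = acc[static_cast<std::size_t>(y) * width + x];
                    if (v > best.votes) best = Detection{x, y, s, a, v};
                }
            }
        }
    }
    return best;
}
```

In this sketch the work grows with (image edge pixels) x (template offsets) x (scales) x (angles), plus one full accumulator scan per pose, so a fine scale and rotation grid quickly becomes expensive. A threshold on `votes` would be the natural place to decide that no can is present.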

Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

It is extremely slow! I can't stress this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.
It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, thus had more pixels, thus more votes).
Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, thus ending with a very noisy heat map.
Invariance in translation and rotation was achieved, but not in orientation, meaning that a can that was not directly facing the camera objective wasn't recognized.

Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

I hope some people will also learn something out of it as well, after all I think not only people who ask questions should learn. :)

Reposted from: https://stackoverflow.com/questions/10168686/image-processing-algorithm-improvement-for-coca-cola-can-recognition

三生石@ 2012-04-16 05:17 (accepted answer)
An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

Both are implemented in OpenCV 2.3.1.

You can find a nice code example using features in Features2D + Homography to find a known object.

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).

Image source: tutorial example

The processing takes a few hundred ms for SIFT. SURF is a bit faster, but it is still not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.
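Once keypoints and descriptors have been extracted from the can template and from the scene, they still have to be paired up before a homography can be estimated. Below is a minimal sketch of one common way to do that, a brute-force nearest-neighbor search with the ratio test described in the SIFT paper. It is an assumption about how the matching could be done, not something the answer specifies: `Descriptor`, `Match`, `ratioTestMatches` and the 0.75 ratio are illustrative names and values, and the descriptors are treated as plain float vectors rather than OpenCV matrices.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// One accepted correspondence: query (template) index -> train (scene) index.
struct Match { std::size_t queryIdx, trainIdx; float distance; };

// Squared Euclidean distance between two descriptors.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// For each query descriptor, find its two nearest train descriptors and
// keep the match only if the best one is clearly closer than the second.
std::vector<Match> ratioTestMatches(const std::vector<Descriptor>& query,
                                    const std::vector<Descriptor>& train,
                                    float ratio = 0.75f) {
    std::vector<Match> matches;
    if (train.size() < 2) return matches;  // the test needs two candidates
    for (std::size_t q = 0; q < query.size(); ++q) {
        float best = std::numeric_limits<float>::infinity();
        float second = std::numeric_limits<float>::infinity();
        std::size_t bestIdx = 0;
        for (std::size_t t = 0; t < train.size(); ++t) {
            const float d = squaredDistance(query[q], train[t]);
            if (d < best) {
                second = best;
                best = d;
                bestIdx = t;
            } else if (d < second) {
                second = d;
            }
        }
        // Distances are squared here, so the ratio is squared as well.
        if (best < ratio * ratio * second)
            matches.push_back({q, bestIdx, std::sqrt(best)});
    }
    return matches;
}
```

Ambiguous keypoints, such as repeated texture in a noisy background, tend to have two similar nearest neighbors and are dropped by this test. The surviving matches are what would then be passed to a robust homography estimate, as in the tutorial mentioned above.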

The original papers

SURF: Speeded Up Robust Features

SIFT: Distinctive Image Features from Scale-Invariant Keypoints

ORB: an efficient alternative to SIFT or SURF