如何在 r 中绘制两个直方图?

我正在使用R,并且有两个数据框:胡萝卜和黄瓜。 每个数据框都有一个数字列,其中列出了所有测得的胡萝卜(总计:100k胡萝卜)和黄瓜(总计:50k黄瓜)的长度。 我希望在同一图上绘制两个直方图-胡萝卜长度和黄瓜长度。 它们是重叠的,所以我想我也需要一些透明度。 我还需要使用相对频率而不是绝对数字,因为每个组中的实例数量不同。 像这样的东西会很好,但是我不明白该如何从我的两个表中创建它:

overlapped density

转载于:https://stackoverflow.com/questions/3541713/how-to-plot-two-histograms-together-in-r

csdnceshi51
旧行李 someone posted some code snippet to do it in this thread: stackoverflow.com/questions/3485456/…
大约 10 年之前 回复
csdnceshi72
谁还没个明天 I'm using R as the tag suggests (edited post to make this clear)
大约 10 年之前 回复
weixin_41568110
七度&光 Btw, which software are you planning to use? For open source, I'd recommend gnuplot.info [gnuplot]. In its documentation, I believe you will find certain technique and sample scripts to do what you want.
大约 10 年之前 回复

8个回答

That image you linked to was for density curves, not histograms.

If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.

So, let's start with something like what you have, two separate sets of data and combine them.

carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))

#Now, combine your two dataframes into one.  First make a new column in each that will be a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'

#and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)

After that, which is unnecessary if your data is in long formal already, you only need one line to make your plot.

ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)

enter image description here

Now, if you really did want histograms the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made it density histograms. It's easy to remove the y = ..density.. to get it back to counts.

ggplot(vegLengths, aes(length, fill = veg)) + 
   geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')

enter image description here

csdnceshi55
~Onlooker Thx for the answer! The 'position="identity"' part is actually important as otherwise the bars are stacked which is misleading when combined with a density that by default seems to be "identity", i.e., overlayed as opposed to stacked.
5 年多之前 回复
csdnceshi65
larry*wei If you'd like to stay with histograms, use ggplot(vegLengths, aes(length, fill = veg)) + geom_bar(pos="dodge"). This will make interlaced histograms, like in MATLAB.
大约 10 年之前 回复

Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):

set.seed(42)
p1 <- hist(rnorm(500,4))                     # centered at 4
p2 <- hist(rnorm(500,6))                     # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10))  # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=T)  # second

The key is that the colours are semi-transparent.

Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:

enter image description here

csdnceshi66
必承其重 | 欲带皇冠 Put the options in the plot command directly into the hist command as I said. Posting the code isn't what comments are for.
大约 5 年之前 回复
weixin_41568184
叼花硬汉 How would you do it?
大约 5 年之前 回复
csdnceshi66
必承其重 | 欲带皇冠 Why did you separate out the plot commands? You can put all of those options into the hist commands and just two it in the two lines.
6 年多之前 回复
csdnceshi70
笑故挽风 +1 thank you all, can this be converted to a smoother gistogram (like had.co.nz/ggplot2/graphics/55078149a733dd1a0b42a57faf847036.png)?
大约 10 年之前 回复

Here is an example of how you can do it in "classic" R graphics:

## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
              histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
     col = rgb(1,0,0,0.4),xlab = 'Lengths',
     freq = FALSE, ## relative, not absolute frequency
     main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
     xaxt = 'n', yaxt = 'n', ## don't add axes
     col = rgb(0,0,1,0.4), add = TRUE,
     freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
       fill = rgb(1:0,0,0:1,0.4), bty = 'n',
       border = NA)
par(opar)

The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (in the arguments passed to hist).

csdnceshi61
derek5. I like this method, note that you can synchronize breaks by defining them with seq(). For example: breaks=seq(min(data$some_property), max(data$some_property), by=(max_prop - min_prop)/20)
大约 4 年之前 回复
csdnceshi62
csdnceshi62 Upping this because this answer is the only one (besides those in ggplot) which directly accounts for if your two histograms have substantially different sample sizes.
接近 6 年之前 回复
weixin_41568134
MAO-EYE To @gd047: thanks for the link - it looks useful. To @John, thanks for pointing out the bug - I think the "o" was lost in the copy-paste.
大约 10 年之前 回复
csdnceshi66
必承其重 | 欲带皇冠 hey... I usually am the one who puts up the base R version! :) You forgot to assign opar.
大约 10 年之前 回复
csdnceshi54
hurriedly% Very nice. It also reminded me of that one stackoverflow.com/questions/3485456/…
大约 10 年之前 回复

Here's the version like the ggplot2 one I gave only in base R. I copied some from @nullglob.

generate the data

carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)

You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.

## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
     main = 'Distribution of carrots and cucumbers', 
     panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
       fill = c(carrotCol, cukeCol), bty = 'n',
       border = NA)

enter image description here

Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms

plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
                                breaks=NULL, xlim=NULL, ylim=NULL){

  ahist=NULL
  bhist=NULL

  if(!(is.null(breaks))){
    ahist=hist(a,breaks=breaks,plot=F)
    bhist=hist(b,breaks=breaks,plot=F)
  } else {
    ahist=hist(a,plot=F)
    bhist=hist(b,plot=F)

    dist = ahist$breaks[2]-ahist$breaks[1]
    breaks = seq(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks),dist)

    ahist=hist(a,breaks=breaks,plot=F)
    bhist=hist(b,breaks=breaks,plot=F)
  }

  if(is.null(xlim)){
    xlim = c(min(ahist$breaks,bhist$breaks),max(ahist$breaks,bhist$breaks))
  }

  if(is.null(ylim)){
    ylim = c(0,max(ahist$counts,bhist$counts))
  }

  overlap = ahist
  for(i in 1:length(overlap$counts)){
    if(ahist$counts[i] > 0 & bhist$counts[i] > 0){
      overlap$counts[i] = min(ahist$counts[i],bhist$counts[i])
    } else {
      overlap$counts[i] = 0
    }
  }

  plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
  plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=T)
  plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=T)
}

Here's another way to do it using R's support for transparent colors

a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=T, col=rgb(0, 1, 0, 0.5) )

The results end up looking something like this: alt text

weixin_41568134
MAO-EYE +1 for an option available on all graphics devices (e.g. postscript)
接近 7 年之前 回复

@Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]

The hist() function by default draws plots, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area by a plot(0,0,type="n",...) call in which you can add the axis labels, plot title etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:

set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)

And here is the result (a bit too wide because of RStudio :-) ):

enter image description here

csdnceshi65
larry*wei upping this because it is a very simple option using base and viable on postscript devices.
接近 6 年之前 回复

Plotly's R API might be useful for you. The graph below is here.

library(plotly)
#add username and key
p <- plotly(username="Username", key="API_KEY")
#generate data
x0 = rnorm(500)
x1 = rnorm(500)+1
#arrange your graph
data0 = list(x=x0,
         name = "Carrots",
         type='histogramx',
         opacity = 0.8)

data1 = list(x=x1,
         name = "Cukes",
         type='histogramx',
         opacity = 0.8)
#specify type as 'overlay'
layout <- list(barmode='overlay',
               plot_bgcolor = 'rgba(249,249,251,.85)')  
#format response, and use 'browseURL' to open graph tab in your browser.
response = p$plotly(data0, data1, kwargs=list(layout=layout))

url = response$url
filename = response$filename

browseURL(response$url)

Full disclosure: I'm on the team.

Graph

Already beautiful answers are there, but I thought of adding this. Looks good to me. (Copied random numbers from @Dirk). library(scales) is needed`

set.seed(42)
hist(rnorm(500,4),xlim=c(0,10),col='skyblue',border=F)
hist(rnorm(500,6),add=T,col=scales::alpha('red',.5),border=F)

The result is...

enter image description here

Update: This overlapping function may also be useful to some.

hist0 <- function(...,col='skyblue',border=T) hist(...,col=col,border=border) 

I feel result from hist0 is prettier to look than hist

hist2 <- function(var1, var2,name1='',name2='',
              breaks = min(max(length(var1), length(var2)),20), 
              main0 = "", alpha0 = 0.5,grey=0,border=F,...) {    

library(scales)
  colh <- c(rgb(0, 1, 0, alpha0), rgb(1, 0, 0, alpha0))
  if(grey) colh <- c(alpha(grey(0.1,alpha0)), alpha(grey(0.9,alpha0)))

  max0 = max(var1, var2)
  min0 = min(var1, var2)

  den1_max <- hist(var1, breaks = breaks, plot = F)$density %>% max
  den2_max <- hist(var2, breaks = breaks, plot = F)$density %>% max
  den_max <- max(den2_max, den1_max)*1.2
  var1 %>% hist0(xlim = c(min0 , max0) , breaks = breaks,
                 freq = F, col = colh[1], ylim = c(0, den_max), main = main0,border=border,...)
  var2 %>% hist0(xlim = c(min0 , max0),  breaks = breaks,
                 freq = F, col = colh[2], ylim = c(0, den_max), add = T,border=border,...)
  legend(min0,den_max, legend = c(
    ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
    ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
    "Overlap"), fill = c('white','white', colh[1]), bty = "n", cex=1,ncol=3)

  legend(min0,den_max, legend = c(
    ifelse(nchar(name1)==0,substitute(var1) %>% deparse,name1),
    ifelse(nchar(name2)==0,substitute(var2) %>% deparse,name2),
    "Overlap"), fill = c(colh, colh[2]), bty = "n", cex=1,ncol=3) }

The result of

par(mar=c(3, 4, 3, 2) + 0.1) 
set.seed(100) 
hist2(rnorm(10000,2),rnorm(10000,3),breaks = 50)

is

enter image description here

立即提问
相关内容推荐