Garbage collection and correct usage of pointers in Go

I come from a Python/Ruby/JavaScript background. I understand how pointers work, but I'm not completely sure how to leverage them in the following situation.

Let's pretend we have a fictitious web API that searches some image database and returns a JSON describing what's displayed in each image that was found:

[
    {
        "url": "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
        "description": "Ocean islands",
        "tags": [
            {"name":"ocean", "rank":1},
            {"name":"water", "rank":2},
            {"name":"blue", "rank":3},
            {"name":"forest", "rank":4}
        ]
    },

    ...

    {
        "url": "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg",
        "description": "Bridge over river",
        "tags": [
            {"name":"bridge", "rank":1},
            {"name":"river", "rank":2},
            {"name":"water", "rank":3},
            {"name":"forest", "rank":4}
        ]
    }
]

My goal is to create a data structure in Go that will map each tag to a list of image URLs that would look like this:

{
    "ocean": [
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
    ],
    "water": [
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ],
    "blue": [
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
    ],
    "forest":[
        "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg", 
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ],
    "bridge": [
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ],
    "river":[
        "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
    ]
}

As you can see, each image URL can belong to multiple tags at the same time. If I have thousands of images and even more tags, this data structure can grow very large if image URL strings are copied by value for each tag. This is where I want to leverage pointers.

I can represent the JSON API response with two structs in Go; func searchImages() mimics the fake API:

package main

import "fmt"


type Image struct {
    URL string
    Description string
    Tags []*Tag
}

type Tag struct {
    Name string
    Rank int
}

// this function mimics json.NewDecoder(resp.Body).Decode(&parsedJSON)
func searchImages() []*Image {
    parsedJSON := []*Image{
        &Image {
            URL: "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
            Description: "Ocean islands",
            Tags: []*Tag{
                &Tag{"ocean", 1},
                &Tag{"water", 2},
                &Tag{"blue", 3},
                &Tag{"forest", 4},
            }, 
        },
        &Image {
            URL: "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg",
            Description: "Bridge over river",
            Tags: []*Tag{
                &Tag{"bridge", 1},
                &Tag{"river", 2},
                &Tag{"water", 3},
                &Tag{"forest", 4},
            }, 
        },
    }
    return parsedJSON
}

Now the less optimal mapping function that results in a very large in-memory data structure can look like this:

func main() {
    result := searchImages()

    tagToUrlMap := make(map[string][]string)

    for _, image := range result {
        for _, tag := range image.Tags {
            // fmt.Println(image.URL, tag.Name)
            tagToUrlMap[tag.Name] = append(tagToUrlMap[tag.Name], image.URL)
        }
    }

    fmt.Println(tagToUrlMap)
}

I can modify it to use pointers to the Image struct URL field instead of copying it by value:

    // Version 1

    tagToUrlMap := make(map[string][]*string)

    for _, image := range result {
        for _, tag := range image.Tags {
            // fmt.Println(image.URL, tag.Name)
            tagToUrlMap[tag.Name] = append(tagToUrlMap[tag.Name], &image.URL)
        }
    }

It works and my first question is what happens to the result data structure after I build the mapping in this way? Will the Image URL string fields be left in memory somehow and the rest of the result will be garbage collected? Or will the result data structure stay in memory until the end of the program because something points to its members?

Another way to do this would be to copy the URL to an intermediate variable and use a pointer to it instead:

    // Version 2

    tagToUrlMap := make(map[string][]*string)

    for _, image := range result {
        imageUrl := image.URL
        for _, tag := range image.Tags {
            // fmt.Println(image.URL, tag.Name)    
            tagToUrlMap[tag.Name] = append(tagToUrlMap[tag.Name], &imageUrl)
        }
    }

Is this better? Will the result data structure be garbage collected correctly?

Or perhaps I should use a pointer to string in the Image struct instead?

type Image struct {
    URL *string
    Description string
    Tags []*Tag
}

Is there a better way to do this? I would also appreciate any resources on Go that describe various uses of pointers in depth. Thanks!

https://play.golang.org/p/VcKWUYLIpH7

UPDATE: My main concerns are optimal memory consumption and not generating unwanted garbage. My goal is to use as little memory as possible.

ds2010630 2018-08-23 10:08
First some background. string values in Go are represented by a small struct-like data structure reflect.StringHeader:

type StringHeader struct {
    Data uintptr
    Len  int
}

So basically passing / copying a string value passes / copies this small struct value, which is 2 words only regardless of the length of the string. On 64-bit architectures, it's only 16 bytes, even if the string has a thousand characters.

In other words, string values already act as pointers. Introducing another pointer like *string just complicates usage, and you won't save any noticeable amount of memory. For the sake of memory optimization, forget about using *string.
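To see the descriptor size in practice, here is a minimal sketch. The helper name headerSize is made up for this example. unsafe.Sizeof reports the size of the string header only, not of the bytes it refers to, so on a 64-bit platform both strings should report 16 bytes, while a *string takes 8:

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// headerSize is a hypothetical helper: it returns the size of the string
// descriptor (data pointer + length), not the size of the string's content.
func headerSize(s string) uintptr {
	return unsafe.Sizeof(s)
}

func main() {
	short := "abc"
	long := strings.Repeat("x", 1000)

	// The two values are identical on a given platform (2 machine words).
	fmt.Println(headerSize(short), headerSize(long))

	// A *string is a single machine word, so the saving per entry is small.
	fmt.Println(unsafe.Sizeof(&short))
}
```

So swapping string for *string saves at most one machine word per stored entry, and it costs an extra indirection on every access.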

It works and my first question is what happens to the result data structure after I build the mapping in this way? Will the Image URL string fields be left in memory somehow and the rest of the result will be garbage collected? Or will the result data structure stay in memory until the end of the program because something points to its members?

If you have a pointer value pointing to a field of a struct value, then the whole struct will be kept in memory; it can't be garbage collected. Although it would be possible in principle to release the memory reserved for the other fields of the struct, the current Go runtime and garbage collector do not do so. So to achieve optimal memory usage, you should forget about storing addresses of struct fields (unless you also need the complete struct values; even then, storing field addresses and slice/array element addresses always requires care).

(The reason for this is that the memory for a struct value is allocated as a contiguous segment, so keeping only a single referenced field would strongly fragment the available / free memory and would make optimal memory management even harder and less efficient. Defragmenting such areas would also require copying the referenced field's memory area, which would require "live-changing" pointer values, that is, changing memory addresses.)

So while using pointers to string values may save you a tiny amount of memory, the added complexity and additional indirections make it not worth it.
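The difference between the two approaches can be sketched as follows. This is only an illustration: the trimmed-down Image type and the helper names collectURLs and collectURLPointers are assumptions made for the example. The first helper copies string headers, so the result holds no reference to the Image structs; the second stores field addresses, and each of those addresses keeps its whole Image alive:

```go
package main

import "fmt"

// Image is a reduced version of the struct from the question.
type Image struct {
	URL         string
	Description string
}

// collectURLs copies each URL's string header (2 words) into a new slice.
// The result does not point into the Image structs, so they can be
// collected once nothing else references them.
func collectURLs(images []*Image) []string {
	urls := make([]string, 0, len(images))
	for _, img := range images {
		urls = append(urls, img.URL)
	}
	return urls
}

// collectURLPointers stores the address of each URL field instead.
// Every pointer in the result points into an Image struct.
func collectURLPointers(images []*Image) []*string {
	ptrs := make([]*string, 0, len(images))
	for _, img := range images {
		ptrs = append(ptrs, &img.URL)
	}
	return ptrs
}

func main() {
	images := []*Image{
		{URL: "https://example.com/1.jpg", Description: "placeholder"},
	}

	ptrs := collectURLPointers(images)
	*ptrs[0] = "changed" // writes through to the struct field
	fmt.Println(images[0].URL)

	urls := collectURLs(images)
	images[0].URL = "changed again" // the copied value is unaffected
	fmt.Println(urls[0])
}
```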

So what to do then?

"Optimal" solution

So the cleanest way is to keep using string values.

And there is one more optimization we didn't talk about earlier.

You get your results by unmarshaling a JSON API response. This means that if the same URL or tag value is included multiple times in the JSON response, different string values will be created for them.

What does this mean? If you have the same URL twice in the JSON response, after unmarshaling, you will have 2 distinct string values which will contain 2 different pointers pointing to 2 different allocated byte sequences (string content which otherwise will be the same). The encoding/json package does not do string interning.

Here's a little app that proves this:

var s []string
err := json.Unmarshal([]byte(`["abc", "abc", "abc"]`), &s)
if err != nil {
    panic(err)
}

for i := range s {
    hdr := (*reflect.StringHeader)(unsafe.Pointer(&s[i]))
    fmt.Println(hdr.Data)
}

Output of the above (try it on the Go Playground):

273760312
273760315
273760320

We see 3 different pointers. They could be the same, as string values are immutable.
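reflect.StringHeader is marked deprecated in newer Go releases. Assuming Go 1.20 or later, the same check can be sketched with unsafe.StringData, which returns a pointer to a string's backing bytes. The helper name dataPointers is made up for this example, and the printed addresses will differ from run to run:

```go
package main

import (
	"encoding/json"
	"fmt"
	"unsafe"
)

// dataPointers is a hypothetical helper that returns, for each string,
// the address of its backing bytes (requires Go 1.20+).
func dataPointers(ss []string) []*byte {
	ptrs := make([]*byte, len(ss))
	for i, s := range ss {
		ptrs[i] = unsafe.StringData(s)
	}
	return ptrs
}

func main() {
	var s []string
	if err := json.Unmarshal([]byte(`["abc", "abc", "abc"]`), &s); err != nil {
		panic(err)
	}

	// Each decoded element is expected to have its own backing array.
	for _, p := range dataPointers(s) {
		fmt.Println(p)
	}
}
```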

The json package does not detect repeating string values because the detection adds memory and computational overhead, which is usually unwanted. But in our case we are aiming for optimal memory usage, so an additional up-front computation is worth the big memory gain.

So let's do our own string interning. How to do that?

After unmarshaling the JSON result, while building the tagToUrlMap map, let's keep track of the string values we have come across, and if a subsequent string value has been seen earlier, just use that earlier value (its string descriptor).

Here's a very simple string interner implementation:

var cache = map[string]string{}

func interned(s string) string {
    if s2, ok := cache[s]; ok {
        return s2
    }
    // New string, store it
    cache[s] = s
    return s
}

Let's test this "interner" in the example code above:

var s []string
err := json.Unmarshal([]byte(`["abc", "abc", "abc"]`), &s)
if err != nil {
    panic(err)
}

for i := range s {
    hdr := (*reflect.StringHeader)(unsafe.Pointer(&s[i]))
    fmt.Println(hdr.Data, s[i])
}

for i := range s {
    s[i] = interned(s[i])
}

for i := range s {
    hdr := (*reflect.StringHeader)(unsafe.Pointer(&s[i]))
    fmt.Println(hdr.Data, s[i])
}

Output of the above (try it on the Go Playground):

273760312 abc
273760315 abc
273760320 abc
273760312 abc
273760312 abc
273760312 abc

Wonderful! As we can see, after using our interned() function, only a single instance of the "abc" string is used in our data structure (which is actually the first occurrence). This means all other instances (given no one else uses them) can be, and eventually will be, garbage collected.

One thing not to forget here: the string interner uses a cache map which stores all previously encountered string values. So to let those strings go, you should "clear" this cache map too, most simply by assigning nil to it.
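One way to avoid a package-level cache that has to be cleared by hand is to wrap the map in a small type, so the cache becomes garbage as soon as the interner itself is no longer referenced. The following is only a sketch of that idea: the Interner type, its methods, and the sameBacking helper are names invented for this example, and sameBacking assumes Go 1.20 or later for unsafe.StringData:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Interner is a hypothetical wrapper around the map-based cache.
// Dropping the last reference to an Interner releases its cache.
type Interner struct {
	seen map[string]string
}

// NewInterner returns an empty interner.
func NewInterner() *Interner {
	return &Interner{seen: make(map[string]string)}
}

// Intern returns the first string seen with the same content as s.
func (in *Interner) Intern(s string) string {
	if cached, ok := in.seen[s]; ok {
		return cached
	}
	in.seen[s] = s
	return s
}

// Len reports how many distinct strings have been interned so far.
func (in *Interner) Len() int {
	return len(in.seen)
}

// sameBacking reports whether a and b share the same backing bytes.
func sameBacking(a, b string) bool {
	return len(a) == len(b) && unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	in := NewInterner()

	// Build two separate strings with equal content at run time.
	a := string([]byte("abc"))
	b := string([]byte("abc"))

	x, y := in.Intern(a), in.Intern(b)
	fmt.Println(sameBacking(x, y), in.Len())
}
```

This sketch is not safe for concurrent use; if several goroutines intern strings, the map would need to be guarded, for example with a sync.Mutex.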

Without further ado, let's see our solution:

result := searchImages()

tagToUrlMap := make(map[string][]string)

for _, image := range result {
    imageURL := interned(image.URL)

    for _, tag := range image.Tags {
        tagName := interned(tag.Name)
        tagToUrlMap[tagName] = append(tagToUrlMap[tagName], imageURL)
    }
}

// Clear the interner cache:
cache = nil

To verify the results:

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", " ")
if err := enc.Encode(tagToUrlMap); err != nil {
    panic(err)
}

Output is (try it on the Go Playground):

{
 "blue": [
  "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
 ],
 "bridge": [
  "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
 ],
 "forest": [
  "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
  "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
 ],
 "ocean": [
  "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg"
 ],
 "river": [
  "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
 ],
 "water": [
  "https://c8.staticflickr.com/4/3707/11603200203_87810ddb43_o.jpg",
  "https://c3.staticflickr.com/1/48/164626048_edeca27ed7_o.jpg"
 ]
}

Further memory optimizations:

We used the builtin append() function to add new image URLs to tags. append() may (and usually does) allocate bigger slices than needed (thinking of future growth). After our "build" process, we may go through our tagToUrlMap map and "trim" those slices to the minimum needed.

This is how it could be done:

for tagName, urls := range tagToUrlMap {
    if cap(urls) > len(urls) {
        urls2 := make([]string, len(urls))
        copy(urls2, urls)
        tagToUrlMap[tagName] = urls2
    }
}
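A different technique, not part of the approach above, is to avoid the over-allocation in the first place: count the URLs per tag in a first pass, then allocate each slice with exactly that capacity. The sketch below assumes the input is already in memory and can be iterated twice. The function name buildExact, the simplified Image and Tag types, and the example.com URLs are placeholders for illustration:

```go
package main

import "fmt"

// Tag and Image are simplified versions of the structs from the question.
type Tag struct {
	Name string
	Rank int
}

type Image struct {
	URL  string
	Tags []Tag
}

// buildExact builds the tag-to-URLs map in two passes so that every
// slice is allocated once, with capacity equal to its final length.
func buildExact(images []Image) map[string][]string {
	// Pass 1: count how many URLs each tag will receive.
	counts := make(map[string]int)
	for _, img := range images {
		for _, tag := range img.Tags {
			counts[tag.Name]++
		}
	}

	// Pass 2: allocate exact-size slices and fill them.
	result := make(map[string][]string, len(counts))
	for _, img := range images {
		for _, tag := range img.Tags {
			urls, ok := result[tag.Name]
			if !ok {
				urls = make([]string, 0, counts[tag.Name])
			}
			result[tag.Name] = append(urls, img.URL)
		}
	}
	return result
}

func main() {
	images := []Image{
		{URL: "https://example.com/1.jpg", Tags: []Tag{{"ocean", 1}, {"water", 2}}},
		{URL: "https://example.com/2.jpg", Tags: []Tag{{"river", 1}, {"water", 2}}},
	}
	for tag, urls := range buildExact(images) {
		fmt.Println(tag, len(urls), cap(urls))
	}
}
```

The trade-off is a second iteration over the input and a temporary counts map, in exchange for no intermediate slice growth and no trimming pass afterwards.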
This answer was selected by the asker as the best answer.
