This is simply because goroutines are not threads. The Go runtime schedules a goroutine onto an operating system thread, but the pairing is not fixed: when a goroutine blocks, for example on an I/O operation, its thread can run other goroutines while the blocked one waits.
What does this mean?
Joining needs a synchronization object in order to know when a thread is finished. As Go's goroutines are actually just very lightweight objects that only possess a stack, they do not provide such synchronization objects directly.
Go's CSP premise is that you can instantiate thousands of goroutines very cheaply, while only about as many threads as you have CPU cores are executing Go code at any one time.
From the OS's perspective, synchronization objects are expensive, so attaching one to every goroutine would be very inefficient.
Instead, synchronization is achieved with channels or with WaitGroups from the sync package.
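
As a rough illustration of both approaches, here is a minimal sketch of "joining" goroutines. The function names `sumSquaresWaitGroup` and `sumSquaresChannel` and the squaring workload are made up for this example; only `sync.WaitGroup` and channels come from Go itself.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquaresWaitGroup (hypothetical helper) starts one goroutine per input
// and uses a sync.WaitGroup as the join point.
func sumSquaresWaitGroup(nums []int) int {
	var wg sync.WaitGroup
	results := make([]int, len(nums))
	for i, n := range nums {
		wg.Add(1) // register the goroutine before starting it
		go func(i, n int) {
			defer wg.Done()    // signal completion, even on early return
			results[i] = n * n // each goroutine writes only its own slot
		}(i, n)
	}
	wg.Wait() // blocks until every goroutine has called Done
	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

// sumSquaresChannel (hypothetical helper) does the same with a channel:
// receiving exactly len(nums) values is what acts as the join.
func sumSquaresChannel(nums []int) int {
	out := make(chan int)
	for _, n := range nums {
		go func(n int) { out <- n * n }(n)
	}
	total := 0
	for range nums {
		total += <-out // blocks until some goroutine sends its result
	}
	return total
}

func main() {
	nums := []int{1, 2, 3, 4}
	fmt.Println(sumSquaresWaitGroup(nums)) // prints 30
	fmt.Println(sumSquaresChannel(nums))   // prints 30
}
```

The WaitGroup version fits when you only need to know that all the work has finished; the channel version fits when each goroutine also has a result to hand back.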