In order to add millions of records to a Postgres database with constant memory consumption, I am using a thread pool with several workers as well as a gorp.Transaction.
Per million records, the following code is called from different threads about a hundred times, each time handling a batch of 10000 records or so:
func batchCopy(p importParams) error {
	copy := pq.CopyIn("entity", "startdate", "value", "expirydate", "accountid")
	stmt, err := p.txn.Prepare(copy)
	if err != nil {
		return err
	}
	for _, r := range p.records {
		_, err := stmt.Exec(
			r.startDate,
			r.value,
			r.expiryDate,
			p.accountId)
		if err != nil {
			return err
		}
	}
	if err := stmt.Close(); err != nil {
		return err
	}
	return nil
}
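For context, the workers drive batchCopy roughly like this (a minimal sketch, not my actual code: the worker count, the channel plumbing, and the importParams/record field types are assumptions, and it presumes sync and log are imported):

func runImport(txn *gorp.Transaction, accountId int64, batches <-chan []record) {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // several workers pulling batches off a channel
		wg.Add(1)
		go func() {
			defer wg.Done()
			for recs := range batches {
				// all workers share the same transaction
				if err := batchCopy(importParams{txn: txn, records: recs, accountId: accountId}); err != nil {
					log.Println("batch failed:", err)
				}
			}
		}()
	}
	wg.Wait()
}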
For some reason, I've noticed that the process tends to be very slow, with the Prepare and Close calls taking a very long time.
I then tried to reuse the same statement for all of my calls to batchCopy and to close it only after all of them are done. In that case each batchCopy finishes very fast, but the final stmt.Close() takes forever.
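Concretely, the reuse variant looked roughly like this (a sketch with illustrative names; the real wiring across the worker threads is more involved):

func copyAll(txn *gorp.Transaction, batches [][]record, accountId int64) error {
	// Prepare the COPY statement once and reuse it across all batches.
	stmt, err := txn.Prepare(pq.CopyIn("entity", "startdate", "value", "expirydate", "accountid"))
	if err != nil {
		return err
	}
	for _, recs := range batches {
		for _, r := range recs {
			if _, err := stmt.Exec(r.startDate, r.value, r.expiryDate, accountId); err != nil {
				return err
			}
		}
	}
	// Each batch above returns quickly; this single Close at the end is
	// the call that takes forever.
	return stmt.Close()
}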
Questions:
- What would be the right way to go about statements? Should I create one per batch or reuse a single statement?
- What is happening in stmt.Close(), and why does it take so long after many calls to stmt.Exec?