duanjiu1003 2017-04-29 14:05
浏览 80
已采纳

MySQL到JSON不一致提取

I have a MySQL database with 6 tables and about 2 million rows all together.

I want to migrate all the data into MongoDB.

I decided to do this by converting the SQL tables into JSON and importing it to MongoDB.

I wrote a program in Golang to extract the data and output it as JSON.

This is the main function of the program:

func main() {
    // Open a database connection
    var err error
    db, err = sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/employees")
    checkErr(err)
    // Check if reachable
    if err = db.Ping(); err != nil {
        log.Fatal("Database is unreachable:", err)
    }
    // Populate variables with data
    err = populateVars()
    checkErr(err)
    // Marshal variables into JSON
    binaryJSON, err := json.Marshal(collection)
    checkErr(err)
    // Write JSON to a file
    err = writeStringToFile("/home/user01/Temporary/sql2data.json", string(binaryJSON))
    checkErr(err)
}

The problem is that the output is inconsistent.

Every time I run the program, the resulting file has a different size and some random fields are missing.

What could be causing this?

It doesn't seem like it's a problem with the logic of the program since everything executes without errors, and most fields are populated just fine.

Could I be reading the information too fast, so that some things get lost occasionally?

Or is there something else that I'm missing?

Edit:

Most of the work happens inside the populateVars() function call.

It has multiple blocks of code that execute a given SQL query and populate struct variables according to the schema.

This is one such block:

rows, err = db.Query("SELECT emp_no, dept_emp.dept_no, dept_name, from_date, to_date FROM dept_emp JOIN departments ON departments.dept_no = dept_emp.dept_no;")
checkErr(err)
i := 0
for rows.Next() {
    var id int
    var depNumber string
    var depName string
    var fromDate string
    var toDate string
    var position = "Employee"
    err = rows.Scan(&id, &depNumber, &depName, &fromDate, &toDate,)
    // For debugging purposes:
    fmt.Println(id, depNumber, depName, fromDate, toDate, position, i)
    if err != nil {
        return err
    }
    for i := range collection {
        if collection[i].ID == id {
            collection[i].Departments = append(collection[i].Departments, Department{DepartmentNumber: depNumber, DepartmentName: depName, FromDate: fromDate, ToDate: toDate, Position: position})
            // For debugging purposes:
            fmt.Println(collection[i].Departments)
        }
    }
    i++
}

Here's a GitHub link to the whole program: https://github.com/dchmie01/mysql_to_json/blob/master/main.go

Edit 2:

It seems like the issue has to do with query timeout.

Each query takes about 10 min to execute but at about 6 minutes in, I get this error, and the program stops executing the query:

[mysql] 2017/04/29 17:35:16 packets.go:66: unexpected EOF
[mysql] 2017/04/29 17:35:16 packets.go:412: busy buffer
2017/04/29 17:35:16 driver: bad connection

And in the MySQL log file it says:

2017-04-29T16:28:49.975805Z 102 [Note] Aborted connection 102 to db: 'employees' user: 'root' host: 'localhost' (Got timeout writing communication packets)

So far I tried playing around with MySQL variables to disable any timeouts that might be present, but no luck.

I think the issue might be with the mysql driver for Go.

  • 写回答

1条回答 默认 最新

  • doupapin172773 2017-05-02 11:28
    关注

    Consider to use Mysql SELECT INTO OUTFILE and mongoiport --type csv instead.

    The only thing that the program does is embedding 1-to-many and many-to-many documents, which can be easily done with aggregation framework.

    A step-by step example:

    1. export csv from mysql

      SELECT * from employees INTO OUTFILE '/tmp/employees.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      SELECT * from salaries INTO OUTFILE '/tmp/salaries.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      SELECT * from titles INTO OUTFILE '/tmp/titles.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      SELECT * from departments INTO OUTFILE '/tmp.departments.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      SELECT * from dept_emp INTO OUTFILE '/tmp/dept_emp.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      SELECT * from dept_manager INTO OUTFILE '/tmp/dept_manager.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      
    2. import csv into mongo (define 'field spec' according to your schema, see example for employees field spec)

      mongoimport -d <dbname> -c tmp_employees -f 'id.int32(),birth.date(2006-01-02),first_name.string(),last_name.string(),gender.string(),hire_date.date(2006-01-02)' --columnsHaveTypes --type csv --file /tmp/employees.csv --drop 
      mongoimport -d <dbname> -c tmp_salaries -f 'field spec' --columnsHaveTypes --type csv --file /tmp/salaries.csv --drop 
      mongoimport -d <dbname> -c tmp_titles -f 'field spec' --columnsHaveTypes --type csv --file /tmp/titles.csv --drop 
      mongoimport -d <dbname> -c tmp_departments -f 'field spec' --columnsHaveTypes --type csv --file /tmp/departments.csv --drop 
      mongoimport -d <dbname> -c tmp_dept_emp -f 'field spec' --columnsHaveTypes --type csv --file /tmp/dept_emp.csv --drop 
      mongoimport -d <dbname> -c tmp_dept_manager -f 'field spec' --columnsHaveTypes --type csv --file /tmp/dept_manager.csv --drop 
      
    3. embed data from mongo shell

      db.tmp_employees.aggregate([
          // 1-to-many joins
          {$lookup: {
              from: 'tmp_salaries',
              localField: 'id',
              foreignField: 'emp_no',
              as: 'salaries'
          }},
          {$lookup: {
              from: 'tmp_titles',
              localField: 'id',
              foreignField: 'emp_no',
              as: 'titles'
          }},
          // many-to-many joins
          {$lookup: {
              from: 'tmp_dept_emp',
              localField: 'id',
              foreignField: 'emp_no',
              as: 'dept_emp'
          }},
          {$lookup: {
              from: 'tmp_dept_manager',
              localField: 'id',
              foreignField: 'emp_no',
              as: 'dept_manager'
          }},
          {$unwind: { path: '$dept_emp', preserveNullAndEmptyArrays: true }},
          {$lookup: {
              from: 'tmp_departments',
              localField: 'dept_emp.dept_no',
              foreignField: 'dept_no',
              as: 'dept_emp_deps'
          }},    
          {$unwind: { path: '$dept_emp_deps', preserveNullAndEmptyArrays: true }},
          {$group: {
              _id: '$_id',
              root: {$first: '$$ROOT'},
              dept_manager: {$first: '$dept_manager'},
              departments_emp: {$push: {
                  department_number: '$dept_emp.emp_no',
                  department_name: '$dept_emp_deps.dept_name',
                  from_date: '$dept_emp.from_date',
                  to_date: '$dept_emp.to_date',
                  position: '$dept_emp.position'
              }},
          }},
          {$unwind: { path: '$dept_manager', preserveNullAndEmptyArrays: true }},
          {$lookup: {
              from: 'tmp_departments',
              localField: 'dept_manager.dept_no',
              foreignField: 'dept_no',
              as: 'dept_manager_deps'
          }},    
          {$unwind: { path: '$dept_manager_deps', preserveNullAndEmptyArrays: true }},
          {$group: {
              _id: '$_id',
              root: {$first: '$root'},
              departments_emp: {$first: '$departments_emp'},
              departments_manager: {$push: {
                  department_number: '$dept_manager.emp_no',
                  department_name: '$dept_manager_deps.dept_name',
                  from_date: '$dept_manager.from_date',
                  to_date: '$dept_manager.to_date',
                  position: '$dept_manager.position'
              }},
          }},
          // combine departments to a single array
          {$project: {
              root: 1,
              departments_all: {$concatArrays: [ "$departments_emp", "$departments_manager" ] }
          }},
          //final reshape
          {$project: {
              id: '$root.id',
              birth_date: '$root.birth_date',
              first_name: '$root.first_name',
              last_name: '$root.last_name',
              gender: '$root.gender',
              hire_date: '$root.hire_date',
              salaries: '$root.salaries',
              titles: '$root.titles',
              departments: {$filter: {
                  input: "$departments_all",
                  as: "departments",
                  cond: { $ne: [ "$$departments", {} ] }}}
          }},
          { $out : "employees" }
      ])
      
    4. delete imported collections from mongo shell

      db.tmp_employees.drop();
      db.tmp_salaries.drop();
      db.tmp_titles.drop();
      db.tmp_departments.drop();
      db.tmp_dept_emp.drop();
      db.tmp_dept_manager.drop();
      
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥20 有关区间dp的问题求解
  • ¥15 多电路系统共用电源的串扰问题
  • ¥15 slam rangenet++配置
  • ¥15 有没有研究水声通信方面的帮我改俩matlab代码
  • ¥15 对于相关问题的求解与代码
  • ¥15 ubuntu子系统密码忘记
  • ¥15 信号傅里叶变换在matlab上遇到的小问题请求帮助
  • ¥15 保护模式-系统加载-段寄存器
  • ¥15 电脑桌面设定一个区域禁止鼠标操作
  • ¥15 求NPF226060磁芯的详细资料