doubo4336 2018-05-07 11:44
浏览 78

PHP - 解析并验证文本文件数据并将其导入mysql数据库

I'm using php-resque to parse and validate data in large files then import that data into a mysql database.

I already know that LOAD DATA INFILE can be used to read rows from a text file into a table, but doesn't perform any validation whatsoever.

My database structure:

ItemsFile Table:

id  filename  fileepath  valid_items invalid_items processed_items  processed

Item Table:

id  uid  item file_id created_at

My Resque Job Class looks like this:

php-resque forks a child process and instantiates ItemsFileProcessor class then

  1. setUp() gets called
  2. perform() gets called
/**
* Read and validate items form a file, and store them in a database.
*/

class ItemsFileProcessor {

    //ItemsFile Model instance
    private $items_file = null;

    //Item Model instance
    private $item = null;

    //retrieved from ItemsFile table.
    private $file = null;

    public function __construct() {
        $this->items_file = new ItemsFile();
        $this->item       = new Item();
    }

    public function setUp() {

        if (isset($this->args['file_id'])) {

            //get file from ItemsFile Table by id.
            $this->file = $this->items_file->getFile($this->args['file_id']);

            if (empty($this->file)) {

                //End job processing if file does not exist.
                exit(-1);

            }

        }
    }

    function perform() {

        //NodeJs, socket.io, redis, broadcasting system
        EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_started');

        $processed_items = 0;
        $valid_items     = 0;
        $invalid_items   = 0;

        //item validation class instance
        $item_validator = new ItemValidator();

        try {

            $tmp_file = new SplFileObject($this->file->filepath);

            //Read items from file, and validate each item.
            while ($tmp_file->valid()) {
                $line = trim($tmp_file->fgets());
                if ($line !== '') {
                    if ($item_validator->isValid($line, new ItemValidationRule())) {

                        //store item in Item table.
                        $this->item->create([
                                'uid'     => 'foo',
                                'item'    => $line,
                                'file_id' => $this->file->id,
                            ]);

                        $valid_items++;

                    } else {

                        $invalid_items++;

                    }

                    $processed_items++;

                }
            }

            //update ItemsFile Table record
            $this->items_file->update(
                $this->file->id,
                [
                    'processed_items'  => $processed_items,
                    'valid_items'      => $valid_items,
                    'invalid_items'    => $invalid_items,
                    'processed'        => 'Processed',
                ]
            );

            EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_completed');

        } catch (LogicException $exception) {

            //broadcast failure.
            EventBroadcaster::broadcast('app-jobs-channel', 'file_processing_failed');
            Logger::getInstance()->log('ProcessContactFile Exception: '.$exception->getMessage(), Logger::LOGTYPE_ERROR);

            exit(-1);

        }
    }

}

My Problems:

  • Processing a file takes too long
  • Mysql has to process all the insert request one by one. LOAD DATA INFILE is much faster.

My Question:

Is there a way to optimize this or maybe introduce LOAD DATA INFILE somehow.

  • 写回答

1条回答 默认 最新

  • dpdhsq0783 2018-05-07 12:32
    关注

    You can have many problems of performance while managing files with PHP. Then, I suggest you to do it with SHELL to parse the file and return one string (that represents your general request with all your inserts). From now you just have to execute this request.

    Can help if isn't clear.

    评论

报告相同问题?

悬赏问题

  • ¥15 Vue3 大型图片数据拖动排序
  • ¥15 划分vlan后不通了
  • ¥15 GDI处理通道视频时总是带有白色锯齿
  • ¥20 用雷电模拟器安装百达屋apk一直闪退
  • ¥15 算能科技20240506咨询(拒绝大模型回答)
  • ¥15 自适应 AR 模型 参数估计Matlab程序
  • ¥100 角动量包络面如何用MATLAB绘制
  • ¥15 merge函数占用内存过大
  • ¥15 使用EMD去噪处理RML2016数据集时候的原理
  • ¥15 神经网络预测均方误差很小 但是图像上看着差别太大