douzhuang1900 2014-04-15 09:49
浏览 47

Cassandra DB中的日期插入:非平凡的1h班次问题

I am a bit desperate about this problem... I have no idea how to face it.

Here is a simpler way to look at this problem:

If my insert cql query is:

"BEGIN BATCH USING CONSISTENCY ONE insert into my_table(id,'2014-04-11 8:00:00',...,'2014-04-15 10:00:00') values ('2036548',3.15,...,4.11) APPLY BATCH"

...and my data request cql query is:

"Select FIRST 100000 '2014-04-01 0:00:00'..'2014-04-16 0:00:00' from my_table where id=2036548"

...why does the inserted date 2014-04-15 10:00:00 changes to 2014-04-15 11:00:00 when pullling it from Cassandra?

The date pulling code in vb.net is:

Public Shared Function getCassandraDate(ByVal value As Byte()) As Date
    Dim buffer As Byte() = New Byte(value.Length - 1) {}
    value.CopyTo(buffer, 0)
    Array.Reverse(buffer)
    Dim ticks As Long = BitConverter.ToInt64(buffer, 0)
    Dim dateTime As New System.DateTime(1970, 1, 1, 0, 0, 0, _
         0)

    dateTime = dateTime.AddMilliseconds(ticks)
    Return dateTime.ToLocalTime
End Function

...same thing in PHP:

date_default_timezone_set("Europe/Paris");
$time = $this->unpackDate($packed_time);
$str_time = date('Y-m-d H:i:s',$time); //TODO : to local time

private function unpackDate($data, $is_name=null)
{
    $arr = unpack('N2', $data);

    // If we are on a 32bit architecture we have to explicitly deal with
    // 64-bit twos-complement arithmetic since PHP wants to treat all ints
    // as signed and any int over 2^31 - 1 as a float
    if (PHP_INT_SIZE == 4) {

        $hi = $arr[1];
        $lo = $arr[2];
        $isNeg = $hi  < 0;

        // Check for a negative
        if ($isNeg) {
            $hi = ~$hi & (int)0xffffffff;
            $lo = ~$lo & (int)0xffffffff;

            if ($lo == (int)0xffffffff) {
                $hi++;
                $lo = 0;
            } else {
                $lo++;
            }
        }

        // Force 32bit words in excess of 2G to pe positive - we deal wigh sign
        // explicitly below

        if ($hi & (int)0x80000000) {
            $hi &= (int)0x7fffffff;
            $hi += 0x80000000;
        }

        if ($lo & (int)0x80000000) {
            $lo &= (int)0x7fffffff;
            $lo += 0x80000000;
        }

        $value = $hi * 4294967296 + $lo;

        if ($isNeg)
            $value = 0 - $value;

    } else {
        // Upcast negatives in LSB bit
        if ($arr[2] & 0x80000000)
            $arr[2] = $arr[2] & 0xffffffff;

        // Check for a negative
        if ($arr[1] & 0x80000000) {
            $arr[1] = $arr[1] & 0xffffffff;
            $arr[1] = $arr[1] ^ 0xffffffff;
            $arr[2] = $arr[2] ^ 0xffffffff;
            $value = 0 - $arr[1]*4294967296 - $arr[2] - 1;
        } else {
            $value = $arr[1]*4294967296 + $arr[2];
        }
    }
    return $value / 1e3;
}

MORE DETAILS

Processing chain:

(1). insertion to Cassandra through .NET

(2). Cassandra data storage

(3). Pulling the data from PHP or .NET

Problem:

As for today, a date being 2014-04-15 10:00:00 in step (1), will come out as 2014-04-15 11:00:00 in step (3).

Details:

(regarding the date format in this chain)

(1). Local time in .NET (Timezone: "Europe/Paris"). Insertion cql that is being executed: "BEGIN BATCH USING CONSISTENCY ONE insert into my_table(id,'2014-04-11 8:00:00',...,'2014-04-15 10:00:00') values ('2036548',3.15,...,4.11) APPLY BATCH"

(2). ??? I don't know what Cassandra does here... ???

(3). Example of cql query to pull the data: "Select FIRST 100000 '2014-04-01 0:00:00'..'2014-04-16 0:00:00' from my_table where id=2036548". In php: date_default_timezone_set("Europe/Paris"); $str_time = date('Y-m-d H:i:s',$time);. In .NET: dateTime.ToLocalTime.

Extra info:

I think it worked well before the daylight saving time change some weeks ago. But I can not be sure about that.

If in step (1), if I changed the date to de date to UTC before inserting it, 2014-04-15 10:00:00 will become 2014-04-15 08:00:00 and the output will be 2014-04-15 09:00:00, which is still not correct.

I highly suspect that the trick here is between steps (1) and (2), that is to say, me not being able to understand how Cassandra treats dates.


Edit1:

@Ananth 's questions:

both cassandra and client run in the same datacenter?

It is complicated:

  • Insertion in .NET from server1, a different server from server-cassandra (datacenter).
  • PHP (to pull the data) running on server-cassandra.
  • .NET (to pull the data) running on server1, not on server-cassandra.
  • PHP and .NET pulling the same result.

Can you post your schema here?

Here it is

CREATE TABLE tsmeasures (
  id int PRIMARY KEY
) WITH
  comment='' AND
  comparator=timestamp AND
  read_repair_chance=0.100000 AND
  gc_grace_seconds=0 AND
  default_validation=double AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

Edit2:

After testing it step by step, this is the result:

  • real date : 2014-04-15 17:00:00 (localtime)
  • cql text : '2014-04-15 15:00:00' (to UTC, done through .NET)
  • PHP Cassandra Unpack of this date => $ticks = 1397577600 (*) The unpack is done with the piece of code shown before

Ticks converted (through http://www.epochconverter.com/ )

  • GMT: Tue,
  • 15 Apr 2014 16:00:00 GMT Your time zone: 4/15/2014 6:00:00 PM GMT+2

These results makes no sense to me...

More details:

cql insert:

"BEGIN BATCH USING CONSISTENCY ONE insert into tsmeasures(id,'2014-04-11 15:00:00',...,'2014-04-15 15:00:00') values ('2036548',0,...,4.85) APPLY BATCH"

cql fetch:

"SELECT '2014-04-10 16:00:00'..'2014-04-20 17:00:00' FROM tsmeasures WHERE id IN (2036548,2036479,2036174,650877)"

Thus '2014-04-15 15:00:00' is included in the range of the fetch, and I can identify it because it is the highest value.

I will keep digging...

  • 写回答

2条回答 默认 最新

  • duanpendan8067 2014-04-15 12:19
    关注

    Before Edit

    Is there a clock time sync problem between your client and cassandra? I would strictly recommend running NTP between your client and cassandra installation.

    Post Edit

    CREATE TABLE tsmeasures (
      id int PRIMARY KEY
    ) WITH
      comment='' AND
      comparator=timestamp AND
      read_repair_chance=0.100000 AND
      gc_grace_seconds=0 AND
      default_validation=double AND
      min_compaction_threshold=4 AND
      max_compaction_threshold=32 AND
      replicate_on_write='true' AND
      compaction_strategy_class='SizeTieredCompactionStrategy' AND
      compression_parameters:sstable_compression='SnappyCompressor';
    

    From what you have given , it looks like you are trying to get the insertion time .

    Your problem might be due to clients running in different clock cycles with respect to cassandra. Cassandra just places a unix timestamp for each write.

    So what is happening here from what i see.

    You write from client using timestamp X(datastax driver sets this insertion timestamp). Cassandra writes with X.

    You read with a timestamp Y. Cassandra tries to read with timestamp Y(So , as per your explanation, PHP client is there in a different location).

    Both a are bound to differ.

    Solution 1

    Try to have a global NTP between the entire set up so that client clock cycles are in sync with cassandra.

    Solution 2

    Insert a column named timestamp which is user driven and do a range scan based on that

    Solution 3

    Set the insertion time in DML operations.

    评论

报告相同问题?

悬赏问题

  • ¥15 使用ESP8266连接阿里云出现问题
  • ¥15 BP神经网络控制倒立摆
  • ¥20 要这个数学建模编程的代码 并且能完整允许出来结果 完整的过程和数据的结果
  • ¥15 html5+css和javascript有人可以帮吗?图片要怎么插入代码里面啊
  • ¥30 Unity接入微信SDK 无法开启摄像头
  • ¥20 有偿 写代码 要用特定的软件anaconda 里的jvpyter 用python3写
  • ¥20 cad图纸,chx-3六轴码垛机器人
  • ¥15 移动摄像头专网需要解vlan
  • ¥20 access多表提取相同字段数据并合并
  • ¥20 基于MSP430f5529的MPU6050驱动,求出欧拉角