To get percentage, we need to get some counts, and then we can calculate percentages. Seems like a query of this form would get us the result we are after:
SELECT v.cnt_zero / v.cnt_all * 100.0 AS pct_zero
, v.cnt_one / v.cnt_all * 100.0 AS pct_one
, v.cnt_two / v.cnt_all * 100.0 AS pct_two
FROM (
...
) v
To get the counts, that looks like a job for conditional aggregation.
We can use an expression to check a condition, and return a 1 or 0.
SELECT IF( t.field_couleurs_badge_value = '0' ,1,0)
, IF( t.field_couleurs_badge_value = '1' ,1,0)
, IF( t.field_couleurs_badge_value = '2' ,1,0)
, IF( t.field_couleurs_badge_value IN ('0','1','2') ,1,0)
FROM badge_stats t
WHERE ...
And we can use SUM
aggregate on those expressions to add up the 1s and 0s.
If we want to count rows with any value of in the field_couleurs_badge_value
in "all" rows (including rows that don't have a '0','1' or '2' in that column), we could do that as well.
For the timestamp value, we can convert that into a MySQL TIMESTAMP or DATETIME datatype, and use the convenient builtin functions.
The FROM_UNIXTIME
function will convert a unix-style timestamp value (32-bit integer seconds from beginning of era) into a DATETIME in the current MySQL session time zone (SHOW VARIABLES LIKE 'time_zone'
)
FROM_UNIXTIME( 1491581853 )
We can check if the time is "between 08:00 and 19:00". We'll take this to mean that the check should be inclusive of '08:00:00.000' up thru and including '19:00:00.000'.
Normally when we work with time ranges, we avoid overlaps by doing checks >= start and < end, so that we don't ever have a time on a boundary that gets included in two separate ranges. For example, does 19:00:00 belong in the range 8am-7pm or does it belong in the range 7pm-11pm. (When we're counting rows in ranges, we typically don't want to count a row in two distinct ranges.)
We can use the HOUR
function or DATE_FORMAT
function to extract the hours, or the time. My preference would be to pull the whole time hh:mm:ss
DATE_FORMAT( FROM_UNIXTIME(1491581853), '%T')
And the string returned by that can compared to literals to see if the time falls into a time range:
DATE_FORMAT( FROM_UNIXTIME(1491581853), '%T') >= '08:00:00'
AND DATE_FORMAT( FROM_UNIXTIME(1491581853), '%T') <= '19:00:00'
Similarly, to "exclude the weekends", we can use the WEEKDAY
function or the DATE_FORMAT
function to extract the day of the week
AND WEEKDAY( FROM_UNIXTIME(1491581853), '%T') < 5
Putting all of that together, we get something like this:
SELECT v.cnt_zero / v.cnt_012 * 100.0 AS pct_zero
, v.cnt_one / v.cnt_012 * 100.0 AS pct_one
, v.cnt_two / v.cnt_012 * 100.0 AS pct_two
, v.cnt_012 / v.cnt_all * 100.0 AS pct_012
FROM (
SELECT SUM(IF( t.field_couleurs_badge_value = '0' ,1,0)) AS cnt_zero
, SUM(IF( t.field_couleurs_badge_value = '1' ,1,0)) AS cnt_one
, SUM(IF( t.field_couleurs_badge_value = '2' ,1,0)) AS cnt_two
, SUM(IF( t.field_couleurs_badge_value IN ('0','1','2') ,1,0)) AS cnt_012
, SUM(1) AS cnt_all
FROM badge_stats t
WHERE DATE_FORMAT(FROM_UNIXTIME( t.revision_timestamp +0),'%T') >= '08:00:00'
AND DATE_FORMAT(FROM_UNIXTIME( t.revision_timestamp +0),'%T') <= '19:00:00'
AND WEEKDAY( FROM_UNIXTIME( t.revision_timestamp +0) ) < 5
) v
The specification is a little unclear what we are supposed to do with rows with values other than '0','1' and '2'... whether those are supposed to be included in the count of "all". As the specification is refined, we can tweak the query.
https://dev.mysql.com/doc/refman/5.6/en/date-and-time-functions.html#function_from-unixtime
https://dev.mysql.com/doc/refman/5.6/en/date-and-time-functions.html#function_date-format
https://dev.mysql.com/doc/refman/5.6/en/date-and-time-functions.html#function_weekday
To get the percentages for various date ranges, those conditions can be included in the conditionals in the aggregate expressions.
First we need to have expressions that return us the begin and end of the datetime ranges we want to check. These definitions may not suit your use case, but as an example...
SELECT DATE(NOW()) - INTERVAL WEEKDAY(DATE(NOW()))+8 DAY AS blw -- begin last week
, DATE(NOW()) - INTERVAL WEEKDAY(DATE(NOW()))+1 DAY AS elw -- end last week (same as begin this week)
, DATE_FORMAT(NOW(),'%Y-%m-01') - INTERVAL 1 MONTH AS blm -- begin last month
, DATE_FORMAT(NOW(),'%Y-%m-01') - INTERVAL 0 MONTH AS elm -- end last month (same as begin this month)
We can test those expressions (use a user-defined variable in place of NOW()
so we test behavior with various datetimes, e.g. a sunday, on the first of the month, etc.) and tweak the expressions until we get the behavior we want.
blw elw blm elm
---------- ---------- ---------- ----------
2017-10-15 2017-10-22 2017-09-01 2017-10-01
We can use a query like that as an inline view, so we can reference the values by the assigned alias. Given that the datetimes in the table are unix-style timestamps (why are these stored as character, and not int?), we can convert MySQL DATETIME/TIMESTAMP into unix-style timestamp integers with UNIX_TIMESTAMP
function.
Shortening the column names for the purposes of this demonstration, revision_timestamp
to ts
, and field_couleurs_badge_value
to fcbv
, something like this:
SELECT v.lw_cnt_zero / v.lw_cnt_012 * 100.0 AS lw_pct_zero
, v.lw_cnt_one / v.lw_cnt_012 * 100.0 AS lw_pct_one
, v.lw_cnt_two / v.lw_cnt_012 * 100.0 AS lw_pct_two
, v.lm_cnt_zero / v.lm_cnt_012 * 100.0 AS lm_pct_zero
, v.lm_cnt_one / v.lm_cnt_012 * 100.0 AS lm_pct_one
, v.lm_cnt_two / v.lm_cnt_012 * 100.0 AS lm_pct_two
FROM ( -- last week
SELECT SUM(IF( t.ts+0 >= d.blw AND t.ts+0 < d.elw AND t.fcbv = '0' ,1,0)) AS lw_cnt_zero
, SUM(IF( t.ts+0 >= d.blw AND t.ts+0 < d.elw AND t.fcbv = '1' ,1,0)) AS lw_cnt_one
, SUM(IF( t.ts+0 >= d.blw AND t.ts+0 < d.elw AND t.fcbv = '2' ,1,0)) AS lw_cnt_two
, SUM(IF( t.ts+0 >= d.blw AND t.ts+0 < d.elw AND t.fcbv IN ('0','1','2'),1,0)) AS lw_cnt_012
-- last month
, SUM(IF( t.ts+0 >= d.blm AND t.ts+0 < d.elm AND t.fcbv = '0' ,1,0)) AS lm_cnt_zero
, SUM(IF( t.ts+0 >= d.blm AND t.ts+0 < d.elm AND t.fcbv = '1' ,1,0)) AS lm_cnt_one
, SUM(IF( t.ts+0 >= d.blm AND t.ts+0 < d.elm AND t.fcbv = '2' ,1,0)) AS lm_cnt_two
, SUM(IF( t.ts+0 >= d.blm AND t.ts+0 < d.elm AND t.fcbv IN ('0','1','2'),1,0)) AS lm_cnt_012
FROM ( SELECT UNIX_TIMESTAMP(DATE(NOW()) - INTERVAL WEEKDAY(DATE(NOW()))+8 DAY) AS blw -- begin last week
, UNIX_TIMESTAMP(DATE(NOW()) - INTERVAL WEEKDAY(DATE(NOW()))+1 DAY) AS elw -- end last week
, UNIX_TIMESTAMP(DATE_FORMAT(NOW(),'%Y-%m-01') - INTERVAL 1 MONTH) AS blm -- begin last month
, UNIX_TIMESTAMP(DATE_FORMAT(NOW(),'%Y-%m-01') - INTERVAL 0 MONTH) AS elm -- end last month
) d
JOIN badge_stats t
ON t.ts+0 >= d.blm
WHERE DATE_FORMAT(FROM_UNIXTIME( t.ts +0),'%T') >= '08:00:00'
AND DATE_FORMAT(FROM_UNIXTIME( t.ts +0),'%T') <= '19:00:00'
AND WEEKDAY( FROM_UNIXTIME( t.ts +0) ) < 5
) v
Note that the comparisons to the "end" of the range is "less than". The value we're comparing to is actually the beginning datetime of the next range. We want values up to (but not including) the beginning of the next range.
Note that we included a predicate in the ON clause of the JOIN to exclude rows from badge_stats
that are earlier than the earliest datetime we're interested in, in this case, the beginning of last month.
We are also adding a zero (+0) to the references to the varchar regression_timestamp
column, to convert to numeric, so we can compare to numerics. (We don't want to do character comparisons on integer values.)
This can be extended to also generate percentages for "yesterday". Add expressions to the inline view d
to return the beginning of yesterday and the start of today.
And repeat the same pattern we used for "last week" and "last month", with expressions referencing "beginning of yesterday" and "end of yesterday".