In a past professional life, I once did a task that examined the quality and occurrence of text. No one asked for it. I was a Business Analyst who kept seeing the same comments come up, and I was concerned that poor-quality notes were being left on customer accounts.
I was trying to ascertain how good the complaint resolution notes left on customer cases were, based on their length, uniqueness, and frequency. Now that I find myself temporarily unemployed, I thought it would be fun (if you can call data fun, and I do) to run a similar study on HIVE and do some objective analysis of the comments left there.
Because I am feeling lazy, I am using Power Query and Excel for this analysis, so I'll include the step-by-step methodology as I go.
Firstly, some parameters about the data used:
- The data is extracted from HIVE SQL, and I am looking at the Comments table.
= DBHive{[Schema="dbo",Item="Comments"]}[Data]
- I am then looking only at a week's worth of content:
= Table.SelectRows(dbo_Comments, each [created] >= #datetime(2025, 5, 18, 0, 0, 0) and [created] <= #datetime(2025, 5, 24, 0, 0, 0))
- I am interested only in comments, not top-level posts, so I am filtering OUT content that does not have a parent author. I'm also keeping only rows with a "blank" title, as this appears to leave me with actual comments.
= Table.SelectRows(#"Filtered Rows", each ([parent_author] <> null and [parent_author] <> "") and ([title] = ""))
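For anyone not using Power Query, the three filtering steps above can be sketched in Python. This is a minimal illustration, not the actual query: it assumes each row is a dict whose `created`, `parent_author` and `title` keys stand in for the Comments table columns, the rows themselves are invented, and the upper date bound stays inclusive to mirror the `<=` in the M step.

```python
from datetime import datetime

# Hypothetical rows standing in for the HIVE SQL Comments table; only the
# columns the filters use are included.
comments = [
    {"author": "alice", "parent_author": "bob", "title": "", "created": datetime(2025, 5, 19, 10, 0)},
    {"author": "bob", "parent_author": "", "title": "My post", "created": datetime(2025, 5, 20, 9, 30)},
    {"author": "carol", "parent_author": None, "title": "", "created": datetime(2025, 5, 21, 8, 0)},
    {"author": "dave", "parent_author": "alice", "title": "", "created": datetime(2025, 5, 26, 12, 0)},
]

START = datetime(2025, 5, 18, 0, 0, 0)
END = datetime(2025, 5, 24, 0, 0, 0)  # inclusive, mirroring the <= in the M step


def filter_replies(rows, start, end):
    """Keep rows created in [start, end] that have a parent author and a blank title."""
    return [
        row for row in rows
        if start <= row["created"] <= end
        # bool() rejects both None (null) and "" for parent_author
        and bool(row["parent_author"])
        and row["title"] == ""
    ]


replies = filter_replies(comments, START, END)
print(len(replies))  # 1 (only alice's reply survives)
```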
This leaves me with a sample set of 98,655 comments covering one week. The first thing I want to do is check the integrity of the data, and since I know my own data best, let me test what I've been doing and who I've been talking to most on the blockchain over the last week:
holoz0r replies to user | this many times |
---|---|
riverflows | 16 |
galenkp | 8 |
cryptoandcoffee | 4 |
jorgebgt | 3 |
creativemary | 3 |
abh12345 | 3 |
meno | 3 |
hivewatchers | 3 |
azircon | 3 |
fastchrisuk | 2 |
mattclarke | 2 |
acidyo | 2 |
steevc | 2 |
menati | 2 |
beatminister | 2 |
raceline | 2 |
buggedout | 2 |
vatman | 2 |
edicted | 2 |
vimukthi | 2 |
Looks about right, given that I know my activity.
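The check above amounts to filtering on one author and then counting replies per `parent_author` (in Power Query, a Group By with a row count would be one way to do it). A minimal Python sketch of the same idea, using a handful of made-up rows rather than the real sample:

```python
from collections import Counter

# Hypothetical (author, parent_author) pairs taken from the filtered replies.
replies = [
    ("holoz0r", "riverflows"),
    ("holoz0r", "galenkp"),
    ("holoz0r", "riverflows"),
    ("someone", "holoz0r"),
    ("holoz0r", "riverflows"),
]


def reply_targets(rows, author):
    """Count how many times `author` replied to each parent_author."""
    return Counter(parent for who, parent in rows if who == author)


# Print in the same "user | count |" shape as the table above.
for user, n in reply_targets(replies, "holoz0r").most_common():
    print(f"{user} | {n} |")
```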
So my next step is to figure out which accounts made the most replies in the sampled period (because, as we all should know by now, not every account is a human, and that is pretty obvious from some of the account names that appear in the list).
The next thing I want to learn about is users who are not me, because they are typically more interesting than I am. The thing I love about data is that it hides absolutely nothing, and we can see that there are a lot of bots and token accounts...
User making comment | count of comments |
---|---|
hivebuzz | 3634 |
lolzbot | 2000 |
actifit | 988 |
worldmappin | 940 |
luvshares | 822 |
beerlover | 700 |
splinterboost | 621 |
pizzabot | 616 |
ladytoken | 596 |
bpcvoter1 | 452 |
roswelborges | 448 |
aquarius.academy | 448 |
chi4god | 442 |
hug.bot | 435 |
hivebits | 418 |
u89gw | 415 |
xcv47 | 413 |
w7ngc | 412 |
jkl65 | 411 |
w95hj | 409 |
sor31 | 409 |
hk14d | 407 |
fgh87 | 407 |
asd09 | 407 |
f76wz | 405 |
vmn31 | 404 |
dw38h | 404 |
wiv01 | 403 |
x6oc5 | 402 |
zxc43 | 401 |
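Ranking accounts by comment count is a group-and-count on the `author` column, sorted in descending order. In Python, `collections.Counter.most_common` does both in one step. The author list below is a tiny invented stand-in for the real data:

```python
from collections import Counter

# Hypothetical list of comment authors, one entry per comment in the sample.
authors = ["hivebuzz", "lolzbot", "hivebuzz", "actifit", "hivebuzz", "lolzbot"]


def top_commenters(author_list, n=30):
    """Return the n accounts with the most comments, highest count first."""
    return Counter(author_list).most_common(n)


print(top_commenters(authors, 3))
# [('hivebuzz', 3), ('lolzbot', 2), ('actifit', 1)]
```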
What I am interested in next is probably a futile exercise, but I want to know what the most commonly left ... comment is, and what percentage of all the comments left during the week that IDENTICAL comment makes up.
I am pleased to report that this simple analysis reveals that:
Over 10% of the comments left on HIVE are entirely meaningless
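Finding the most common comment comes down to counting each exact comment body and dividing the top count by the total. A minimal Python sketch, assuming exact string matching (no trimming or case-folding, since the question is about IDENTICAL comments) and a made-up sample in place of the real comment bodies:

```python
from collections import Counter


def top_comment_share(bodies):
    """Return the most common exact comment text and its share of all comments."""
    body, count = Counter(bodies).most_common(1)[0]
    return body, count / len(bodies)


# Hypothetical sample; the real input would be the body of every reply in the week.
sample = ["Nice post!", "Thanks!", "Nice post!", "Great analysis of the data", "Nice post!", "Thanks!"]
body, share = top_comment_share(sample)
print(body, f"{share:.0%}")  # Nice post! 50%
```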
Data doesn't lie. Here are the top 100 most commonly left comments.
Furthermore, once I exclude comments that appear only once, I find that 32,068 of the comments left on HIVE during the week are non-unique. In other words, of the original sample of 98,655 comments, a whopping 32.5% are NOT UNIQUE!
This means that, in aggregate, about one in three comments you see on HIVE is identical to at least one other comment. Context is important, though, so we've got to consider the common phrases that appear at the top of the list:
Looking through the duplicate comments, I can see that we're a grateful bunch: the string "thank" appears in 12,861 comments, or 13% of all replies.
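Both figures above are simple counts over the comment bodies. The sketch below assumes two things the post doesn't spell out: a comment is "non-unique" when its exact text appears more than once, with every copy counted (including the first), and the "thank" search ignores case (Power Query's `Text.Contains` is case-sensitive unless a comparer such as `Comparer.OrdinalIgnoreCase` is passed). The sample bodies are invented; the last line only re-derives the post's percentages from its own counts.

```python
from collections import Counter


def non_unique_stats(bodies):
    """Count comments whose exact text appears more than once (every copy counts)."""
    counts = Counter(bodies)
    non_unique = sum(c for c in counts.values() if c > 1)
    return non_unique, non_unique / len(bodies)


def containing(bodies, needle="thank"):
    """Count comments that contain `needle`, ignoring case."""
    needle = needle.lower()
    hits = sum(1 for body in bodies if needle in body.lower())
    return hits, hits / len(bodies)


sample = ["Thanks!", "Nice post!", "Thanks!", "thank you for sharing", "Nice post!", "Great analysis of the data"]
n, share = non_unique_stats(sample)
print(n, f"{share:.1%}")  # 4 66.7%
hits, hit_share = containing(sample)
print(hits, f"{hit_share:.0%}")  # 3 50%

# Re-deriving the percentages quoted in the post from its reported counts.
print(f"{32068 / 98655:.1%}", f"{12861 / 98655:.0%}")  # 32.5% 13%
```

Note that a plain substring match also counts words such as "thankful", which may or may not be what you want in a gratitude metric.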
I plan on interrogating this data in more depth, but I think this is a good starting point to build a future "dashboard" of comment health on HIVE.
What would you like to see in such a dashboard?
My thoughts are as follows:
- Is the comment unique?
- How many comments has user x made?
- Who swears the most?
- Which comments are just calling bots to give out tokens?
- Is the comment longer or shorter than the average comment?
- Who has had the most interactions with whom?
- Does the comment contain picture(s)?
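Several of the dashboard ideas above are simple per-comment checks. A rough Python sketch of four of them (uniqueness, length versus the average, pictures, and bot calls) follows. The patterns are assumptions for illustration only: a markdown image or a bare link ending in a common image extension counts as a picture, and a comment made up solely of `!WORD`-style commands (for example `!PIZZA` or `!LOL`) counts as a bot call.

```python
import re
from collections import Counter
from statistics import mean

# Assumed heuristics, for illustration only.
# Picture: markdown image syntax, or a bare URL ending in a common image extension.
IMAGE_RE = re.compile(
    r"!\[[^\]]*\]\([^)]+\)|https?://\S+\.(?:png|jpe?g|gif|webp)",
    re.IGNORECASE,
)
# Bot call: the whole comment is one or more "!WORD" commands, e.g. "!PIZZA !LOL".
BOT_CALL_RE = re.compile(r"^(?:\s*![A-Za-z]+)+\s*$")


def comment_health(bodies):
    """Return one dict of simple health flags per comment body."""
    counts = Counter(bodies)
    avg_len = mean(len(body) for body in bodies)
    return [
        {
            "body": body,
            "unique": counts[body] == 1,
            "longer_than_average": len(body) > avg_len,
            "has_picture": IMAGE_RE.search(body) is not None,
            "bot_call_only": BOT_CALL_RE.match(body) is not None,
        }
        for body in bodies
    ]


sample = [
    "!PIZZA !LOL",
    "Thanks!",
    "Thanks!",
    "Great write-up, here is my chart ![chart](https://example.com/chart.png)",
]
for row in comment_health(sample):
    print(row["unique"], row["longer_than_average"], row["has_picture"], row["bot_call_only"])
```

Matching only whole-comment commands keeps a comment like "Great post !LOL" out of the bot-call bucket; whether that is the right call is a judgment for the dashboard.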
Open to suggestions. Give me stuff to do.