Text analytics reveal thirty-two percent of comments on HIVE are not unique and at least ten percent add no value to discussion


In a past professional life, I once did a task that examined the quality and occurrence of text. No one asked for it. I was a Business Analyst who kept seeing the same comments come up, and I was concerned that poor-quality notes were being left on customer accounts.

I was trying to ascertain how good the complaint resolution notes left on customer cases were, based on their length, uniqueness, and frequency. Now that I find myself temporarily unemployed, I thought it would be fun (if you can call data fun - I do) to run a similar study on HIVE and do some objective analysis of the comments left there.

Because I am feeling lazy for this analysis, I am using Power Query and Excel, so I'll include the step-by-step methodology as I go.

Firstly, some parameters about the data used:

  • The data is extracted from HIVE SQL, and I am looking at the Comments table.

= DBHive{[Schema="dbo",Item="Comments"]}[Data]

  • I am then looking only for a week's worth of content

= Table.SelectRows(dbo_Comments, each [created] >= #datetime(2025, 5, 18, 0, 0, 0) and [created] <= #datetime(2025, 5, 24, 0, 0, 0))

  • I am interested only in comments, not top level posts. Therefore I am filtering OUT content that does not have a parent author. I'm also keeping everything with a "blank" title, as this appears to get me actual comments.

= Table.SelectRows(#"Filtered Rows", each ([parent_author] <> null and [parent_author] <> "") and ([title] = ""))
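For readers who do not use Power Query, the two filters above can be sketched in plain Python. This is not the query used for the analysis, only an illustration of the same logic. The column names `created`, `parent_author` and `title` come from the M expressions above, while the sample rows and the `author` field are made up for the example.

```python
from datetime import datetime

# Hypothetical sample rows; only the column names are taken from the filters above.
rows = [
    {"author": "alice", "parent_author": "", "title": "My post", "created": datetime(2025, 5, 19, 9, 0)},
    {"author": "bob", "parent_author": "alice", "title": "", "created": datetime(2025, 5, 19, 10, 0)},
    {"author": "carol", "parent_author": "alice", "title": "", "created": datetime(2025, 5, 30, 10, 0)},
    {"author": "dave", "parent_author": None, "title": "", "created": datetime(2025, 5, 20, 8, 0)},
]

START = datetime(2025, 5, 18)
END = datetime(2025, 5, 24)


def is_reply_in_window(row):
    """Keep rows inside the date window that have a parent author and a blank title."""
    in_window = START <= row["created"] <= END
    has_parent = row["parent_author"] not in (None, "")
    return in_window and has_parent and row["title"] == ""


comments = [r for r in rows if is_reply_in_window(r)]
print([r["author"] for r in comments])  # ['bob']
```

As in the M filter, the upper bound is midnight at the start of 24 May, so anything created later that day falls outside the window.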

This leaves me with 98,655 comments to work with as a sample set, covering a period of a week. The first thing I want to do is check the integrity of the data. Given that I know my own data best, let me test it by seeing what I've been doing and who I've been talking to most on the blockchain in the last week:

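| holoz0r replies to user | This many times |
|---|---|
| riverflows | 16 |
| galenkp | 8 |
| cryptoandcoffee | 4 |
| jorgebgt | 3 |
| creativemary | 3 |
| abh12345 | 3 |
| meno | 3 |
| hivewatchers | 3 |
| azircon | 3 |
| fastchrisuk | 2 |
| mattclarke | 2 |
| acidyo | 2 |
| steevc | 2 |
| menati | 2 |
| beatminister | 2 |
| raceline | 2 |
| buggedout | 2 |
| vatman | 2 |
| edicted | 2 |
| vimukthi | 2 |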

Looks about right, given that I know my activity.
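The sanity check above amounts to grouping one author's replies by the account they replied to. A minimal Python sketch of that grouping follows. It is not the Power Query step used for the post, and the sample pairs are invented for illustration.

```python
from collections import Counter

# Hypothetical (author, parent_author) pairs standing in for the filtered comments.
replies = [
    ("holoz0r", "riverflows"),
    ("holoz0r", "galenkp"),
    ("holoz0r", "riverflows"),
    ("someone_else", "holoz0r"),
]


def reply_targets(pairs, author):
    """Count how often `author` replied to each parent author, most frequent first."""
    counts = Counter(parent for who, parent in pairs if who == author)
    return counts.most_common()


print(reply_targets(replies, "holoz0r"))  # [('riverflows', 2), ('galenkp', 1)]
```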

My next step is to figure out which accounts made the most replies in the sampled period. As we should all know by now, not every account is a human, and that is pretty obvious from some of the account names that appear in the list.

I want to learn about users who are not me, because they are typically more interesting than I am. The thing I love about data is that it hides absolutely nothing, and we can see that there are a lot of bots or token accounts...

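| User making comment | Count of comments |
|---|---|
| hivebuzz | 3634 |
| lolzbot | 2000 |
| actifit | 988 |
| worldmappin | 940 |
| luvshares | 822 |
| beerlover | 700 |
| splinterboost | 621 |
| pizzabot | 616 |
| ladytoken | 596 |
| bpcvoter1 | 452 |
| roswelborges | 448 |
| aquarius.academy | 448 |
| chi4god | 442 |
| hug.bot | 435 |
| hivebits | 418 |
| u89gw | 415 |
| xcv47 | 413 |
| w7ngc | 412 |
| jkl65 | 411 |
| w95hj | 409 |
| sor31 | 409 |
| hk14d | 407 |
| fgh87 | 407 |
| asd09 | 407 |
| f76wz | 405 |
| vmn31 | 404 |
| dw38h | 404 |
| wiv01 | 403 |
| x6oc5 | 402 |
| zxc43 | 401 |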
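A ranking like this one is a count of comments per author, sorted from most to fewest. The Python sketch below shows the idea and adds each account's share of the total. The `authors` list is invented sample data, and the share column is an extra not present in the table above.

```python
from collections import Counter

# Hypothetical list with one entry per comment, holding the commenting account.
authors = ["hivebuzz", "lolzbot", "hivebuzz", "alice", "hivebuzz", "lolzbot"]


def top_commenters(author_list, n):
    """Return (account, comment count, percent of all comments) for the n busiest accounts."""
    total = len(author_list)
    counts = Counter(author_list)
    return [
        (name, count, round(100 * count / total, 1))
        for name, count in counts.most_common(n)
    ]


print(top_commenters(authors, 2))  # [('hivebuzz', 3, 50.0), ('lolzbot', 2, 33.3)]
```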

What I am interested in next is probably a futile exercise, but I want to know what the most commonly left ... comment is and what percentage that IDENTICAL comment makes up of all the comments left during the week.

I am pleased to report that this simple analysis reveals that:

Over 10% of the comments left on HIVE are entirely meaningless

Data doesn't lie. Here are the top 100 most commonly left comments.

[Image: the top 100 most commonly left comments]

Furthermore, once I exclude non-duplicate comments, we find that 32,068 of the comments left on HIVE for the week are non-unique. Therefore, from our original sample of 98,655 comments, a whopping 32.5% of comments left on the HIVE blockchain are NOT UNIQUE!
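The non-unique figure can be reproduced in principle by counting every comment whose exact text appears more than once. The Python sketch below assumes exact string matching with no trimming or case folding, and it counts every copy of a repeated comment. Both are my reading of the method rather than something the post states, and the `bodies` list is invented sample data.

```python
from collections import Counter

# Hypothetical comment bodies.
bodies = [
    "Thanks!",
    "Great post",
    "Thanks!",
    "!PIZZA",
    "!PIZZA",
    "!PIZZA",
    "A longer, original reply",
]


def non_unique_share(texts):
    """Return how many comments share their exact text with another, and that count as a fraction."""
    counts = Counter(texts)
    non_unique = sum(c for c in counts.values() if c > 1)
    return non_unique, non_unique / len(texts)


n, share = non_unique_share(bodies)
print(n, round(share * 100, 1))  # 5 71.4

# The headline figure from the post: 32,068 non-unique out of 98,655 comments.
print(round(32068 / 98655 * 100, 1))  # 32.5
```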

This means that, in aggregate, about one in every three comments you see on HIVE is identical to at least one other comment. Context is important though, so we have to consider the common phrases that appear at the top of the list.

When I look through the duplicate comments, I can see that we're a grateful bunch, with the string "thank" appearing in 12,861 comments, or 13% of replies.
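Counting comments that contain a given string is a simple substring test. The Python sketch below ignores case, which is an assumption, because the post does not say whether the original match was case-sensitive. The `bodies` list is invented sample data.

```python
# Hypothetical comment bodies.
bodies = ["Thank you!", "thanks for sharing", "Nice photo", "THANKS", "no gratitude here"]


def count_containing(texts, needle):
    """Count texts that contain `needle`, ignoring case (an assumption)."""
    needle = needle.lower()
    return sum(1 for t in texts if needle in t.lower())


hits = count_containing(bodies, "thank")
print(hits, round(100 * hits / len(bodies)))  # 3 60

# The figure from the post: 12,861 of 98,655 comments contain "thank".
print(round(100 * 12861 / 98655))  # 13
```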

I plan on interrogating this data in more depth, but I think this is a good starting point to build a future "dashboard" of comment health on HIVE.

What would you like to see in such a dashboard?

My thoughts are as follows:

  • Is the comment unique?
  • How many comments by x user?
  • Who swears the most?
  • What comments are just calling bots to give tokens?
  • Is the comment longer or shorter than the average comment?
  • Who had the most interactions with whom?
  • Does the comment contain picture(s)?
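A few of these ideas can be expressed as per-comment flags. The Python sketch below covers three of them: uniqueness, length relative to the average, and whether the comment contains a picture. Detecting pictures by looking for markdown image syntax is an assumption about how images appear in comment bodies, and the function and sample data are hypothetical.

```python
import re
from collections import Counter

# Markdown image syntax such as ![alt](url); assumed, not confirmed by the data.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def comment_flags(bodies):
    """Return one dict of dashboard flags per comment body."""
    counts = Counter(bodies)
    mean_len = sum(len(b) for b in bodies) / len(bodies)
    return [
        {
            "is_unique": counts[b] == 1,
            "longer_than_average": len(b) > mean_len,
            "has_image": bool(IMAGE_PATTERN.search(b)),
        }
        for b in bodies
    ]


sample = [
    "Thanks!",
    "Thanks!",
    "Look at this ![cat](https://example.com/cat.png) I found today",
]
flags = comment_flags(sample)
print(flags[0])  # {'is_unique': False, 'longer_than_average': False, 'has_image': False}
print(flags[2])  # {'is_unique': True, 'longer_than_average': True, 'has_image': True}
```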

Open to suggestions. Give me stuff to do.
