Jura The idiot
General
first of all, it's meant to be fun
only last weekend (after more than three and a half years) I looked carefully at the data under Forum Statistics and Notable Members for the first time, and I was surprised how steep the distribution is;
https://www.sinodefenceforum.com/members/
shows only the top twenty posters, so I felt challenged to model the remaining (and missing) almost ten thousand LOL!
from now on, all blame is on me:
I collected the data (* below; I'll mark technical stuff with asterisks and put it further down in this post) on Monday morning (just in case somebody looks here in the future: March 13, 2016), but then didn't have time for this. Anyway, the hard data show the first 20 posters (out of a total of 9616 members) made 41% of all posts! (163268/402906=0.4052)
While guessing about the rest, I skipped what would lead to what's imprecisely called (if you nitpick, I'll ignore it) a Pareto distribution, because:
- I don't think there's a so-called Pareto tail here (which would mean several thousand posters making, let's say, more than one hundred posts each), PLUS
- I would have to draw a straight line (** below) through a really small set (just those 20 data points)
in the graph I glued the hard data (the first twenty points; #1 is Jeff with 21749) to the prediction, which
- starts at #21 (an unknown member, of course) with 4454 (#20, siegecrossbow, was at 4458, so this distribution is smooth; by the way, you may call it the SDF distribution if you want),
- ends at #160 with 88, while
- leaves the rest of the posts (and I did some tweaking to minimize this rest) for the remaining 9456 (9616-160) members;
if spread evenly, 17929/9456, it would mean fewer than two posts on average per user with # above 160 (but this would be very unfair to #161, I guess)
(I know this exponential wouldn't work well in the top 20 region (*** below), but I obviously don't need it there)
the blue part of the above graph is related to the cumulative distribution (the points end at 0.956, not 1.0, because the first 160 members account for 384977 of the 402906 posts, and 384977/402906=0.956)
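The leftover arithmetic above can be rechecked in a few lines. This is only a minimal sketch using the numbers quoted in this post; the variable names are made up for illustration, and 384977 is taken as given (the top-20 data plus the prediction for #21 to #160), not recomputed from the fitted curve.

```python
# Numbers quoted in the post; variable names are illustrative assumptions.
total_posts = 402906      # all posts on the forum
total_members = 9616      # all registered members
covered_posts = 384977    # posts attributed to #1..#160 (hard data + prediction)
covered_members = 160

# What is left over for everybody ranked after #160.
rest_posts = total_posts - covered_posts
rest_members = total_members - covered_members

# Where the cumulative curve ends, and the average for the remaining members.
cumulative_share = covered_posts / total_posts
avg_rest = rest_posts / rest_members

print(rest_posts, rest_members)      # 17929 9456
print(f"{cumulative_share:.3f}")     # 0.956
print(avg_rest < 2)                  # True
```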
now a question: I'm on no other forum, so I don't know, is this typical? I mean a tiny fraction of members making almost all of the posts
* the data:
%top 20:
21749
20763
9879
9106
8810
8456
8425
8284
7524
6731
6397
6195
6114
5814
5697
4999
4852
4545
4470
4458
%
%the sum of top 20:
163268
%
%the total number of posts:
402906
%
%the total number of members:
9616
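For anybody who wants to double-check the figures above, here is a small sketch that re-adds the top-20 list and recomputes the 41% share. It uses only the data listed in this post; the variable names are my own.

```python
# Post counts of the top 20 members, copied from the list above.
top20 = [
    21749, 20763, 9879, 9106, 8810, 8456, 8425, 8284, 7524, 6731,
    6397, 6195, 6114, 5814, 5697, 4999, 4852, 4545, 4470, 4458,
]
total_posts = 402906
total_members = 9616

top20_sum = sum(top20)
share_of_posts = top20_sum / total_posts        # fraction of all posts
share_of_members = len(top20) / total_members   # fraction of all members

print(top20_sum)                  # 163268
print(f"{share_of_posts:.4f}")    # 0.4052
print(f"{share_of_members:.4%}")  # the 20 posters as a percentage of members
```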
(**)
the power law (Pareto distribution) would require an (approximately) straight line over the whole range on a log-log plot
(in the process I reread an awesome description in
Power Laws, Pareto Distributions and Zipf's law
where you can see what billionaires share with craters on the Moon)
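One rough way to see how close the twenty known points come to a straight line is an ordinary least-squares fit of log(posts) against log(rank). The sketch below is not how the graphs in this post were made; it is just an illustration using the standard library, with variable names of my own choosing.

```python
import math

# Post counts of the top 20 members, in rank order (#1 first).
top20 = [
    21749, 20763, 9879, 9106, 8810, 8456, 8425, 8284, 7524, 6731,
    6397, 6195, 6114, 5814, 5697, 4999, 4852, 4545, 4470, 4458,
]

# Log-log coordinates: a power law would be a straight line here.
xs = [math.log(rank) for rank in range(1, len(top20) + 1)]
ys = [math.log(posts) for posts in top20]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for y = intercept + slope * x.
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 tells how much of the variation the straight line explains.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```

Even if the line fits these twenty points reasonably well, that says nothing about the other 9596 members, which is exactly the problem with drawing a straight line through such a small set.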
(***)
as I said, this exponential wouldn't work well in the buildup region (the top 20), where it would account for only 79% of the actual posts (128978 instead of 163268)
am I obsessed with numbers? LOL