SDF Distribution

first of all, it's meant to be fun :)

only last weekend (after more than three and a half years) I looked carefully at the data under Forum Statistics and Notable Members for the first time, and I was surprised how steep the distribution is;
https://www.sinodefenceforum.com/members/
shows only the top twenty posters, so I felt challenged to model the remaining (and missing) almost ten thousand LOL!
from now on, all blame is on me:

I collected the data (* below; I'll mark technical stuff with asterisks and put it further down in this post) on Monday morning (in case somebody reads this in the future: March 13, 2016), but then didn't have time for it. Anyway, the hard data show the first 20 posters (out of the total of 9616 members) made 41% of all posts! (163268/402906=0.4052) While guessing about the rest, I skipped what would lead to the (imprecisely called; if you nitpick, I'll ignore it) Pareto distribution because:
  1. I don't think there's a so-called Pareto tail here (which would mean several thousand posters making, let's say, more than one hundred posts each) PLUS
  2. I would have to draw a straight line (** below) through a really small set (of just those 20 data points)
and instead I tried a kinda exponential distribution; before you get bored, here is the graph:
seISC.jpg

where I glued the hard data (the first twenty points; #1 is Jeff with 21749) to the prediction, which
  • starts at #21 (an unknown member, of course) with 4454 (#20, siegecrossbow, was 4458, so this distribution is smooth; by the way, you may call it the SDF distribution if you want :)
  • ends at #160 with 88, while
  • leaves the rest of the posts (and I did some tweaking to minimize this rest) for the remaining 9456 (9616-160) members;
this rest is the total, 402906, minus the integral (the area in black in the above graph) of 384977; 402906-384977=17929
if spread evenly, 17929/9456, it would mean fewer than two posts on average per user with # above 160 (but this would be very unfair to #161 I guess :)

(I know this exponential wouldn't work well in the top 20 region (*** below), but I obviously don't need it there)

the blue part of the above graph is the cumulative distribution (the points end at 0.956, not 1.0, because that's 384977/402906)
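For anyone who wants to recheck the bookkeeping, here is a minimal sketch that only redoes the arithmetic quoted above. The variable names are made up for this sketch; the numbers are the ones from the post.

```python
# Figures quoted in the post; variable names are only for this sketch.
total_posts = 402906      # all posts on the forum
total_members = 9616      # all members
modelled_members = 160    # top 20 (hard data) plus ranks 21..160 (model)
modelled_posts = 384977   # the "integral" quoted in the post

rest_posts = total_posts - modelled_posts        # posts left for everybody else
rest_members = total_members - modelled_members  # members ranked beyond #160
avg_rest = rest_posts / rest_members             # average posts per remaining member
covered = modelled_posts / total_posts           # where the cumulative curve ends

print("rest of posts:", rest_posts)
print("remaining members:", rest_members)
print("average per remaining member:", round(avg_rest, 2))
print("cumulative fraction at #160:", round(covered, 3))
```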

now a question: I'm in no other forum, so I don't know if this is typical? I mean a tiny fraction of members making almost all of the posts

* the data:
%top 20:
21749
20763
9879
9106
8810
8456
8425
8284
7524
6731
6397
6195
6114
5814
5697
4999
4852
4545
4470
4458
%
%the sum of top 20:
163268
%
%the total number of posts:
402906
%
%the total number of members:
9616
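
The 41% figure can be rechecked straight from the list above. This is just a small sketch; the list is copied from the data block and the variable names are made up for illustration.

```python
# Top-20 post counts, copied from the data block above.
top20 = [
    21749, 20763, 9879, 9106, 8810, 8456, 8425, 8284, 7524, 6731,
    6397, 6195, 6114, 5814, 5697, 4999, 4852, 4545, 4470, 4458,
]
total_posts = 402906    # total number of posts
total_members = 9616    # total number of members

top20_sum = sum(top20)
post_share = top20_sum / total_posts        # share of all posts made by the top 20
member_share = len(top20) / total_members   # share of all members that the top 20 are

print("sum of top 20:", top20_sum)
print("share of posts:", round(post_share, 4))
print("share of members:", round(member_share, 4))
```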

(**)
the power law (Pareto distribution) would require an (approximately) straight line over the whole range in:
kPxD.png
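
A rough way to try the straight-line idea on the twenty known points: a power law, posts ~ rank^slope, becomes a straight line once both axes are logarithmic, so one can fit a least-squares line to (ln rank, ln posts) and look at how well it fits. This is only a sketch under that assumption (it is not necessarily how the plot above was made), and twenty points are far too few to settle anything.

```python
import math

# Top-20 post counts from the data block in this post.
top20 = [
    21749, 20763, 9879, 9106, 8810, 8456, 8425, 8284, 7524, 6731,
    6397, 6195, 6114, 5814, 5697, 4999, 4852, 4545, 4470, 4458,
]

# Log-log coordinates: x = ln(rank), y = ln(posts).
xs = [math.log(rank) for rank in range(1, len(top20) + 1)]
ys = [math.log(posts) for posts in top20]

# Ordinary least-squares line y = intercept + slope * x, done by hand.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

slope = sxy / sxx                    # exponent of the fitted power law
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)   # 1.0 would be a perfect straight line

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```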


(in the process I reread an awesome description in
Power Laws, Pareto Distributions and Zipf's law
where you can see what billionaires share with craters on the Moon :)


(***)
as I said, this exponential wouldn't work well in the, like, buildup region (the top 20):
69IdN.png

where it would account for only 79% of the top-20 posts (128978 instead of 163268)
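
To illustrate the kind of shortfall meant here, the sketch below runs a plain exponential through the two quoted end points, (#21, 4454) and (#160, 88), and extends it back over ranks 1 to 20. Big caveat: the post does not spell out the exact "kinda exponential" curve, so this two-point exponential is an assumption made only for this sketch, and it is not expected to reproduce the 128978 figure exactly. It should only show the same effect, the curve undershooting the real top 20.

```python
import math

# End points of the prediction quoted in the post.
rank_a, posts_a = 21, 4454
rank_b, posts_b = 160, 88
top20_actual = 163268  # sum of the real top-20 post counts

# ASSUMPTION: a plain exponential posts(rank) = posts_a * exp(-rate * (rank - rank_a))
# through both end points; the post's actual curve may well be different.
rate = math.log(posts_a / posts_b) / (rank_b - rank_a)

def model_posts(rank):
    """Posts predicted by the assumed two-point exponential at a given rank."""
    return posts_a * math.exp(-rate * (rank - rank_a))

# Extend the curve back over the top-20 region and compare with the hard data.
top20_model = sum(model_posts(rank) for rank in range(1, 21))
ratio = top20_model / top20_actual

print(f"decay rate per rank: {rate:.5f}")
print(f"model top-20 sum: {top20_model:.0f} vs actual {top20_actual}")
print(f"fraction accounted for: {ratio:.2f}")
```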


am I obsessed with numbers? LOL
 

subotai1

Junior Member
Registered Member
first of all, it's meant to be fun :)

only last weekend (after more than three and a half years) I looked carefully at the data under
Forum Statistics
...
am I obsessed with numbers? LOL

Nice work Jura. What you have, and probably know, is what they call "the long tail", and it's very common and seen in many places. 20% of the people contribute 80% of the content or do 80% of something else. The rest of the work takes thousands.

So if you are interested in dumping this into a repository and pointing R and some TensorFlow at it, we can have some more fun with the numbers...
 

Webmaster

The Troll Hunter
Staff member
Administrator
Nice work... takes a while to wrap your head around it though. :)

When I first got the notification about this in my email, I thought this must be some spammer... lol!!!

I see the retired folks right up there, then the other 18% following with major contributions, while the remaining 80% just follow along slow and steady.
 

Air Force Brat

Brigadier
Super Moderator
first of all, it's meant to be fun :)

...
am I obsessed with numbers? LOL

Yes, so you should have a little more compassion for the rest of us with our own obsessions??? right??? F-22, Aircraft Carriers, Battleships, PAK-FA, Chinese Engines, SCS--lots of fun here? Chinese Aviation?? Submarines, Destroyers, LCS, Zummies, UK Carriers, am I getting close anybody, yes it's a lot of fun, and thanks to all of you who share so much, even your family lives, and for some of us, SDF is our family??

So our little world is indeed small, wonder if any intelligence services pay any attention to our meanderings, prolly not right, LOL.

I've wondered about some of our more shall we say "provocative" members??? LOL

and yes it's still a lot of fun most days!

it does sadden me when our friends kind of "fall off the map?", and on occasion I've been saddened by our "exiles", like our old friend with the Russian Fighter name and nine lives, LOL
 
after:
...

When I first got notification about this in my email, i thought this must be some spammer... lol!!!

...
LOLOL I have one more story to tell in the pub

now, WebMaster, we could have some more fun if you ran the top two hundred, as there was, like, an assumption behind picking the model (Yesterday at 10:50 AM)
... while guessing about the rest, ...
which I didn't say at first:
I think I know about one hundred members (in the virtual world, of course, and "know" sometimes means I keep that member on Ignore List LOL!), so my setup was to bring the distribution to "almost zero" at around 150 (because there're many topics here which I don't follow at all, so I thought there would be members who post a lot about those topics and I haven't heard of those members)
BUT this assumption may be completely wrong! as there actually may be the tail
... so called Pareto tail here (which would mean several thousand posters making let's say more than one hundred posts) ...
and that's something that would become apparent from the first 200 data points here

if it amounted to a coupla clicks and copy-pasting a row of numbers, you could send it to me (don't worry, I wouldn't then ask for 400, 800 ... :) by now I have the alternative model ready), but as I said, it's all meant to be just fun
 