SDF Distribution

Discussion in 'Feedback & Suggestions' started by Jura, Mar 17, 2017.

  1. Jura

    Jura Senior Member

    Joined:
    Jun 23, 2013
    Messages:
    9,031
    Likes Received:
    12,726
    first of all, it's meant to be fun :)

    only last weekend (after more than three and a half years) I looked carefully at the data under Forum Statistics and Notable Members for the first time, and I was surprised how steep the distribution is;
    https://www.sinodefenceforum.com/members/
    shows only the top twenty posters, so I felt challenged to model the remaining (and missing) almost ten thousand LOL!
    from now on, all blame is on me:
    from now on, all blame is on me:

    I collected the data (* below) (I'll mark technical stuff with asterisks and put it further below in this post) on Monday morning (just in case somebody looks here in the future: March 13, 2017), but then didn't have time for this. Anyway, the hard data show that the first 20 posters (out of the total of 9616) made 41% of all posts! (163268/402906=0.4052) While guessing about the rest, I skipped what would lead to the, imprecisely called (if you nitpick, I'll ignore), Pareto distribution because:
    1. I don't think there's a so-called Pareto tail here (which would mean several thousand posters making, let's say, more than one hundred posts) PLUS
    2. I would have to draw a straight line (** below) through a really small set (of just those 20 data points)
    and instead I tried a kinda exponential distribution; before you get bored, here is the graphic:
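    [image: the hard data glued to the prediction (area in black), with the cumulative distribution (in blue)]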

    where I glued the hard data (the first twenty points; #1 is Jeff with 21749) to the prediction which
    • starts at (an unknown member of course) #21 with 4454 (#20, siegecrossbow, was 4458, so this distribution is smooth; by the way, you may call it the SDF distribution if you want :)
    • ends at #160 with 88, while
    • leaves the rest of posts (and I did some tweaking to minimize this rest) for the remaining 9456 (9616-160) members;
    this rest is the total, 402906, minus the integral (the area in black in the above graph) of 384977; 402906-384977=17929
    if spread just like 17929/9456, it would mean less than two posts on average per user with # above 160 (but this would be very unfair to #161 I guess :)
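    For anyone who wants to play with the two endpoints above, here is a minimal sketch. It assumes a plain exponential pinned at #21 (4454) and #160 (88); the post only says "kinda exponential" with some tweaking, so this is not necessarily the curve behind the graph, and the names `model_posts`, `model_sum` and `implied_sum` are made up for the illustration.

```python
import math

# Endpoints quoted in the post: rank 21 -> 4454 posts, rank 160 -> 88 posts.
START_RANK, START_POSTS = 21, 4454
END_RANK, END_POSTS = 160, 88

# Assumption: posts(r) = START_POSTS * exp(-k * (r - START_RANK)).
# The actual curve in the post was tweaked, so it may differ from this one.
k = math.log(START_POSTS / END_POSTS) / (END_RANK - START_RANK)


def model_posts(rank):
    """Predicted post count at a given rank under the assumed exponential."""
    return START_POSTS * math.exp(-k * (rank - START_RANK))


# Posts the assumed curve assigns to ranks 21..160 inclusive.
model_sum = sum(model_posts(r) for r in range(START_RANK, END_RANK + 1))

# What the post's own totals imply for the same ranks:
# the integral (384977) minus the hard data for the top 20 (163268).
implied_sum = 384977 - 163268

print(f"decay rate k = {k:.4f} per rank")
print(f"assumed-curve sum for #21..#160 = {model_sum:.0f}")
print(f"sum implied by the post's totals = {implied_sum}")
```

    Any gap between the two printed sums would just show how far the tweaked curve sits from a plain two-point exponential.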

    (I know this exponential wouldn't work well in the top-20 region (*** below) but I obviously don't need it there)

    the blue part of the above graph is related to the cumulative distribution (it's obvious the points end at 0.956, not 1.0, because 384977/402906)
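    The arithmetic quoted so far can be re-checked in a few lines. This is only a sanity check on the figures from the post; the variable names are made up for the illustration.

```python
# All inputs are figures quoted in the post.
total_posts = 402906
total_members = 9616
top20_posts = 163268
modelled_posts = 384977    # hard data for #1..#20 plus the prediction for #21..#160
modelled_members = 160

top20_share = top20_posts / total_posts           # share of posts by the top 20
rest_posts = total_posts - modelled_posts         # posts left for everyone after #160
rest_members = total_members - modelled_members   # members after #160
avg_rest = rest_posts / rest_members              # average posts per remaining member
cumulative_end = modelled_posts / total_posts     # where the cumulative points end

print(f"top-20 share:          {top20_share:.4f}")
print(f"posts left over:       {rest_posts}")
print(f"members left over:     {rest_members}")
print(f"average per member:    {avg_rest:.2f}")
print(f"cumulative end point:  {cumulative_end:.3f}")
```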

    now a question: I'm in no other forum, so I don't know if this is typical? I mean a tiny fraction of members making almost all of the posts

    * the data:
    (**)
    the power law (Pareto distribution) would require an (approximately) straight line over the whole range in a log-log plot:
    [image: plot for the straight-line check]

    (in the process I reread an awesome description in
    Power Laws, Pareto Distributions and Zipf's law http://www.economics-ejournal.org/economics/journalarticles/2011-20/references/Newman05
    where you can see what billionaires share with craters on the Moon :)
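    The straight-line test from (**) can be sketched as a least-squares fit of log(posts) against log(rank). This is one common way to eyeball a power law (the rank-frequency form), not necessarily how the plot above was made. The function `loglog_slope` is a hypothetical helper, and the two input lists are made-up illustration data, not the forum's numbers.

```python
import math


def loglog_slope(post_counts):
    """Least-squares slope and R^2 of log(posts) vs log(rank), ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(post_counts) + 1)]
    ys = [math.log(count) for count in post_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# Made-up illustration data, NOT the forum's numbers:
synthetic_power_law = [20000 * rank ** -0.5 for rank in range(1, 21)]
synthetic_exponential = [20000 * math.exp(-0.08 * rank) for rank in range(1, 21)]

slope_pl, r2_pl = loglog_slope(synthetic_power_law)
slope_exp, r2_exp = loglog_slope(synthetic_exponential)

print(f"power-law sample:   slope = {slope_pl:.3f}, R^2 = {r2_pl:.4f}")
print(f"exponential sample: slope = {slope_exp:.3f}, R^2 = {r2_exp:.4f}")
```

    With only 20 points the fit says little either way, which is the concern in point 2 of the first post; the top-200 list would make it more telling.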


    (***)
    as I said, this exponential wouldn't work well in the buildup (top-20) region:
    [image: the exponential in the top-20 region]
    where it would account for only 79% (128978 instead of 163268)


    am I obsessed with numbers? LOL
     
  2. subotai1

    subotai1 New Member
    Registered Member

    Joined:
    Jul 2, 2006
    Messages:
    54
    Likes Received:
    89
    Nice work Jura. What you have, and probably know, is what they call "the long tail", and it's very common and seen in many places: 20% of the people contribute 80% of the content, or do 80% of something else. The rest of the work takes thousands.

    So if you are interested in dumping this into a repository and pointing R and some tensorflow at it, we can have some more fun with the numbers...
     
  3. WebMaster

    WebMaster The Troll Hunter
    Staff Member Administrator

    Joined:
    Aug 25, 2005
    Messages:
    844
    Likes Received:
    122
    Nice work... takes a while to wrap your head around it though. :)

    When I first got the notification about this in my email, I thought this must be some spammer... lol!!!

    I see the retired folks right up there, then the other 18% following with major contributions, while the remaining 80% just follow along slow and steady.
     
  4. Air Force Brat

    Air Force Brat Senior Member

    Joined:
    Dec 7, 2011
    Messages:
    6,005
    Likes Received:
    6,569
    Yes, so you should have a little more compassion on the rest of us with our own obsessions??? right??? F-22, Aircraft Carriers, Battleships, PAK-FA, Chinese Engines, SCS--lots of fun here? Chinese Aviation?? Submarines, Destroyers, LCS, Zummies, UK Carriers, am I getting close anybody, yes it's a lot of fun, and thanks to all of you who share so much, even your family lives, and for some of us, SDF is our family??

    So our little world is indeed small, wonder if any intelligence services pay any attention to our meanderings, prolly not right, LOL.

    I've wondered about some of our more shall we say "provocative" members??? LOL

    and yes it's still a lot of fun most days!

    it does sadden me when our friends kind of "fall off the map?", and on occasion I've been saddened by our "exiles", like our old friend with the Russian Fighter name and nine lives, LOL
     
  5. Jura

    Jura Senior Member

    after:
    LOLOL I have one more story to tell in the pub

    now WebMaster, we could have some more fun if you ran the top two hundred, as there was, like, an assumption behind picking the model which I didn't say at first:
    I think I know about one hundred members (in the virtual world, of course, and "know" sometimes means I keep that member on Ignore List LOL!), so my setup was to bring the distribution to "almost zero" at around 150 (because there're many topics here which I don't follow at all, so I thought there would be members who post a lot about those topics and I haven't heard of those members)
    BUT this assumption may be completely wrong! as there actually may be the tail,
    and that's something that would become apparent from the first 200 data points here

    if it amounted to coupla clicks and copy-pasting a row of numbers, you could send it to me (don't worry, I wouldn't then ask for 400, 800 ... :) by now I have the alternative model ready), but as I said, it's all meant to be just fun
     
  6. Jura

    Jura Senior Member

    people come and go, Brother
     
