More Data is not better! Well, that took long enough to glean!
With the emergence of expressions such as “big data”, it has become the norm to think that more data is always better than less. Counterintuitive as it may sound, it should have been clear that this is not true. Now, a paper coming out of Stanford University and Harvard University lays out the case that more data is not always better, and not always harmless either. The crux of the paper is risk modeling, where the available data often doesn’t directly measure the outcome under study, such as recidivism or health outcomes.
In these areas, label bias, as defined by the authors, comes into play. Since the underlying data can carry biases from geography, economics or other factors, it can lead to confounded, incorrect and often dangerous results for racial minorities, for instance for African Americans when it comes to crime or healthcare outcomes. In such instances, removing incorrect data or proxies can actually improve outcomes.
Read the summary article here: https://hai.stanford.edu/news/how-bias-hides-kitchen-sink-approaches-data?utm_source=Stanford+HAI&utm_campaign=013d8c6d1d-hai_news_june_9_2024
The original paper is available here: https://www.science.org/doi/epdf/10.1126/sciadv.adi8411
The cover image for the post is from here: https://pixabay.com/vectors/statistic-analytic-diagram-1564428/
Reposting: Quick Review: Big Data, by Brian Clegg
Reposting from my other young site: http://bibliomaniac.me/
Big Data is now more than hype, which, however, won’t stop those who wish to hype things away from reality from continuing to do so.
If you want a clear and concise book on Big Data, or are, like me, never able to stay away from a refresher, well, this book is for you.
From obvious Big Data examples, such as the erstwhile Netflix, to some very Brit-specific stuff (the author is British, as you might have guessed), the book is an easy read at just under 150 pages, and can be easily understood by a wide range of readers.
It also lays out important details, such as the purpose of analyzing Big Data, the pitfalls of rushing into it without informing a panic-prone public (hear that, Google?), and more.
Enjoy the read!
Saturday Night Fun with Mnemonics, Steps or Gibberish Patterns…
This is most certainly an off-topic post…
After reading this post, you may gather another data point on the risks posed by single scientists in the wild. You may sympathize, or worse, empathize with me. Alternatively, you may want the few seconds of your life back, but then again, if you are reading this blog, you might be posed the question, “what life?”. You may now consider yourself well and fully warned.
So, in keeping with my recent resolution to engage in some basic ‘refresh and research’ for a certain period of time every day, I found myself watching a video on thermal analysis presented by one of the experts at DS SolidWorks. Watching the video, I encountered a rather old friend, the Stefan-Boltzmann constant. Enigmatic, just like dozens of its other storied friends, such as Avogadro, Planck et al., here is another constant that you don’t usually bother to understand or analyze. You simply assume that the famous people who lent their names to such wild numbers must have done so after they themselves spent Saturday Nights agonizing over how to make Engineering Problem Solving quite a tedious, yet terrifying experience.
Yet, I was staring at the screen, and I realized something very neat.
Back when we were studying Physics in India, remembering these constants, with their numbers and exponential values, could make the difference between going to a University of repute, or one of not so much repute, or, failing both, finding something else (which would at least make you richer and happier than any engineer out there). Thus, you had to really excel at rote memory or be a mnemonic geek (mind you, in addition to other forms of geek savant behavior).
If I were to do that today, I would have fun with the (approximation of) Stefan-Boltzmann: roughly 5.67 × 10^-8 W/(m^2 K^4), so the digits read 5, 6, 7, 8. It is neat if you think about it, like the Steps Song, also titled 5, 6, 7, 8. If you have never heard it, here is the video (of a tolerable song with some incredibly bad choreography, the kind that will chase aliens away from this planet, should they watch it):
And now, it will be nearly impossible for you, or me, to forget the simplified Stefan-Boltzmann Constant. You’re welcome. Or, not. Oh, and if you really want to know what I do on Saturday Nights:
http://blogs.solidworks.com/tech/2017/03/solidworks-simulation-step-series-thermal-analysis.html
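And for anyone wondering how much the 5, 6, 7, 8 shortcut actually gives up, here is a minimal Python sketch. It is only an illustration, not anything from the SolidWorks video: the function name `radiated_power` and its parameters are made up for this post, and the "exact" figure is the 2019 SI value of the constant, 5.670374419 × 10^-8 W/(m^2 K^4).

```python
# Stefan-Boltzmann constant, 2019 SI value, in W / (m^2 K^4)
SIGMA_EXACT = 5.670374419e-8
# The mnemonic version: digits 5, 6, 7 and an exponent of -8
SIGMA_MNEMONIC = 5.67e-8


def radiated_power(temp_kelvin, area_m2=1.0, emissivity=1.0, sigma=SIGMA_MNEMONIC):
    """Stefan-Boltzmann law: P = emissivity * sigma * area * T^4 (hypothetical helper)."""
    return emissivity * sigma * area_m2 * temp_kelvin ** 4


# How far off is the mnemonic from the exact value, as a fraction?
relative_error = abs(SIGMA_MNEMONIC - SIGMA_EXACT) / SIGMA_EXACT

if __name__ == "__main__":
    # A 1 m^2 ideal black body at 300 K (about room temperature)
    print(f"Power at 300 K: {radiated_power(300.0):.2f} W")
    print(f"Relative error of the mnemonic: {relative_error:.1e}")
```

The mnemonic is off by less than one part in ten thousand, which is far below what rough engineering estimates usually care about.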