Tuesday, January 13, 2015

The First Rule of Professionalism: Flexibility!

As those following and participating in this blog will be aware, I have spent several weeks preparing for this series of courses. Everything possible to anticipate was anticipated. Everything needing preparation was prepared. The path was set, the plan was initiated, and the courses were begun. Everything was perfect...almost.

One day before the first graded Quiz in the R Programming Course was released (as it turned out), there was a significant hardware problem which shut down the entire project. The primary computer--you know, the one with all the software on it--experienced a hard crash.

The culprit, as it turned out, was a failed power supply. Said power supply had to be ordered, and received. The hardware tech had to set aside sufficient schedule time to install it, and to perform overall maintenance on the entire system and affiliated peripherals. Backups had to be secured.

So while I could watch videos (on another, much less equipped computer), and practice a bit with RStudio/R, no work could actually be done. There was no webcam, for instance; no screen capturing software, no audio editing software (for other commitments). It was a cold shutdown that did just that. Completely.

As of about an hour ago, the primary computer was restarted after successful installation and maintenance. Today, I will work to get caught up and on track. The first quiz is going to be a casualty. I am very unhappy about that.

Flexibility must be designed into each project that matters to you. Patience is a kindred cousin, and both are very often at loggerheads. Such is the case now. Perfectionism has been a thorn in my side for most of my life. I have used it to bolster my success, and also to be a primary component in failure. "Never let perfect stifle good enough!" is a theorem I have learned later in my life.

With this project, however, perfection is the goal. To see it flame out even before the serious work begins is a personal sadness for me, but I stand convinced of the purpose and nature of this journey. It will be alright.

Now, to the work. Has flexibility (or lack thereof) figured significantly in YOUR plans, or your work? Let's talk about it.

Thursday, January 8, 2015

So A Friend At Slashdot Asked Me...

Image: Wikimedia Commons


if there was any truth to the rumor that I was going to school...again.

I replied that I was. He wanted to know some of the details. He has known me for a long time. Perhaps he wanted to confirm some (potentially nefarious) future humor seeding bed for his own edification and/or exploitation. (Yes, that kind of friend!)

So, instead of simply feeding his smirk, I decided to post an article for submission over at Slashdot about what my educationally-motivated self is "up to these days". I told him via the article.

If you want, you can check it out as well. Comments are turned on, but you have to follow their (Slashdot's) commenting rules.  

NB: Please do not comment as "Anonymous Coward". Nobody will see those comments on my blog. :)

"Learning All Over Again...For The First Time!"

I've encountered a few challenges in the short time since beginning this study course. Several of them have to do with "compartmentalizing" comments for, among and/or between the two courses in which I am currently enrolled.

1. I posted an introductory post in one course forum, but not in the other. I do not wish to simply duplicate any post, ever. The writer inside me doesn't want to, and my internal editor would never, ever permit such creative heresy. This is one example of the several simple challenges that multiple discussion fora can present.

2. TAs in one course may not be (in fact, seldom are) TAs in other courses. This seems to be somehow related to the sheer numbers. When there are 10,000 or more students in a particular course, common sense would indicate a high probability of having more TAs managing/moderating that course's Discussions. Similarly, a smaller number of students in a course might at least allow for fewer TAs in that course. Would the same number (or specific population) of TAs be required for a course with 300 students as for one whose population exceeds 10,000 students?

"Correlation does not..."

I believed this to be the motto of the Bloomberg School. (It isn't.) Had this been the case, I would have been proud to be able to tell anyone that I knew where that particular terminology came from!

1. "Protecting public health, saving lives...millions at a time!" is actually the written mantra of the School. Good thing, this. It is a profound mantra to live out. 

2. That "other" mantra has been around a very long time indeed, and even precedes the experience of those of us who actually knew, and used, DARPANet. In fact, the debates (celebrated and otherwise) have been fuming for centuries! Hundreds of years prior to the revelation of the Pearson coefficients, philosophers had been working at cracking this nut over drinks. (Hemlock?)

As surprising as it may be, the statement "Correlation does not imply causation!" and the debate encircling it are rampant. Used often to dismiss not only arguments, but dialogue, this particular set of words has seen a dynamic resurgence in the past two decades. First scribed in a memorable work in 1870, the words were not so much a "new" consideration as a summing up of a centuries-old sticking point in literary argument. Understanding why one of the premier fathers of the science of statistical methods wrote them in the first place adds some credence to the argument, but is by no means the end of it. Trust me on this one.

"Now conditions are reversed. We're the bullies over nature and less afraid of poison berries. When we make a claim about causation, it's not so we can hide out from the world but so we can intervene in it. A false positive means approving drugs that have no effect, or imposing regulations that make no difference, or wasting money in schemes to limit unemployment. As science grows more powerful and government more technocratic, the stakes of correlation—of counterfeit relationships and bogus findings—grow ever larger. The false positive is now more onerous than it's ever been. And all we have to fight it is a catchphrase." (Slate Magazine, By Daniel Engber, 10/12/12)

Karl Pearson (whose photo graces this post, taken in 1912) had a science-shifting effect on the world. He also gave us one of the most intense debates of the 19th, 20th and (now, it would seem) 21st centuries. Hard and Social Sciences use, or misuse, this consideration as regularly as you might pull a tissue from a box. It has its proper place in our study, and should be remembered as a fundamental principle--where properly applied.


I'm not going to tie my entire understanding of the Data Science field to this, or any other admonition. I'm not going to use it to stifle, or stop legitimate discourse, either. Nor should anyone else, in my view--especially in such a high-tiered learning environment.


I do understand that a few decades of having to fend off countless interactions where having remembered this caution would have saved time, dollars and potentially many lives can have an effect on you. I can name a few of those, myself. So can you, if you are being honest.

I'm not here to argue, necessarily. I am here to learn, to soak up, absorb, and master elements of an amazing field of study and a terrific industry. I'll concentrate on that, for now. Let those with more "skin in the game" have a field day.

I'll concentrate on getting Discussion Forum posts right. And learning new tools in my Data Scientist's Toolbox. And R Programming. And about discourse with an entirely new subset of the planetary population.

Let's DO This!

Bud

PS: I am on Twitter @DS_Student, and you can email me at DS_Student@outlook.com


Tuesday, January 6, 2015

Day 1



Day the first.

When the first day of class is two days long, one should spend some quality personal time considering the reality before one, yes?

When one is being taught by the collective brain trust of the Biostatistics Department of the School of Public Health at Johns Hopkins University, it would seem as if there were some preliminary conclusions one could make prior to dipping one's foot into the pool. Yes?

I have reached two conclusions on the first day:

1. If it seems simple, re-evaluate everything.

2. If you must accelerate the speed and complexity of your learning, do so as quickly as possible.

Two courses in the inaugural session. (See #1)

I will probably survive, intact, both courses. Probably. Four weeks. Two projects. Both presented on the first day of class. Both due on the same day. Both begun yesterday.

I find the absence of video presence of the Instructors strangely unfamiliar, and a little disconcerting. I am reminded of an historical "classic": audio files narrating a PowerPoint Presentation.

When the first "recommendation" of your Instructor is to "review" a title in preparation for the quiz due this Friday, you would probably take a casual gander at the recommended reading. Right? Well, what do you do when the "reading" is 574 pages? (See #1) And this is the first of perhaps 100 links in the first week of the course. There are some critical observations which might be made here, anon. :)

The Course is The Data Scientist's Toolbox, an introductory course to the field of study. R, RStudio, Gist, Git, GitHub, Swirl, NotePad, MaxTek.... all of these (and much more) must be set up with new accounts, documented with proof screenshots, and immediately used within this Introductory Course. That can make your first day two days long. I do love a challenge!

The instructor is great! There is not one wasted word in his lectures.  (See #1, #2)

Dr. Jeff Leek takes us on a journey of exploration, innovation, and implementation, presuming that we must surely HAVE a toolbox into which we will automatically place each of these (and several other) data science tools. We will, he further seems to allege, master them to his level of expectation--by Friday.

I'm not being fair here, but I am trying to make a specific point. MOOCs in general carry a widespread (and, I have found, unwarranted) expectation of "casual learning".

This specialization/certification ain't that. At all. But the brilliance of the Instructors does what brilliant instructors have always done. A level of expectation that honors learning, honors the student, and honors the work is literally painted over the student from the first moment, and Leek does an alarmingly disarming job of it. By the end of the first week of lectures, the student has been elevated into the rarefied atmosphere of what is possible! The instruction is disarming...until you realize the expectation. It is a whiplash/whipsaw effect that can behead the unsuspecting student.

Enter Dr. Roger Peng, and the R Programming Course. A word in my own defense here, if you please. (Or if you don't, actually. :) My blog, my rules!)

As Dr. Leek says within the first five minutes of the first lecture, "We here at JHU.edu place emphasis on the 'science' in the Data Science specialization. We will focus on that emphasis throughout this specialization. Be ready for it."

What you should hear in those statements is something along the lines of "This is your only warning! Let's do this." And, off like a shot.

This reality is what gives the JHU-led specialization a much higher level of industry-wide regard, admiration, entrée and respect. This philosophy also makes an already intense course track more intense by an order of magnitude. Students are expected to keep up, like a first-year med student is expected to keep up. Precisely what I was hoping for, without any expectation of realizing it.

Where Leek very gently takes his students by the hand into some very, very tall grass and lets them go cavorting on their own, Peng races through the course lectures of the first week with the complete authority of a world-class leader in the subject--with the full expectation that you will not merely follow, but keep up and perhaps (if you are worthy) sprint ahead. He is disarmingly passionate about his subject. He is unbelievably qualified to lead this Course, and as shy and gentle as a lamb in the process.

By week two, you will either be at speed, or you most likely will no longer be a part of the specialization. I'll be there. However, I have written down #1, and #2. I have them posted on the front of my course writing materials. I will see them often.

And, I will probably NEVER have the opportunity to personally thank these monsters of dedicated education. It is just breathtaking, the freedom I feel to excel here. Dedication to superior excellence in teaching should always be responded to with superior excellence in learning. This course is flooded with both, and I can't wait!

I cannot possibly do it by myself, or alone.

Isn't that just awesome??

The only thing I can do is sprint, with long strides. That, I will do.

Saturday, January 3, 2015

The Preparations Continue


Stanford University has a refresher course in Statistics beginning on January 15th, 2015. 

That information came into my email inbox today. Yes, I am enrolled. As this course begins one week after the two courses I will be taking to begin my Data Science program with Coursera, I have been contemplating. A lot.

As I see it, there are two (and only two) distinct courses of action I can follow in the short term:

1. Java programming and Hadoop for the computer programming aspect of this discipline. Java is in my distant past. I have not kept up with trends and updates in quite a while. Hadoop and I just met. 

2. R (and RStudio, and RCommander, and...) and Python (but 2.0 or 3.0?) alternatively. Both are known to me, but seem to be waning in the "new age" of data science.  Given a discipline only about 4 years old, I do have difficulty using that phrase. There it is, anyway.

I am inclined to the second option for expedience. I'm not certain that is an adequate determinant. I feel as old (after another birthday on December 31st) as R sounds from its press. (There is a decidedly animalistic competition going on right now in this area of software development.) Python is going to require some brushing up.

Either way, additional brain time will be required, even after the courses begin. After three days of scrubbing nothing less than the internet itself for information, courses, videos, articles, and individuals selling their wares like a very bad used car salesman, I am strongly inclined to follow the course outline with R, while pondering the choice made to align the course to it. Yet, now would really be an awesome time to display some real confidence in the course, and the instructors who have designed (and will be teaching) it. Yes?

I want to be very good within the discipline, but I also want to be relevant and ready for creating an income as a result of it. There is and must be no doubt that this specialty will require ongoing (non-stop) learning--one of its strongest attractions for me. I learned several degrees ago that the best one can hope for is a place in the starting grid. I want the best possible place in the starting grid. I'll do the rest.



So, this weekend will be spent looking at things like Bioconductor, and Java, and... everything else I can research effectively. Class starts Monday. I intend to be there. 

Recent predictions indicate that, in the near future, data analytics will be available "in a box" (Buck Woody, Microsoft). Given my penchant for breaking things so as to learn how they work, this is not a totally bad thing either way. Tools should be mastered, not merely recognizable--or even understood.

Besides, I know that nobody thinks the way that I do. I hear that all the time! :)

Any recommendations? Suggestions? Ultimatums?

Thursday, January 1, 2015

Hello, World! (Or, Why Count Words?)

Welcome to 2015, and an AMAZING journey I am about to begin.

Coursera--in association with the National Education Association, the Department of Veterans Affairs, the US Department of Education (and others)--has agreed to participate in a very special program for US military veterans, under which veterans are being permitted to enter study for ONE Verified Certification of their choice.


I have chosen to study for the Data Scientist Certification program through Coursera as a beneficiary of this special opportunity. I believe every vet should avail themselves of this incredible opportunity.


In this blog, I will post my journey to that certification. Having done my due diligence, I am appalled at what I have committed to. Yet, as a lifelong student, I am pleased to see I have, once again, chosen to follow a seemingly unlikely path. The field is vast, has been voted "The Sexiest Job of 2014", and is littered with some of the most incredible talent, educators, and organizations in the world.


My Certification, for instance, will come from Coursera and their partner Johns Hopkins University. My training will be led by not one, but three of the top names in the field, and I will interact with many more. I have learned that there is a deep need for data scientists today. These professionals are very difficult to find, however, because of the incredible depth and breadth of knowledge required in several distinct disciplines. The beginning pay in the US is not too bad ($112k-$120k for those found, and hired), but given the requirements it could quickly approach minimum wage when compared to the hours required to do the work.


Data Scientists are usually part of the leadership of a team of professionals, each with one very special area of expertise, which may require them to "step in" and carry the burden for another team member when things get "interesting".  From what I have learned, things can get interesting often, a lot.


The field will include much of my life experience, and offer me a real possibility to contribute work of not only the highest quality (personal standard), but real significance in those areas I will choose to apply my learning (long range goal). Of course, the learning will necessarily continue well past this particular journey; just another benefit of my choice of study.


The scope of the learning itself is vast, and there are areas where I must confess to being weak--at the moment. That just whets my learning appetite. But, because of the situation around this proposition, only my very best work will be acceptable to anyone, including myself. That will be a singular challenge given my several other endeavors, health, and general outlook on things.


I hope to learn how to find, ask, and answer those questions which consume my thinking, my writing, and my activity from day to day. I hope to become so proficient in this field as to be able to do the same for others, as well (the real goal).


It won't be pretty, or clean, or easy. Just the way I like it. The idea of "leaving it better than you found it" is a prime directive for me on this journey. While this will be a personal reflection on my part, I hope this blog will become a hub of dialogue for my classmates, instructors, and those desiring to know the realities of the study, the profession, and the work.


I hope you will share the journey with me. From a "clean screen" of "What IS A Data Scientist?" to "I AM A Data Scientist!", this blog will consider, share, ask, inquire, complain, and boast the small and large things of life when you choose this path.


Let's begin, and continue a conversation along the way, okay?


Bud