Here is a PDF version of the slides I presented at the Data Science DC event:
DSDC Ensemble Learning
Author Archives: A Data Head
Hiring a Data Scientist, Building a Data Science Team and Getting Hired as a Data Scientist
Two great articles recently came out about the science of Data Science employment. The first, “How to hire data scientists and get hired as one,” was written by Derrick Harris of gigaom.com. Harris gives six tips:
1. Know the core competencies
2. Know a little more
3. Embrace online learning
4. Learn to tell a story
5. Prepare to be tested (aka “Your pedigree means nothing”)
6. Exercise creativity
The second piece is a blog post from Hortonworks titled “How to Build a Hadoop Data Science Team.” The post describes how the skill set of a data scientist is a mix of those of a software engineer and a research scientist (see the graphic above).
Scaling Big Data Mining Infrastructure at Twitter
“Scaling Big Data Mining Infrastructure at Twitter” is a nice slide show that Alex Popescu posted on his blog, myNoSQL. It gives a broad overview of the data engineering Twitter is doing to make life easier for its data scientists.
Three Questions That Can Make Data Science “Built to Last”
Data scientists should ask themselves these three questions every day!
Big data is here to stay. There are too many opportunities to improve our lives, our businesses, and ourselves using technology that is declining in cost and rising in value, at least for those who are able to harness it. From individualized medicine to the Internet of Things, we are generating data with our every breath, and if it’s used in the best way, we are on the brink of a technology revolution that will overshadow the industrial revolution.
The question, then, is: What is the best way?
There’s been a lot of backlash against big data, which Ray Rivera discusses brilliantly in his SAP/Forbes article “Why Big Data is Getting the Bully Treatment.” Like all big changes, the move to data-driven decisions in our offices, in our schools, and in our lives is frightening. There are hazards (both moral and technical). There’s the problem of what…
Topology and Data Analysis
In a previous post I highlighted the Silicon Valley startup Ayasdi, a company that designs visualization software for data analysis. I came across this article, which highlights an application of Ayasdi Iris to cancer data.
Start a Blog!
One reason I started this blog was to create a portfolio of my academic work. I found this article via @KirkDBorne on Twitter, which argues that blogging is good for science students.
How to Interview a Data Scientist
Here is a slide show on hiring interviews for Data Scientists. This was presented at Strata 2013 by Daniel Tunkelang, Director of Data Science at LinkedIn.
The Evolving Role of the Data Scientist
This is a good article about the changing role of data scientists and data science in the enterprise world.
This is a great checklist of topics for all aspiring Data Scientists to become proficient in. This list of skills can reasonably be attained within a master’s degree program in Statistics, Computer Science, or Operations Research.
A while back, James Kobielus wrote the article “Data Scientist: Consider the Curriculum.” It contains one of the best descriptions of a data science curriculum I have seen. The article also includes a list of algorithms and modeling techniques that a data scientist should know. Below is the list from the article.
- linear algebra
- basic statistics
- linear and logistic regression
- data mining
- predictive modeling
- cluster analysis
- association rules
- market basket analysis
- decision trees
- time-series analysis
- forecasting
- machine learning
- Bayesian and Monte Carlo statistics
- matrix operations
- sampling
- text analytics
- summarization
- classification
- principal components analysis
- experimental design
- unsupervised learning
- constrained optimization
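As a small, hedged illustration of two items on the list, association rules and market basket analysis, here is a minimal Python sketch that computes the support and confidence of a rule such as {diapers} -> {beer}. The transactions, the function names, and the rule itself are hypothetical examples chosen for illustration, not anything taken from the Kobielus article.

```python
# Toy market-basket data (made up for illustration): each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0  # rule never fires, so there is no evidence for it
    return count(set(antecedent) | set(consequent), transactions) / base


if __name__ == "__main__":
    lhs, rhs = {"diapers"}, {"beer"}
    print("support:", support(lhs | rhs, TRANSACTIONS))
    print("confidence:", confidence(lhs, rhs, TRANSACTIONS))
```

Computing confidence from raw counts rather than dividing two supports keeps the arithmetic exact for small integer counts; algorithms such as Apriori build on these same two measures to search for frequent itemsets efficiently.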
The list looks almost overwhelming.
Do you think anything is missing from the list?
Big Data at Facebook
Facebook has long been a leading force in the development of big data. The recent release of Graph Search is seen as a move by the company to flex its big data muscles and put its engineering bona fides to the test.
A couple of good articles came out this week discussing the engineering challenges Facebook is taking on with this project. In an article for IT World, Zach Miners explains how the new tool searches large graphs. Harpreet, at Tools Journal, wrote a good piece explaining how Graph Search uses natural language processing.
One last article I would like to highlight is a glossary of big data at Facebook, put out by Wade Roush at Xconomy.