NYU has announced a university-wide Data Science initiative and the creation of the Center for Data Science (CDS). Columbia University got a head start with the Institute for Data Science and Engineering (IDSE), but NYU is offering an MS degree with plans to offer a PhD, something Columbia's IDSE is not doing at this time.
This news comes on the heels of Massachusetts’ efforts to make Boston a big data capital. Massachusetts has long been a leading center for academic research in data mining and big data through MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).
Many institutions across the US have added Data Science degrees to existing programs. However, NYU is one of the first schools to create a center whose purpose is to offer a degree in the field, rather than only to conduct research.
This looks like a really interesting class that covers a lot of useful tools for data science. The course blog has postings for each lecture and slides.
Python and big data have recently been in the news. Continuum Analytics just received $3M from DARPA (I hope they cash their check before the possible sequester) to develop big data capabilities for Python through its Blaze and Bokeh projects. This is promising news for those of us who are not proficient in multiple programming languages. So far, Java has been the lingua franca for most big data applications, largely because of Python's performance limitations. This project won't solve all of those performance issues, but hopefully it will let us non-polyglot programmers do some heavy lifting without all the curly braces.
Once again, GigaOm's Derrick Harris gives us a great report: "DARPA puts $3M into startup pushing big data in Python."
InformationWeek Government also has a nice article, “DARPA Funds Python Big Data Effort,” by J. Nicholas Hoover.
Also check out Continuum Analytics’ blog announcement about the project and another post detailing Blaze.
These are older stories, but nevertheless I wanted to post them as a reminder to myself and a heads-up to others. Ayasdi has developed a data mining platform based on topological data analysis. The company was co-founded by Dr. Gunnar Carlsson of Stanford University, and its platform is based on Carlsson's pioneering research in the field of Applied and Computational Algebraic Topology.
Why is this interesting? Because this is a completely new approach to data analysis. Many of the methodologies used in data mining, machine learning and data science are based on statistics and probability. The use of algebraic topology sort of comes out of left field (abstract algebra, by contrast, is used extensively in cryptology and computer security). Ayasdi's approach shows they are doing some very novel and innovative research.
Enough said; below are two great articles that explain what exactly Ayasdi does better than I can.
The first article comes to us from Derrick Harris, who writes for GigaOm. Check out his great article, "Has Ayasdi turned machine learning into a magic bullet?"
The second article is from the New York Times' blog Bits. The post, "Ayasdi: A Big Data Start-Up With a Long History," gives some background about the company.
Vincent Granville recently posted "66 job interview questions for data scientists" at Data Science Central. It actually totals 71 questions covering a broad range of topics that would be good to familiarize yourself with, learn about and master.
I came across "The Structure of Big Data" through Alex Popescu's blog, myNoSQL. The author does a good job of giving examples and explaining the differences between:
- Structured Data
- Semi-Structured Data
- Unstructured Data
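To make the three categories concrete for myself, here is a minimal Python sketch. The sample data is hypothetical and is not taken from the article: a CSV table stands in for structured data (a fixed schema shared by every record), a JSON list for semi-structured data (self-describing keys that can vary from record to record), and a plain sentence for unstructured data (no schema at all).

```python
import csv
import io
import json

# Structured (hypothetical sample): fixed schema, every row has the same columns.
structured_csv = "id,name,age\n1,Alice,34\n2,Bob,29\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured (hypothetical sample): keys describe the data,
# but records do not have to share the same fields.
semi_structured_json = (
    '[{"id": 1, "name": "Alice", "tags": ["admin"]},'
    ' {"id": 2, "name": "Bob", "email": "bob@example.com"}]'
)
records = json.loads(semi_structured_json)

# Unstructured (hypothetical sample): free text with no schema; any structure
# has to be extracted by analysis, here just a naive whitespace tokenization.
unstructured_text = "Alice met Bob at the conference and they discussed big data."
tokens = unstructured_text.lower().replace(".", "").split()


def field_sets(recs):
    """Return the set of field names present in each record."""
    return [set(r.keys()) for r in recs]


print(field_sets(rows))     # every row has the same columns
print(field_sets(records))  # fields vary from record to record
print(len(tokens))          # number of words extracted from the text
```

The point of the comparison is where the schema lives: in the CSV header for structured data, inside each record for semi-structured data, and nowhere for unstructured data.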
I have decided to create a blog to chronicle my experiences in graduate school and chart my journey into the field of data science.
I would like this blog to do two things:
1) Serve as a tool to help me organize the knowledge I gain from school, work and self-study as I aspire to become a professional data scientist.
2) Help others as they pursue similar endeavors.
I hope this experiment is successful for me and a useful resource for the interested public.