In 2010, there were 1,200 exabytes of data in existence; an exabyte is equivalent to one billion gigabytes. According to the International Data Corporation (IDC), there will be 40 zettabytes by 2020, which amounts to 40,000 exabytes, or 40 trillion gigabytes.
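The conversion between these units can be checked in a few lines, assuming the usual decimal SI prefixes (a minimal sketch; the variable names are invented for illustration):

```python
# IDC forecast quoted above, with decimal SI prefixes:
# 1 exabyte (EB)   = 10**9 gigabytes (GB)
# 1 zettabyte (ZB) = 10**3 exabytes
zettabytes_2020 = 40
exabytes_2020 = zettabytes_2020 * 10**3    # 40,000 EB
gigabytes_2020 = exabytes_2020 * 10**9     # 40 * 10**12 GB, i.e. 40 trillion GB

print(exabytes_2020)   # 40000
print(gigabytes_2020)  # 40000000000000
```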
Igor Markov, a professor of EECS at the University of Michigan, has named, among other things, these "Hot Topics" for the year 2015 in the area of Big Data:
1. Big Data apps, algorithms, and architectures
These include data mining, machine learning, and hardware architectures that produce more data than we currently think possible.
2. Artificial Intelligence and Robotics
In terms of sheer data volume, the human imagination is gradually reaching its limits; it is time to make machines more efficient. Previously, machines were built for a specific task. This will have to change quickly, because we need efficient algorithms, statistical models, and new computing capacity.
3. Bioinformatics and the use of computer science in biomedicine, medicine, and healthcare engineering
There is a large gap between what we know about the human brain today and the functional possibilities of the living brain. Closing this gap is one of the greatest challenges of modern science. DNA and genetic analyses are now computer-based, and biomedical tools, such as microprocessors that can release lifesaving substances into the body, are gradually becoming part of our everyday lives. The amount of data that can be drawn from these processes leads to new insights and can improve medical treatment many times over.
The Cologne-based company ArangoDB plays in the top league when it comes to NoSQL. And ParStream (also from Cologne), which specializes in data processing for IoT applications, was acquired by Cisco in November.
More and more companies are looking for a Data Scientist and want to use their data more meaningfully. Finding qualified people in this area is becoming increasingly difficult, because the modern Data Scientist must be able to take unstructured data, extract the relevant information in a structured way, create from this new structured data (interactive) visualisations, reports, and ad hoc analyses, and then preferably also automate and document the whole process. To do this, a Data Scientist needs knowledge of scripting languages, visualisation frameworks, web technology, and databases (SQL, NoSQL, Big Data), and must be familiar with standard developer tools such as bash, Linux, git, docker, regex, etc. In short, what is needed is an expert with a complete overview of the data, from collection to the final product, i.e. a concrete recommendation for data-based action, and who keeps this in mind at all times. To do this, a Data Scientist must program better than a statistician and have mastered statistics better than a programmer. Ideally, the Data Scientist is already involved before the data has even been collected.
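The workflow described above, from unstructured data to a structured ad hoc analysis, can be reduced to a minimal Python sketch. The log lines, their format, and the regular expression here are all invented for illustration; a real pipeline would of course be far larger:

```python
import re
from collections import Counter

# Hypothetical unstructured input: raw application log lines.
raw_logs = [
    "2016-01-12 10:03:11 user=anna action=login",
    "2016-01-12 10:04:52 user=ben action=purchase",
    "2016-01-12 10:05:03 user=anna action=purchase",
    "not a log line at all",
]

# Pattern matching the invented log format above.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) \S+ user=(?P<user>\w+) action=(?P<action>\w+)"
)

# Step 1: unstructured text -> structured records (non-matching lines are skipped).
records = [m.groupdict() for line in raw_logs if (m := pattern.match(line))]

# Step 2: structured records -> ad hoc analysis (actions per user).
actions_per_user = Counter(r["user"] for r in records)
print(actions_per_user)  # Counter({'anna': 2, 'ben': 1})
```

The same extract-then-aggregate shape carries over to larger tools (SQL, pandas, visualisation frameworks); the regex step is simply swapped for a parser appropriate to the data source.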
It remains exciting to see in which direction German companies will develop. It is precisely issues such as data protection and legal certainty, which play a particularly important role in Germany, that will probably (unfortunately) put a spanner in the works of some projects.
We increasingly want to support projects in the field of data science and offer Data Scientists a platform for exchange. We have therefore launched the Data Science Meetup Cologne. All information about the meetup can be found here.
On June 9, our third Big Data conference will be held with a focus on Data Science.
The program and the tickets are available here.