DATA MANAGEMENT AND VISUALIZATION

Course syllabus

Regulation academic year: 
2018/2019
Year of study: 
1
Academic year of delivery: 
2018/2019
Activity type: 
Mandatory
Language: 
English
Credits: 
12
Term: 
First semester
Teaching hours: 
96
Prerequisites: 

Knowledge of the relational model

********

none

Modules

Assessment methods

Exam type: 
Oral
Method of assessing learning: 

A written exam on the theoretical topics and a project.

The project must be carried out by a group of at most 3 students and must satisfy at least 2 of the following 3 requirements:

1) the total amount of data managed must exceed 2 gigabytes

2) two or more datasets with different data formats and models must be integrated

3) data must be collected or analyzed in real time

A discussion of the project will conclude the evaluation.

Grading: 
Final grade

Learning objectives

At the end of the module, students will be able to select, design and query a database (relational or not) according to their application needs.

Students will be able to use a NoSQL database management system to acquire, store and query semi-structured data.

*******

At the end of the course students will have acquired skills in analysis, development and evaluation of the quality of complex and interactive infographics.

Contents

data types

NoSQL models and architecture

big data architecture

big data engine

data lifecycle

******

The course covers the methods, techniques and tools of data visualization and visual design used to design, implement and evaluate systems that enable interactive data analysis and flexible optimization of reporting (both in organizational settings and in data journalism). To this end, the course presents strategies for visualizing Web data through infographics and dashboards that are both informative and understandable, and that can be implemented without advanced programming skills using tools ranging from the most common commercial software platforms (e.g., Tableau) to the many open source packages available on the Web (JavaScript, HTML5, etc.). An important part of the course covers iterative design and the methods and techniques for assessing the quality of these infographics, along with their concrete application to the continuous improvement of data visualization systems. In the laboratory hours, students will also acquire the skills needed to carry out a concrete project of realistic complexity: producing a Web report with graphics and animated, interactive charts on topics of common interest and public utility.

Detailed program

0. Data Type

1. NoSQL models

1.1. CAP theorem

1.2. Document-based systems

1.3. Graph databases

1.4. Key-value and columnar models

2. Data architecture

2.1. Replication

2.2. Fragmentation

3. Architectures for big data analysis

3.1. MapReduce

3.2. Main components (Hive, Spark, Flink, Impala, etc.)

4. Data quality

5. Data integration

6. Data life cycle

6.1. Acquisition

6.2. Storage

6.3. Integration

**********

- Introduction to visualization.

- Transforming data into sources of knowledge through visual representation.

- Requirements and heuristics for high-quality visualizations.

- Charts and standard views: relevance and appropriateness.

- Advanced and innovative tools for data visualization and advanced quantitative analysis.

- The evaluation of the quality of visualizations and infographics.

o Qualitative assessment: expert and heuristic.

o Quantitative assessment: user tasks; inferential statistical techniques.

o Validated psychometric questionnaires and their analysis and understanding.

- Workshops in which students will acquire practical skills to:

o extract unstructured data from the Web (import.io, kimono, etc.)

o manage and manipulate data in tabular format (Google Spreadsheets, Excel, etc.)

o explore and present static data (RAWGraphs, Gephi, Illustrator, etc.)

o explore and build interactive data visualizations (Tableau Public, Carto)

o design a "data-driven" narrative in a data journalism context.

Recommended reading

Harrison, G. (2015). Next Generation Databases. Apress.

*******

Yau, N. (2011). Visualize this: the FlowingData guide to design, visualization, and statistics. John Wiley & Sons.
Scientific articles and class pack provided by the lecturers.

Teaching methods

Lectures and exercises in the classroom and on PC.

******

Lectures supported by slides, discussion of practical cases through the forum, and discussion of practical homework projects.