DATA ANALYSIS


0512700005
DEPARTMENT OF CHEMISTRY AND BIOLOGY "ADOLFO ZAMBELLI"
EQF6
ENVIRONMENTAL SCIENCES
2022/2023

COMPULSORY
YEAR OF COURSE 2
YEAR OF DIDACTIC SYSTEM 2016
SPRING SEMESTER
CFU  HOURS  ACTIVITY
4    32     LESSONS
2    24     EXERCISES
Objectives
THE AIM IS TO ILLUSTRATE THE PRINCIPLES OF STATISTICAL REASONING AND TO SHOW HOW TO ORGANIZE DATA, PRODUCE GRAPHICS, PERFORM DESCRIPTIVE AND EXPLORATORY DATA ANALYSIS, UNDERSTAND THE USE OF STATISTICAL MODELS, AND COMMUNICATE AND DISCUSS THE RESULTS OF THE ANALYSIS OF ENVIRONMENTAL DATA. OVERALL, THE COURSE COVERS BOTH METHODOLOGICAL ASPECTS AND APPLICATIONS IN DEPTH. PRACTICAL WORK WILL BE CARRIED OUT USING THE STATISTICAL LANGUAGE R. STUDENTS ARE THEREFORE EXPECTED TO ACQUIRE: I) GOOD KNOWLEDGE AND UNDERSTANDING OF THE KEY CONCEPTS OF STATISTICS AND GOOD COMMAND OF THE R STATISTICAL SOFTWARE; II) THE ABILITY TO APPLY STATISTICAL CONCEPTS AND METHODS TO THE ANALYSIS OF ENVIRONMENTAL DATA; III) CRITICAL JUDGMENT AND METHODOLOGICAL RIGOR IN CHOOSING APPROPRIATE STATISTICAL METHODS, AND GOOD KNOWLEDGE OF THE ASSUMPTIONS UNDERLYING THE METHODS USED; IV) COMMUNICATION SKILLS TO PRESENT THE RESULTS OBTAINED IN A RIGOROUS, PROFESSIONAL AND REPRODUCIBLE WAY.
Prerequisites
KNOWLEDGE OF GENERAL MATHEMATICAL TOOLS SUCH AS LINEAR ALGEBRA (MATRICES, VECTORS, AND RELATED OPERATIONS; EIGENVALUES AND EIGENVECTORS) IS REQUIRED. IT IS ALSO VERY USEFUL TO BE COMFORTABLE WITH MATHEMATICAL FORMALISM, SUCH AS READING FORMULAS AND EQUATIONS, AT THE LEVEL OF A GENERAL UNDERGRADUATE MATHEMATICS COURSE. BASIC COMPUTER SKILLS ARE ALSO NECESSARY FOR THE PRACTICAL DATA ANALYSIS. NO PRIOR KNOWLEDGE OF THE R PROGRAMMING LANGUAGE IS REQUIRED.
Contents
THE COURSE IS ORGANIZED IN A THEORETICAL AND METHODOLOGICAL PART (32 HOURS) AND A TECHNICAL-PRACTICAL PART (24 HOURS) FOR A TOTAL OF 56 HOURS AND 6 CFU.

THE MAIN TOPICS FOR THE THEORETICAL AND METHODOLOGICAL PART ARE:
- STATISTICS IN ENVIRONMENTAL SCIENCE
- DESIGN OF EXPERIMENTS AND SAMPLING
- CLASSIFICATION OF DATA AND MEASUREMENT SCALES
- THE USE OF CLASSIFICATION TABLES
- GRAPHIC REPRESENTATIONS OF UNIVARIATE DISTRIBUTIONS
- MEAN, MODE, MEDIAN
- DATA DISPERSION AND VARIABILITY
- SYMMETRY AND KURTOSIS
- DISTRIBUTIONS AND DENSITIES: MATHEMATICAL ASPECTS FROM PROBABILITY THEORY
- COMBINATORIAL CALCULATIONS
- DISCRETE AND CONTINUOUS DISTRIBUTIONS OF GENERAL INTEREST
- CHI-SQUARE, STUDENT'S T AND FISHER'S F DISTRIBUTIONS
- COMPARISON OF OBSERVED AND THEORETICAL DISTRIBUTIONS
- HYPOTHESIS TESTING
- INFERENCE ON MEANS USING THE T-TEST
- NONPARAMETRIC TESTS
- ANALYSIS OF VARIANCE (ANOVA)
- FROM PROBABILITY TO STATISTICS: ESTIMATORS, CORRELATION MEASURES, MEASURES OF ASSOCIATION AND ELEMENTS OF DESCRIPTIVE STATISTICS.
- CONCEPTS OF STATISTICAL INDEPENDENCE/STATISTICAL DEPENDENCE, CORRELATION, ASSOCIATION, CAUSALITY.
- MULTIPLE LINEAR REGRESSION.
- STEPWISE REGRESSION AND MODEL SELECTION.
- MEASURES OF DISTANCE, SIMILARITY AND DISSIMILARITY: MATHEMATICAL PROPERTIES AND EXAMPLES.
- TRANSFORMATIONS OF RANDOM VARIABLES.
- INTRODUCTION TO CLUSTERING
- HIERARCHICAL CLUSTERING: IDEAS, PRINCIPLES AND BASIC ALGORITHMS
- PARTITIONAL CLUSTERING: IDEAS, PRINCIPLES AND BASIC ALGORITHMS


THE MAIN TOPICS FOR THE TECHNICAL AND PRACTICAL PART ARE:
- INTRODUCTION TO THE R PROGRAMMING ENVIRONMENT.
- R AND R PACKAGES, INSTALLATION AND THE COMMAND LINE
- RSTUDIO
- DATA AND DATA STRUCTURES IN R
- READING AND WRITING FILES IN R
- USE OF GRAPHICAL FUNCTIONS IN R
- CONTROL STRUCTURES IN R
- FUNCTIONS AND GRAPHICS FOR DESCRIPTIVE STATISTICS IN R
- PROBABILITY DISTRIBUTIONS IN R
- SIMPLE LINEAR REGRESSION IN R
- MULTIPLE LINEAR REGRESSION IN R
Teaching Methods
THE COURSE CONSISTS OF 56 HOURS OF THEORETICAL/METHODOLOGICAL LESSONS AND PRACTICAL EXERCISES OR LABORATORY (6 CFU). IN PARTICULAR, THERE WILL BE 32 HOURS OF TEACHING ON THEORETICAL-METHODOLOGICAL ASPECTS AND 24 HOURS OF LABORATORY WITH THE COMPUTER FOR TECHNICAL AND PRACTICAL APPLICATIONS.
THE COURSE IS ORGANIZED AS FOLLOWS: CLASSROOM LESSONS ON ALL COURSE TOPICS (16 LESSONS OF 2 HOURS EACH), PRACTICAL EXERCISES WITH THE COMPUTER ON ALL TOPICS OF THE COURSE (8 LESSONS OF 3 HOURS EACH). THE TECHNICAL-PRACTICAL EXERCISES WILL FOLLOW THE THEORETICAL LESSONS ON THE SAME SUBJECT.
FOR PRACTICAL EXERCISES STUDENTS WILL USE THEIR OWN COMPUTER AND INSTALL THE STATISTICAL SOFTWARE R (WHICH IS OPEN-SOURCE).
FOR PRACTICAL EXERCISES, STUDENTS CAN WORK INDIVIDUALLY OR IN PAIRS.
ALL COURSE MATERIAL (SLIDES, EXAMPLES IN R, EXERCISES AND DATASETS) WILL BE PROVIDED AT THE BEGINNING OF THE COURSE. STUDENTS ARE KINDLY INVITED TO READ THE CLASS MATERIAL BEFORE EACH LESSON SO THAT THEY CAN BENEFIT FROM AND PARTICIPATE MORE FULLY IN THE CLASS HOURS.
ATTENDANCE AT THE THEORETICAL CLASSES IS NOT MANDATORY. HOWEVER, STUDENTS ARE STRONGLY RECOMMENDED TO ATTEND THE EXERCISE LESSONS WITH THE SOFTWARE R.
Verification of learning
THE ACHIEVEMENT OF THE COURSE’S OBJECTIVES IS CERTIFIED BY AN EXAM CONSISTING OF A WRITTEN PART AND A PRACTICAL PART. THE FINAL GRADE DEPENDS ON THE SCORES OBTAINED IN EACH PART. AT LEAST 18/30 POINTS ARE NECESSARY TO PASS THE EXAM.
THE WRITTEN EXAM CONSISTS OF THREE QUESTIONS ON THE THEORETICAL AND METHODOLOGICAL ASPECTS COVERED IN THE COURSE PROGRAM (5 POINTS PER QUESTION).
THE PRACTICAL PART CONSISTS OF A FEW EXERCISES USING R ON THE TOPICS COVERED IN CLASS (UP TO 15 POINTS).
THE MINIMUM SCORE (18/30) WILL BE GIVEN TO STUDENTS SHOWING ONLY A LIMITED KNOWLEDGE OF THE BASIC PRINCIPLES OF DESCRIPTIVE/INFERENTIAL STATISTICS AND/OR A LIMITED KNOWLEDGE OF THE R LANGUAGE.
THE MAXIMUM SCORE (30/30) WILL BE GIVEN TO STUDENTS SHOWING A VERY GOOD COMMAND OF BOTH THE THEORY AND THE PRACTICE OF STATISTICS, TOGETHER WITH GOOD PRACTICAL USE OF THE SOFTWARE R.
HONORS (LODE) WILL BE AWARDED TO STUDENTS WHO REACH THE MAXIMUM SCORE OF 30/30 AND DEMONSTRATE AN INDEPENDENT AND CRITICAL COMMAND OF THE KNOWLEDGE ACQUIRED IN THE CLASSES.
Texts
1) WALTER W. PIEGORSCH, A. JOHN BAILER, “ANALYZING ENVIRONMENTAL DATA”, WILEY (2005)
2) RICHARD G. BRERETON, “CHEMOMETRICS: DATA ANALYSIS FOR THE LABORATORY AND CHEMICAL PLANT”, WILEY (2003)
3) PETER DALGAARD, “INTRODUCTORY STATISTICS WITH R”, SPRINGER
4) NOTES AND SLIDES OF THE LESSONS
More Information
ALL LECTURES AND EXERCISES WILL BE ILLUSTRATED WITH EXAMPLES AND REAL CASE STUDIES OF INTEREST.
THE CLASSES WILL BE GIVEN IN ITALIAN.
Data source: ESSE3 [Last synchronization: 2023-01-23]