Graduate Program in Social Data Analytics (SoDA)

Details of the Penn State graduate program in Social Data Analytics, offering a dual-title Ph.D. and graduate minor.

Social Data Analytics (SoDA) is the integration of social scientific, computational, informational, statistical, and visual analytics approaches to the analysis of large or complex data that arise from human interaction. The Penn State SoDA Program offers both graduate and undergraduate degrees. For information on the undergraduate program, please visit the website of the B.S. in Social Data Analytics.

The SoDA graduate program enables students from diverse graduate programs to attain and be identified with an interdisciplinary array of tools, techniques, and methodologies for social data analytics, while maintaining a close association with a home discipline.

Applying to the Graduate Program in Social Data Analytics

The SoDA graduate program, offering a dual-title Ph.D. and graduate minor, is now accepting applications. Please note that SoDA cannot independently admit students; students must be admitted to an appropriate home Ph.D. program at Penn State.

Students admitted to the following Penn State Ph.D. programs, but not yet having passed candidacy, may apply for the dual-title Ph.D. in Social Data Analytics: Human Development & Family Studies, Political Science, Sociology, and Statistics. We anticipate approval in 2018 of dual-title degrees with the Ph.D. programs in Geography and Information Sciences & Technology, but please contact the graduate officer of your major Ph.D. program regarding the status of that process.

Students accepted to any Penn State Ph.D. program are eligible to apply to pursue the graduate minor in Social Data Analytics.

To apply for the SoDA dual-title Ph.D. or graduate minor program, please complete the SoDA Graduate Program Admission Application.

Requirements of the dual-title Ph.D. in Social Data Analytics

The full requirements of the dual-title Ph.D. in Social Data Analytics are laid out in the University Bulletin. They apply to all students in the SoDA dual-title Ph.D., regardless of major program. Your major program may impose additional requirements, which will be detailed in the Graduate Bulletin entry and graduate handbooks for that program. In addition to completing the requirements of the major program, dual-title Ph.D. students must:

  • Complete 18 credits of prescribed coursework, including 6 credits in two SoDA seminars, and 12 credits in approved electives.
    • This does not mean 18 credits of additional coursework. Because of overlapping requirements and double-counting (some courses in the major program count as SoDA electives), the requirement can typically be met with four well-chosen courses outside of the major program (two SoDA seminars and two electives). The specific requirements that overlap vary by major program. In any case, dual-title students should complete coursework requirements with no more than one extra semester, and in experience to date, most dual-title students complete coursework with no delay. Further coursework details are outlined below.
  • Pass a candidacy exam (as defined by your major program) assessing candidacy for both the primary program and SoDA. A member of the SoDA Graduate Faculty (who may also be on the faculty of the primary program) must be on the candidacy committee. The Graduate School requires students wishing to pursue a dual-title Ph.D. to be admitted to the dual-title program prior to passing candidacy in their major program.
  • Pass a comprehensive exam (as defined by your major program) assessing mastery of the major discipline and Social Data Analytics, as well as preparation for dissertation research. A member of the SoDA Graduate Faculty (who may also be on the faculty of the primary program) must serve as chair or co-chair of the dissertation committee.
  • Successfully defend a dissertation with substantial content in Social Data Analytics.

Requirements for SoDA graduate minor

The full requirements of the graduate minor in Social Data Analytics are laid out in the University Bulletin. In addition to completing the requirements of the major program, graduate minor students must:

  • Complete 15 or more credits of prescribed coursework, including 6 credits in two SoDA seminars, and 9 or more credits in approved electives. Further coursework details are outlined below.
  • Pass a comprehensive exam (as defined by your major program) assessing mastery of the major discipline, as well as preparation for dissertation research. A member of the SoDA Graduate Faculty (who may also be on the faculty of the primary program) must serve on the dissertation committee. The Graduate School requires students wishing to pursue a graduate minor to be admitted to the minor program prior to passing the comprehensive exam in their major program.

Coursework: Social Data Analytics core seminars

SoDA dual-title Ph.D. and graduate minor students must complete 6 credits in two core seminars:

  • SoDA 501: Approaches and Issues in Big Social Data (3 credits)
  • SoDA 502: Approaches and Issues in Social Data Analytics (3 credits)

Further information, including recommended background, can be found in the detailed course descriptions. Please note that these seminars are intended primarily to integrate interdisciplinary perspectives, not to provide training in all technical elements of social data analytics, and they are intended more as capstones than as introductions. Students are expected to have the recommended background before taking these seminars. In most programs, SoDA 501 would be taken in the Spring of the student's second Ph.D. year (4th Ph.D. semester), and SoDA 502 in the Fall of the student's third Ph.D. year (5th Ph.D. semester).

Coursework: Social Data Analytics electives

SoDA dual-title Ph.D. students must complete 12 or more credits in courses on the list of approved electives. SoDA graduate minor students must complete 9 or more credits in courses on the list of approved electives. In both cases, these electives must collectively meet certain distribution requirements.

  • Thematic distribution requirements:
    • (A) Analytics distribution: 3 or more credits in approved courses focused on statistical learning, machine learning, data mining, or visual analytics. Courses meeting this requirement are designated (A) on the list of approved electives.
    • (Q) Quantification distribution: 6 or more credits in approved courses focused on statistical inference or quantitative social science methodology. Courses meeting this requirement are designated (Q) on the list of approved electives.
    • (C) Computational / informational distribution: 6 or more credits in approved courses focused on computation, collection, management, processing, or interaction with electronic data, especially at scale. Courses meeting this requirement are designated (C) on the list of approved electives.
    • (S) Social distribution: 6 or more credits in approved courses focused on the nature of human interaction and/or the analysis of data derived from human interaction and/or the social context, ethics, or social consequences of social data analytics. Courses meeting this requirement are designated (S) on the list of approved electives.
  • Cross-departmental distribution requirements:
    • (DC1) Departmental cluster 1: 3 or more credits in approved courses with the prefix STAT or that of a primarily behavioral or social science department or program (currently including APLNG, CAS, CRIM, CLJ, PLSC, SOC, HDFS, and DEMOG). The designation (DC1) refers to courses meeting this requirement.
    • (DC2) Departmental cluster 2: 3 or more credits in approved courses with the prefix GEOG or IST, or that of a primarily computer science or engineering department (currently including CMPSC, CSE, EE, and IE). The designation (DC2) refers to courses meeting this requirement.
    • 6 or more credits in approved courses outside the primary program.
  • 3 or fewer credits in approved courses at the 400-level.
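
The distribution rules above can be read as a checklist over a set of elective courses. The following is a minimal illustrative sketch in Python, not an official audit tool. It assumes every elective carries 3 credits, treats "outside the primary program" as "course prefix differs from the home program's prefix", and ignores cross-listed courses; the names Course and check_plan are hypothetical. The sample plan uses designations taken from the example lists on this page.

```python
from dataclasses import dataclass

# Prefixes named in the cross-departmental distribution requirements.
DC1_PREFIXES = {"STAT", "APLNG", "CAS", "CRIM", "CLJ", "PLSC", "SOC", "HDFS", "DEMOG"}
DC2_PREFIXES = {"GEOG", "IST", "CMPSC", "CSE", "EE", "IE"}


@dataclass(frozen=True)
class Course:
    prefix: str
    number: int
    designations: frozenset  # subset of {"A", "Q", "C", "S"}
    credits: int = 3  # assumption: each elective is a 3-credit course


def check_plan(courses, home_prefix, dual_title=True):
    """Return a dict mapping each requirement to True (met) or False (not met)."""

    def credits_where(predicate):
        return sum(c.credits for c in courses if predicate(c))

    total_minimum = 12 if dual_title else 9  # dual-title Ph.D. vs. graduate minor
    return {
        "total": credits_where(lambda c: True) >= total_minimum,
        "A": credits_where(lambda c: "A" in c.designations) >= 3,
        "Q": credits_where(lambda c: "Q" in c.designations) >= 6,
        "C": credits_where(lambda c: "C" in c.designations) >= 6,
        "S": credits_where(lambda c: "S" in c.designations) >= 6,
        "DC1": credits_where(lambda c: c.prefix in DC1_PREFIXES) >= 3,
        "DC2": credits_where(lambda c: c.prefix in DC2_PREFIXES) >= 3,
        # Simplification: "outside the primary program" means a different prefix.
        "outside": credits_where(lambda c: c.prefix != home_prefix) >= 6,
        "400-level": credits_where(lambda c: 400 <= c.number < 500) <= 3,
    }


# Hypothetical plan for a Political Science student.
plan = [
    Course("PLSC", 597, frozenset("QCS")),  # Social Network Analysis for Political Science
    Course("PLSC", 551, frozenset("QCS")),  # Big Data Approaches to Political Representation
    Course("IST", 557, frozenset("AQC")),   # Data Mining I
    Course("STAT", 540, frozenset("QC")),   # Statistical Computing
]

result = check_plan(plan, home_prefix="PLSC")
for requirement, met in result.items():
    print(requirement, "met" if met else "NOT met")
```

Under these assumptions the sample plan meets every check with 12 elective credits, 6 of them outside the home program. Dropping IST 557 from the plan would leave the (A) and (DC2) checks unmet. The list of approved electives and the University Bulletin remain the authoritative sources.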

Many courses carry multiple designations. Some examples are provided below.

Examples of electives carrying (QCS) or (AQCS) designations:

  • Departmental Cluster 1:
    • APLNG 578 Computational and Statistical Methods for Corpus Analysis
    • HDFS 597 Bayesian Methods for Human Development & Family Studies (Oravecz)
    • HDFS 597 Introduction to Data Mining for Human Development & Family Studies* (Brick)
    • PLSC 551 Big Data Approaches to the Study of Political Representation (Monroe)
    • PLSC 597 Big Data and the Law (Zorn)
    • PLSC 597 Political Events Data (Schrodt)
    • PLSC 597 Robust Methods* (Honaker)
    • PLSC 597 Social Network Analysis for Political Science (Desmarais)
    • SOC 597 Methods of Social Network Analysis (Felmlee)
    • STAT 597 Statistical Privacy in Large Databases (Slavkovic)
  • Departmental Cluster 2:
    • CSE 597 Data Privacy, Learning and Statistical Analysis (Smith)
    • CSE 597 Social Network Data Analytics (Lee)
    • GEOG 560 Spatio-Temporal Movement Analysis (Andris)
    • IST 597 Principles of Artificial Intelligence (Honavar)
    • IST 597 How the Mind Works (Reitter)
    • IST 597 Visualization and Advanced Analysis of Social Networks
    • GEOG 597 Big Data & Place (MacEachren)

*These courses also satisfy the analytics distribution (A).

For a complete and current list, please consult the list of approved electives.

Examples of electives carrying (QS) designations:

  • Departmental Cluster 1:
    • CAS 563 Pairs & Pairings: Quantitative Methods for Interdependent Data
    • HDFS 530 Longitudinal Structural Equation Modeling
    • HDFS 597 Advanced Topics in Latent Class Analysis (Bray)
    • HDFS / STAT 597 Item Response Theory Models for College Testing Data (Loken)
    • HDFS 597 Applied Longitudinal Data Analysis (Ram)
    • HDFS 597 Person-Specific EMA (Molenaar)
    • PLSC 505 Time Series Analysis in Political Science
    • PLSC 597 Causal Inference (Keele)
    • PLSC 597 Measurement Theory (Fariss)
    • SOC 572 Foundations of Causal Analysis in the Social Sciences
    • SOC 575 Statistical Methods for Nonexperimental Research
    • SOC 577 Techniques of Event History Modeling
    • SOC 579 Spatial Demography
    • STAT 507 Epidemiologic Research Methods

For a complete and current list, please consult the list of approved electives.

Examples of electives carrying (QC) or (AQC) designations:

  • Departmental Cluster 1:
    • STAT 540 Statistical Computing
    • STAT 557 Data Mining I*
    • STAT 558 Data Mining II*
    • STAT 584 Machine Learning: Tools and Algorithms*
  • Departmental Cluster 2:
    • CSE 583 / EE 552 Pattern Recognition: Principles and Applications
    • CSE 584 Machine Learning: Tools and Algorithms*
    • CSE / IST / IE 561 Data-Mining Driven Design*
    • CSE 586 / EE 554 Topics in Computer Vision
    • CSE 597 Advanced Big Data Analytics* (Kifer)
    • CSE 597 Data-Mining and Analytics* (Lee)
    • CSE 597 Graph Mining (Madduri)
    • CSE 597 Machine Learning* (Kifer)
    • CSE 597 Regularity on Interdisciplinary Large Data Sets (Liu)
    • GEOG 586 Geographic Information Analysis
    • GEOG 597 Geoinformatics (Cervone)
    • IST 556 Web Analytics: Research Approaches for Online Data
    • IST 557 Data Mining I*
    • IST 558 Data Mining II*
    • IST 597 Big Data Fundamentals (Yen / Giles)
    • IST 597 Principles of Machine Learning* (Honavar)
  • No departmental cluster designation:
    • PHYS 580 Elements of Network Science and Its Applications

*These courses also satisfy the analytics distribution (A).

For a complete and current list, please consult the list of approved electives.

Examples of electives carrying (CS) designations:

  • GEOG 588 Planning GIS for Emergency Management
  • GEOG 591 GIS for Health Analysis
  • GEOG 597 Visual Analytics: Leveraging Geo-Social Data* (MacEachren / Hardisty)
  • GEOG 597 Spatial Thinking (Klippel)
  • IST 530 Foundations in Social Informatics
  • IST 555 Intelligent Agents and Distributed Decision-Making

*These courses also satisfy the analytics distribution (A).

For a complete and current list, please consult the list of approved electives.

Examples of elective coursework plans

In practice, these requirements add at most 12 out-of-department credits to the program of a student well suited to Social Data Analytics. By completing home degree requirements and choosing internal electives appropriately, most students will need to complete only the six credits of SoDA seminars and, typically, six credits of out-of-department coursework to satisfy the remaining distribution requirements.

Coursework examples, social science Ph.D. student:

A Ph.D. student in Political Science, Sociology, or Human Development and Family Studies will meet the (S) and (DC1) distribution requirements as a function of fulfilling home degree requirements. A student appropriate for SoDA would also typically be pursuing the methodology option within those degrees, which also satisfies the (Q) distribution requirement. Examples of two- and three-course selections that then satisfy the (A), (C), and (DC2) minimum distributions include:

  • IST 557 Data Mining; STAT 540 Statistical Computing
  • STAT 557 Data Mining; GEOG 586 Geographic Information Analysis
  • IE 561 Data Mining Driven Design; IST 441 Information Retrieval
  • IST 557 Data Mining; GEOG 597 Geoinformatics
  • HDFS 597 Intro to Data Mining for HDFS; IST 597 Big Data Fundamentals; STAT 597 Statistical Privacy in Large Databases
  • GEOG 597 Geo-social Visual Analytics; GEOG 560 Spatio-Temporal Movement Analysis

Coursework examples, Statistics Ph.D. student:

A Ph.D. student in Statistics will meet the (Q) and (DC1) distribution requirements as a function of fulfilling home degree requirements. A student appropriate for SoDA would also typically be pursuing analytics and computationally oriented electives within that degree, which also satisfy the (A) and (C) distribution requirements (e.g., STAT 557 Data Mining, STAT 540 Statistical Computing). Examples of two- and three-course selections that then satisfy the (S) and (DC2) minimum distributions include:

  • SOC 578 Multilevel Regression Models; HDFS 530 Longitudinal Structural Equation Modeling
  • SOC 579 Spatial Demography; GEOG 560 Spatio-temporal Movement Analysis
  • IST 558 Data Mining II; PLSC 597 Causal Inference; STAT 597 Statistical Privacy in Large Databases
  • GEOG 597 Geo-social Visual Analytics; SOC 597 Methods of Social Network Analysis

Coursework examples, Geography Ph.D. student:

A Ph.D. student in Geography will meet the (DC2) distribution requirements as a function of fulfilling home degree requirements. A student appropriate for SoDA would also typically be pursuing GIScience coursework that satisfies the (C) and (A) distribution requirements, and is likely to also satisfy (S) and (Q) distributions within the Geography department. Examples of two- and three-course selections that then satisfy typical remaining (S), (Q), and (DC1) minimum distributions include:

  • SOC 578 Multilevel Regression Models; STAT 557 Data Mining
  • PLSC 505 Time Series Analysis; PLSC 597 Political Events Data
  • SOC 597 Methods of Social Network Analysis; STAT 597 Statistical Privacy in Large Databases

Coursework examples, IST Ph.D. student:

A Ph.D. student in IST will meet the (DC2) and (C) distribution requirements as a function of fulfilling home degree requirements. A student appropriate for SoDA would also typically be pursuing coursework that satisfies the (A) distribution requirement, and is likely to also satisfy some (S) and (Q) distributions within IST. Examples of two- and three-course selections that then satisfy typical remaining (S), (Q), and (DC1) minimum distributions include:

  • PLSC 505 Time Series Analysis; STAT 540 Statistical Computing
  • STAT 504 Discrete Data; SOC 572 Causal Analysis in the Social Sciences
  • IST 597 How the Mind Works; HDFS 597 Bayesian Methods for HDFS; CSE 583 Pattern Recognition
  • SOC 597 Methods of Social Network Analysis; STAT 597 Statistical Privacy in Large Databases