USC CS 599: Content Detection and Analysis for Big Data

This course is designed as an advanced course in data analytics and big data. It introduces students to the area of content detection and analysis, which involves understanding digital file formats, detecting them, and extracting data from them. Emphasis areas include document type detection; parsing and extraction; metadata understanding and analysis; language identification and detection from files; and file formats and representation. The class also has a specific focus on content detection and analysis over large datasets. The datasets used in the course are publicly available and were collected by the instructor or his collaborators on national big data initiatives, including DARPA, NASA, and other projects.

The course is designed to be accessible to students with intermediate-level programming experience in Java and Python. The first half of the course focuses on Java, using the Tika framework as the core technology for instruction. The instructor is the co-inventor of Tika and has deep experience with it and with Apache search engine technologies. The second half of the course introduces students to using Python for content detection and analysis with Tika, ElasticSearch™, Solr, Nutch, and Apache Hadoop™. The course combines lectures, in-class discussion, readings, group-based assignments, and a final exam.
Submitted by elementlist on Mar 25, 2017 (Edited Mar 25, 2017)