USC CS 599: Content Detection and Analysis for Big Data
This is an advanced course in data analytics and big data. It introduces students to content detection and analysis, which involves understanding digital file formats, detecting them, and extracting data from them. Emphasis areas include document type detection; parsing and extraction; metadata understanding and analysis; language identification and detection from files; and file formats and representation. The class also has a specific focus on content detection and analysis from large data sets. The datasets used in the course were publicly collected by the instructor or his collaborators, who are involved in national Big Data initiatives including DARPA, NASA, and other projects. The course is designed to be accessible to students with intermediate-level programming experience in Java and Python. The first half of the course focuses on Java, using the Tika framework as the core technology for instruction. The instructor is the co-inventor of Tika and has deep experience with it and with search engine technology from Apache. The second half of the course introduces students to Python programming for content detection and analysis using Tika, ElasticSearch™, Solr, Nutch, and Apache Hadoop™. The course combines lectures, in-class discussion, readings, group-based assignments, and a final exam.
Submitted by elementlist on Mar 25, 2017 (Edited Mar 25, 2017)