Element List - Resources for Scientific Research: NLP

31 matching results for "nlp":

DBpedia

Submitted Apr 26, 2017 to Scientific Data

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.

Tags: nlp, machine learning

WordNet: A Lexical Database of English

Submitted Apr 26, 2017 to Scientific Data

WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.

WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions. First, WordNet interlinks not just word forms (strings of letters) but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated. Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.
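
The two distinctions above (links between senses rather than word forms, and labeled relations) can be sketched with a toy data structure. This is only an illustration: the `Synset` class, the sense identifiers, the glosses, and the helper functions are hypothetical and are not WordNet's actual data or API.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One sense: the word forms that express it, plus labeled links to other senses."""
    name: str
    lemmas: list
    gloss: str
    # relation label -> names of the synsets it points to
    relations: dict = field(default_factory=dict)


# Toy entries (invented for illustration): the word form "bank" belongs to two senses.
SYNSETS = {
    "bank.sense1": Synset("bank.sense1", ["bank"],
                          "sloping land beside a body of water",
                          {"hypernym": ["slope.sense1"]}),
    "bank.sense2": Synset("bank.sense2", ["bank", "depository"],
                          "an institution that accepts deposits",
                          {"hypernym": ["institution.sense1"]}),
    "slope.sense1": Synset("slope.sense1", ["slope", "incline"],
                           "an elevated geological formation"),
    "institution.sense1": Synset("institution.sense1", ["institution"],
                                 "an established organization"),
}


def senses(word):
    """Return the names of all synsets that contain this word form."""
    return sorted(name for name, s in SYNSETS.items() if word in s.lemmas)


def related(synset_name, label):
    """Follow one labeled relation from a specific sense."""
    return SYNSETS[synset_name].relations.get(label, [])


print(senses("bank"))                      # both senses of the same word form
print(related("bank.sense1", "hypernym"))  # the link is attached to a sense, not to "bank"
```

Because relations hang off a sense rather than a string, following the hypernym link from the riverbank sense never leads to the financial sense.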

Tags: nlp

Stanford Natural Language Inference (SNLI) Corpus

Submitted Apr 20, 2017 to Scientific Data

The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE). We aim for it to serve both as a benchmark for evaluating representational systems for text, especially those induced by representation learning methods, and as a resource for developing NLP models of any kind.
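
A minimal sketch of working with three-way labeled sentence pairs follows. It assumes one JSON record per line with the field names `sentence1`, `sentence2`, and `gold_label`; those names and the sample sentences are assumptions made for this example and should be checked against the actual corpus files.

```python
import json
from collections import Counter

LABELS = ("entailment", "contradiction", "neutral")

# Invented sample records; the last one carries a label outside the three classes.
SAMPLE = "\n".join([
    '{"sentence1": "A man plays a guitar.", "sentence2": "A person makes music.", "gold_label": "entailment"}',
    '{"sentence1": "A man plays a guitar.", "sentence2": "The man is asleep.", "gold_label": "contradiction"}',
    '{"sentence1": "A man plays a guitar.", "sentence2": "The man is on stage.", "gold_label": "neutral"}',
    '{"sentence1": "A dog runs.", "sentence2": "An animal is outside.", "gold_label": "-"}',
])


def load_pairs(text):
    """Parse JSON lines into (premise, hypothesis, label), skipping unknown labels."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record["gold_label"] in LABELS:
            pairs.append((record["sentence1"], record["sentence2"], record["gold_label"]))
    return pairs


def label_counts(pairs):
    """Count how many pairs fall into each class, e.g. to check class balance."""
    return Counter(label for _, _, label in pairs)


pairs = load_pairs(SAMPLE)
print(label_counts(pairs))
```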

Tags: nlp

Quora Question Pairs Dataset

Submitted Apr 20, 2017 (Edited Apr 20, 2017) to Scientific Data

Today, we are excited to announce the first in what we plan to be a series of public dataset releases. Our dataset releases will be oriented around various problems of relevance to Quora and will give researchers in diverse areas such as machine learning, natural language processing, network science, etc. the opportunity to try their hand at some of the challenges that arise in building a scalable online knowledge-sharing platform. Our first dataset is related to the problem of identifying duplicate questions.

Tags: nlp, machine learning

Google Books Ngram Viewer

Submitted Apr 20, 2017 to Scientific Software

Google Books Ngram Viewer allows one to easily graph comma-separated phrases and view their occurrence frequency.
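
To illustrate what an occurrence frequency for comma-separated phrases means, here is a small sketch that computes the relative frequency of each phrase among all n-grams of the same length in a single text. It is not how the Viewer itself works (that tool reports frequencies per year over a books corpus); the function names and the whitespace tokenization are assumptions made for this example.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequencies(query, text):
    """Map each comma-separated phrase to its share of all same-length n-grams."""
    tokens = text.lower().split()
    result = {}
    for raw in query.split(","):
        phrase = " ".join(raw.lower().split())  # normalize case and spacing
        if not phrase:
            continue
        grams = ngrams(tokens, len(phrase.split()))
        result[phrase] = grams.count(phrase) / len(grams) if grams else 0.0
    return result


text = "the cat sat on the mat the cat slept"
print(phrase_frequencies("the cat, mat", text))
```

In this toy text, "the cat" accounts for 2 of the 8 bigrams and "mat" for 1 of the 9 unigrams.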

Tags: nlp

Abstract Meaning Representation Bank

Submitted Apr 17, 2017 to Scientific Data

The AMR Bank is a set of English sentences paired with simple, readable semantic representations. We hope that it will spur new research in natural language understanding, generation, and translation.

The AMR Bank is manually constructed by human annotators at:
- The Linguistic Data Consortium
- SDL
- The University of Colorado's Center for Computational Language and Education Research (CLEAR)
- The University of Southern California's Information Sciences Institute (ISI) and Computational Linguistics at USC.

Tags: nlp

Computational Linguistics and Deep Learning

Submitted Apr 16, 2017 to Science Blogs

A look at the importance of Natural Language Processing by Christopher D. Manning.

Tags: deep learning, nlp

Advanced Text Analysis with SpaCy and Scikit-Learn

Submitted Apr 15, 2017 (Edited Apr 16, 2017) to Science Courses and Tutorials

This notebook was originally prepared for the workshop Advanced Text Analysis with SpaCy and Scikit-Learn, presented as part of NYCDH Week 2017. Here, we try out features of the SpaCy library for natural language processing. We also do some statistical analysis using the scikit-learn library.

Tags: nlp, python

ANC Manually Annotated Sub-Corpus (MASC)

Submitted Apr 07, 2017 (Edited Apr 07, 2017) to Scientific Data

The Manually Annotated Sub-Corpus (MASC) consists of approximately 500,000 words of contemporary American English written and spoken data drawn from the Open American National Corpus (OANC).

All of MASC includes manually validated annotations for sentence boundaries, token, lemma and POS; noun and verb chunks; named entities (person, location, organization, date); Penn Treebank syntax; coreference; and discourse structure. Additional manually produced or validated annotations have been produced by the MASC project for portions of the sub-corpus, including full-text annotation for FrameNet frame elements and a 100K+ sentence corpus with WordNet 3.1 sense tags, of which one-tenth are also annotated for FrameNet frame elements. Annotations of all or portions of the sub-corpus for a wide variety of other linguistic phenomena have been contributed by other projects, including PropBank, TimeBank, Pittsburgh opinion, and several others.

Unlike most freely available corpora including a wide variety of linguistic annotations, MASC contains a balanced selection of texts from a broad range of genres.

MASC is an OPEN LANGUAGE DATA resource that can be downloaded by anyone for any purpose. At the same time, it is a COLLABORATIVE COMMUNITY RESOURCE that will ultimately be sustained by community contributions of annotations and derived data.

Tags: nlp

Computational Linguistics & Psycholinguistics Research Center

Submitted Mar 27, 2017 to Science Research Groups » Computer Science

CLiPS (Computational Linguistics & Psycholinguistics) is a research center associated with the Linguistics department of the faculty of Arts of the University of Antwerp, and is the result of the fusion of the CNTS and CPL research centers.

Most of the CLiPS research is based on competitively acquired research funding. Funding agencies include the Research Foundation - Flanders, the Institute for the Promotion of Innovation by Science and Technology in Flanders, the Dutch Language Union, the European Commission and occasionally companies.

The goal of CLiPS is to produce internationally recognized top research and resources in (developmental) psycholinguistics, (corpus) linguistics, and computational linguistics, and to investigate the interdisciplinary combinations of these disciplines.

Tags: nlp

TextRank: Text summarization and keyword extraction

Submitted Mar 27, 2017 to Scientific Software

TextRank is a Python implementation of text summarization and keyword extraction. It also supports modeling text as a graph and exporting that graph in GEXF format.
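
The listing does not describe the package's API, so the following is only a minimal sketch of the general TextRank idea for keywords: build a word co-occurrence graph, then score nodes with a PageRank-style iteration. The function names, the window size, the iteration count, and the damping factor of 0.85 are illustrative assumptions, not taken from the listed implementation.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Link each token to the distinct tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)   # undirected, unweighted edge
                graph[other].add(word)
    return graph


def rank_nodes(graph, damping=0.85, iterations=50):
    """PageRank-style scoring: a node is important if important nodes link to it."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {}
        for node in graph:
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = (1 - damping) + damping * incoming
        scores = updated
    return scores


def top_keywords(tokens, top_n=3, window=2):
    """Return the highest-scoring tokens as keyword candidates."""
    scores = rank_nodes(cooccurrence_graph(tokens, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


tokens = "graph ranking scores words in a graph built from words near other words".split()
print(top_keywords(tokens))
```

A fuller version would typically filter stopwords and restrict candidates by part of speech before building the graph; that step is omitted here to keep the sketch short.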

Tags: nlp

Text Information Management and Analysis Group at UIUC

Submitted Mar 27, 2017 to Science Research Groups » Computer Science

The Text Information Management and Analysis (TIMAN) group is part of the Database and Information Systems (DAIS) Lab of the Computer Science Department at the University of Illinois at Urbana-Champaign. We work on a wide spectrum of problems in the general area of text information management and analysis, including retrieval, organization, filtering, summarization, and mining of textual information. Our aim is to develop advanced text information management and analysis techniques and systems that help people make better use of text information.

Tags: nlp

Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions

Submitted Mar 27, 2017 to Science Research Articles

We present a novel graph-based summarization framework (Opinosis) that generates concise abstractive summaries of highly redundant opinions. Evaluation results on summarizing user reviews show that Opinosis summaries have better agreement with human summaries compared to the baseline extractive method. The summaries are readable, reasonably well-formed and are informative enough to convey the major opinions.

Tags: nlp

MeTA: ModErn Text Analysis

Submitted Mar 27, 2017 to Scientific Software

MeTA is a modern C++ data sciences toolkit featuring:
- text tokenization, including deep semantic features like parse trees
- inverted and forward indexes with compression and various caching strategies
- a collection of ranking functions for searching the indexes
- topic models
- classification algorithms
- graph algorithms
- language models
- CRF implementation (POS-tagging, shallow parsing)
- wrappers for liblinear and libsvm (including libsvm dataset parsers)
- UTF8 support for analysis on various languages
- multithreaded algorithms
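
Two of the features listed above, an inverted index and a ranking function for searching it, can be sketched in a few lines. This is a conceptual illustration in Python, not MeTA's C++ API; the function names, the whitespace tokenizer, and the simple TF-IDF weighting are assumptions chosen for brevity.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to its postings: {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, query, num_docs):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        idf = math.log(1 + num_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "d1": "graph algorithms for text",
    "d2": "text classification",
    "d3": "topic models",
}
index = build_inverted_index(docs)
print(search(index, "graph text", len(docs)))
```

A forward index would store the opposite mapping (document to terms); production toolkits also compress postings and cache them, which this sketch leaves out.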

Tags: nlp

Mining Paradigmatic Word Associations

Submitted Mar 27, 2017 to Science Courses and Tutorials

Mining word associations from a body of text is often one of the first Natural Language Processing techniques used when mining text data. Word associations are useful for NLP tasks such as part-of-speech tagging, parsing, and entity extraction. We will take a brief look at one type of word association, called paradigmatic association, and show how we can use the Neo4j graph database to model our text corpus as a graph and implement a simple paradigmatic relation mining algorithm.
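
Two words are paradigmatically related when they tend to appear in the same contexts. The sketch below illustrates that idea with in-memory dictionaries instead of the Neo4j graph the tutorial uses, so it is a substitute technique rather than the tutorial's algorithm; the function names, the one-word left/right context, and the cosine similarity measure are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences):
    """For each word, count its immediate left and right neighbors."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


sentences = ["my cat eats fish", "my dog eats meat"]
vectors = context_vectors(sentences)
print(cosine(vectors["cat"], vectors["dog"]))   # shared contexts: high similarity
print(cosine(vectors["cat"], vectors["fish"]))  # no shared contexts
```

Here "cat" and "dog" share both their left neighbor ("my") and right neighbor ("eats"), so they come out as paradigmatically similar, while "cat" and "fish" share no context.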

Tags: nlp

Using a Graph Database for Deep Learning Text Classification

Submitted Mar 27, 2017 to Science Blogs

Graphify is a Neo4j unmanaged extension that provides plug and play natural language text classification. Graphify gives you a mechanism to train natural language parsing models that extract features of a text using deep learning. When training a model to recognize the meaning of a text, you can send an article of text with a provided set of labels that describe the nature of the text. Over time the natural language parsing model in Neo4j will grow to identify those features that optimally disambiguate a text to a set of classes. This blog post explains how it works.

Tags: nlp

Graphify

Submitted Mar 27, 2017 to Scientific Software

Graphify is a Neo4j unmanaged extension used for document and text classification using graph-based hierarchical pattern recognition.

Tags: nlp

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Submitted Mar 08, 2017 to Science Blogs

The joint many-task model tackles multiple NLP tasks with a single architecture. Tasks are layered so that both earlier and later tasks benefit from training on closely related tasks. Though applied to specific NLP objectives, the proposed model introduces a powerful concept for future research.

Tags: ai, machine learning, nlp

Word2Vec Tutorial - The Skip-Gram Model

Submitted Mar 08, 2017 to Science Courses and Tutorials

This tutorial covers the skip-gram neural network architecture for Word2Vec. My intention with this tutorial was to skip over the usual introductory and abstract insights about Word2Vec, and get into more of the details. Specifically here I’m diving into the skip-gram neural network model.
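
As a small illustration of the data side of the skip-gram model, the sketch below generates (center word, context word) training pairs from a token sequence. It covers only pair generation, not the neural network itself; the function name and the default window size are assumptions made for this example.

```python
def skipgram_pairs(tokens, window=2):
    """Pair every token with each neighbor at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "quick", "brown"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
```

Each pair becomes one training example: the network receives the center word as input and is trained to predict the context word.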

Tags: nlp, machine learning

PHP wrapper for the Stanford NLP library

Submitted Feb 07, 2017 to Scientific Software

A PHP wrapper for the Stanford Natural Language Processing library. It supports POSTagger and CRFClassifier, automatically loads the right packages, and detects the language of the given text.

Tags: nlp
