Vocal Assistant

Natural Language Processing (NLP) is the branch of artificial intelligence concerned with giving computers the ability to understand text and spoken words in much the same way humans do.

NLP combines computational linguistics (rule-based modeling of human language) with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or speech data and "understand" the full meaning, intent, and feeling of the speaker or writer.

NLP underlies computer programs that translate text from one language to another, respond to voice commands, and summarize large volumes of text quickly, even in real time.

Human language is full of ambiguities that make it incredibly difficult to write software that accurately determines the meaning of text or voice data. Homonyms, homophones, sarcasm, idiomatic expressions, metaphors, grammatical and usage exceptions, and variations in sentence structure are just some of the irregularities that humans take years to learn, but that programmers must teach natural language applications to recognize and understand accurately from the outset if such applications are to be useful.

Several NLP tasks analyze human text and speech data in ways that help the computer make sense of what it is ingesting. Some of these tasks include the following:

  • Speech recognition, also called speech-to-text, is the task of reliably converting speech data into text data. Speech recognition is necessary for any application that follows spoken commands or answers spoken questions. What makes speech recognition particularly challenging is the way people speak: fast, slurring words, with varying emphasis and intonation, with different accents, and often using incorrect grammar.
  • Sentiment analysis seeks to extract subjective qualities from text - attitudes, emotions, sarcasm, confusion, suspicion.
  • Natural language generation is sometimes described as the opposite of speech recognition or speech-to-text; it is the task of putting structured information into human language.

This code provides a simple implementation of these NLP concepts, using the Google API for spoken sentence recognition. One of the main libraries used is the Natural Language Toolkit (NLTK).

NLTK includes libraries for many of the NLP tasks listed above, as well as libraries for secondary tasks, such as sentence parsing, word segmentation, stemming and lemmatization (methods for reducing words to their roots), and tokenization (for breaking down sentences, phrases, paragraphs, and passages into tokens that help the computer better understand the text). It also includes libraries for implementing features such as semantic reasoning, the ability to reach logical conclusions based on facts extracted from the text.
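To make tokenization and stemming concrete, here is a minimal sketch in plain Python. It is not NLTK's API and not necessarily what this project does: the `tokenize` and `naive_stem` helpers and the `SUFFIXES` list are hypothetical, and the suffix stripping is far cruder than a real stemmer such as the Porter stemmer.

```python
import re

# Hypothetical suffix list for illustration only; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("Searching for songs, she asked what time it is.")
stems = [naive_stem(t) for t in tokens]
print(tokens)
print(stems)
```

Short words such as "is" are left untouched because stripping them would leave too little of the stem.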

Once the sentence is recognized, the script identifies the topic being discussed through intent mapping: tokenization reduces the sentence to its key terms, from which the script identifies the field (the intent) and, from that, the query.
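The intent mapping step could be sketched roughly as follows. This is an assumption about the general approach, not the project's actual code: the `INTENT_KEYWORDS` table and the `detect_intent` function are hypothetical names, and the keyword sets are invented for illustration.

```python
import re

# Hypothetical intent table: each intent is triggered by a set of keywords.
INTENT_KEYWORDS = {
    "time": {"time", "clock", "hour"},
    "date": {"day", "date", "today", "tomorrow", "yesterday"},
    "youtube": {"youtube", "song", "play"},
    "search": {"google", "search"},
}


def detect_intent(sentence):
    """Return the intent whose keywords overlap most with the sentence, or None."""
    tokens = set(re.findall(r"[a-z0-9']+", sentence.lower()))
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


print(detect_intent("What time is it in Tokyo?"))
```

Counting keyword overlaps keeps the mapping easy to extend: a new command only needs a new entry in the table. A sentence that matches no keyword returns `None`, which the caller can treat as "not understood".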

You can give simple commands, such as asking for the current time or the time in a particular city, asking what day it will be or what day it was, searching YouTube for a song, or googling anything you want. This is possible through a purpose-built cleaning step that strips common patterns from the message, leaving only the necessary information. For example, the phrase "What time is it in Tokyo?" is identified as a time request and stripped of the most common words associated with the 'time' intent, so that only something like 'Tokyo' remains. At this point, the city is identified and its current time is looked up so that the correct time can be given in response.
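The Tokyo example can be sketched as below. Everything here is illustrative rather than the project's own implementation: `TIME_FILLER`, `CITY_UTC_OFFSET_HOURS`, `clean_message`, and `time_in_city` are hypothetical names, and the city table uses fixed UTC offsets, which only works for cities without daylight saving time. A fuller version would more likely use a time zone database.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical filler words that make up the 'time' intent.
TIME_FILLER = {"what", "what's", "time", "is", "it", "in", "the", "tell", "me", "now", "at"}

# Hypothetical city table with fixed UTC offsets (assumes no daylight saving time).
CITY_UTC_OFFSET_HOURS = {"tokyo": 9, "dubai": 4, "singapore": 8}


def clean_message(sentence, filler):
    """Tokenize the sentence and drop the filler words for the detected intent."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [t for t in tokens if t not in filler]


def time_in_city(sentence, now_utc=None):
    """Return (city, 'HH:MM') for the first known city left after cleaning.

    Falls back to (None, local time) when no known city remains.
    """
    now_utc = now_utc or datetime.now(timezone.utc)
    for token in clean_message(sentence, TIME_FILLER):
        if token in CITY_UTC_OFFSET_HOURS:
            tz = timezone(timedelta(hours=CITY_UTC_OFFSET_HOURS[token]))
            return token.title(), now_utc.astimezone(tz).strftime("%H:%M")
    return None, now_utc.astimezone().strftime("%H:%M")


print(clean_message("What time is it in Tokyo?", TIME_FILLER))
print(time_in_city("What time is it in Tokyo?"))
```

Passing `now_utc` explicitly makes the function easy to check with a fixed instant instead of the real clock.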

In addition, the script can isolate the user's voice by means of a CNN trained to distinguish the user's voice from background noise.
Project Information

  • Category: ML
  • Project URL:
  • About: Voice assistant created to allow you to perform custom functions and execute specific commands.