Humans understand words; machines work with numbers. In the world of text analytics, we must somehow convert our human words into numbers a computer can understand.

Of course, it is not enough to assign each word some sort of key value. One number cannot encompass the meaning and application of a word any more than your social insurance number conveys information about your eye color or your favorite sport. One approach is to define a multidimensional abstract space in which each individual characteristic of a word is represented by a "distance" along one of its axes. Each word is then represented by a vector in this abstract space. There are many possible ways to define such an abstract vector space, and just as assuredly there is no abstract space that represents a word perfectly.

In practice, we most often look for a word embedding. Unfortunately, this phrase is not always used consistently. Some authors observe that words are "defined" for the computer by describing the myriad ways in which they are "embedded" among other words in phrases, sentences, and documents. A more mathematical definition is that words are converted into vectors that are "embedded" in a vector space of lower dimensionality than the space of all uses of all words. These word vector embeddings fulfill our requirement of a numerical word representation that computers can work with. Embeddings, of course, are not unique; there are many possible embeddings to choose from, and each has its own strengths and weaknesses. They can be broadly classified into two groups: frequency-based embeddings, which are built essentially from word counts, and prediction-based embeddings. Many classic methods of text analysis use frequency-based embeddings. Prediction-based embeddings are sometimes called "neural word embeddings" because they use simple neural networks to organize the text data for further analysis.
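To make the frequency-based idea concrete, here is a minimal sketch in plain Python that builds a tiny word-by-word co-occurrence table; the two toy sentences and the one-word window are assumptions chosen only for illustration.

    from collections import Counter, defaultdict

    # Toy corpus: each sentence is already tokenized into lowercase words.
    sentences = [
        ["the", "king", "rules", "the", "land"],
        ["the", "queen", "rules", "the", "land"],
    ]

    # Count how often each word appears next to every other word
    # (one position to the left or right). Each word's row of counts
    # is a crude frequency-based "embedding" of that word.
    cooccurrence = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j in (i - 1, i + 1):
                if 0 <= j < len(sentence):
                    cooccurrence[word][sentence[j]] += 1

    print(dict(cooccurrence["king"]))   # {'the': 1, 'rules': 1}
    print(dict(cooccurrence["queen"]))  # {'the': 1, 'rules': 1}

Even in this tiny example, "king" and "queen" end up with identical count vectors because they appear in the same contexts. Prediction-based methods pursue the same intuition, but with a trainable model in place of raw counts.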

One very popular, and often quite effective, prediction-based embedding is word2vec, which is actually not so much a single method as a cluster of closely related algorithms. Word2vec uses a neural network, simple by today's standards, to group words based on their "similarity". One example, which has become a sort of "hello, world" for text analytics, is the pair of words "king" and "queen". Clearly kings and queens have distinct features, but the words often appear in similar contexts within a document.

In text analytics, as in any engineering enterprise, there are always tradeoffs. We can improve the effectiveness of a model by training it on a greater amount of text, increasing the number of dimensions in our abstract vector space, or widening the number of surrounding words we treat as a word's "context" within a document. All of these choices come at the cost of increased computational complexity (and therefore increased time). Let's consider some examples.
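Before turning to them, here is a minimal sketch of where those three knobs appear in practice, using the Gensim library (which the next post covers in more depth). The toy corpus and the particular parameter values are assumptions for illustration only, and the keyword names follow Gensim 4.x, where size was renamed vector_size.

    from gensim.models import Word2Vec

    # Toy corpus: in practice you would feed in thousands of tokenized
    # sentences; more text generally yields better vectors, but training
    # takes longer.
    sentences = [
        ["the", "king", "rules", "the", "land"],
        ["the", "queen", "rules", "the", "land"],
        ["the", "king", "and", "the", "queen", "wear", "crowns"],
    ]

    model = Word2Vec(
        sentences,
        vector_size=50,  # dimensions of the abstract vector space
        window=2,        # how many surrounding words count as "context"
        min_count=1,     # keep every word, even rare ones (toy corpus)
        sg=1,            # 1 = skip-gram, 0 = CBOW
        epochs=100,      # passes over the corpus
    )

    print(model.wv["king"][:5])                  # first few vector components
    print(model.wv.similarity("king", "queen"))  # cosine similarity of the two word vectors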

Word2Vec

Word2vec uses a shallow, two-layer neural network to learn about the contexts of words in a document or set of documents. Neural networks are, therefore, now being used to learn how to feed information to bigger, more powerful neural networks.
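Those two layers are, in essence, two weight matrices. The sketch below, written in NumPy with assumed toy dimensions, shows one skip-gram-style forward pass: the first matrix maps a word to its vector, and the second scores every vocabulary word as a possible neighbour. After training, it is the rows of the first matrix, not the network's predictions, that are kept as the word embeddings.

    import numpy as np

    vocab_size, embedding_dim = 1000, 50  # assumed toy dimensions

    rng = np.random.default_rng(0)
    W_in = rng.normal(size=(vocab_size, embedding_dim))   # layer 1: word id -> vector
    W_out = rng.normal(size=(embedding_dim, vocab_size))  # layer 2: vector -> context scores

    def forward(word_id):
        """Embed one word, then score every vocabulary word as a possible context word."""
        v = W_in[word_id]             # the word's embedding (the part we ultimately keep)
        scores = v @ W_out            # one raw score per vocabulary word
        exp = np.exp(scores - scores.max())
        return v, exp / exp.sum()     # softmax turns the scores into probabilities

    embedding, context_probs = forward(word_id=42)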

CBOW and Skip-Gram

CBOW (continuous bag-of-words) and skip-gram attack word embedding from opposite directions. A CBOW model attempts to predict the occurrence of a word from its surrounding context. A skip-gram model, on the other hand, attempts to predict the context from an individual word.
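The difference is easiest to see in the training pairs each variant generates. Here is a minimal sketch, assuming a toy sentence and a window of two words on each side:

    # Toy sentence, already tokenized.
    sentence = ["the", "queen", "rules", "the", "land"]
    window = 2

    cbow_pairs, skipgram_pairs = [], []
    for i, target in enumerate(sentence):
        # Every word within `window` positions of the target, excluding the target itself.
        context = [sentence[j]
                   for j in range(max(0, i - window), min(len(sentence), i + window + 1))
                   if j != i]
        cbow_pairs.append((context, target))                  # CBOW: context -> target word
        skipgram_pairs.extend((target, c) for c in context)   # skip-gram: target -> each context word

    print(cbow_pairs[2])       # (['the', 'queen', 'the', 'land'], 'rules')
    print(skipgram_pairs[:3])  # [('the', 'queen'), ('the', 'rules'), ('queen', 'the')]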

Remember, however, that this is not a parlour game in which the goal is to predict the missing word. It is an attempt to generate vectors that in some way encapsulate the use and meaning of words.

Concepts of Similarity

Many fields of artificial intelligence grapple with the necessity of providing a mathematical description for the somewhat vague notion of similarity. Your movie streaming service wants to quantify films that are similar to the ones you have already enjoyed. An autonomous vehicle needs to infer if objects in a video are similar to children in a school crosswalk.

In the world of text analytics a common measure of similarity is cosine similarity. Cosine similarity is closely related to the Pearson correlation coefficient of classic statistics. If two word vectors point in roughly the same direction in our abstract vector space, the angle between them is small and its cosine is close to one. If one word is straight ahead of us and the other is off our left shoulder, then the angle between the vectors is close to 90° and the cosine is close to zero.
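Written out, cosine similarity is simply the dot product of two vectors divided by the product of their lengths. A minimal sketch in NumPy, using made-up three-dimensional vectors in place of real word embeddings:

    import numpy as np

    def cosine_similarity(a, b):
        """Cosine of the angle between vectors a and b: 1 = same direction, 0 = perpendicular."""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Made-up vectors purely for illustration.
    king = np.array([0.9, 0.1, 0.4])
    queen = np.array([0.8, 0.2, 0.5])
    banana = np.array([-0.1, 0.9, -0.3])

    print(cosine_similarity(king, queen))   # close to 1: the vectors point the same way
    print(cosine_similarity(king, banana))  # near 0: the vectors are nearly perpendicular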

The definition of cosine similarity provides a direct path to the prediction of word analogies. For the text analytics "hello world" example, we would expect the cosine similarity between "male" and "king" to be close to the similarity between "female" and "queen".
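With pretrained vectors, this is easy to check. The sketch below uses Gensim's downloader and the publicly distributed "glove-wiki-gigaword-50" vectors; the choice of model, and the download it triggers on first use, are assumptions that depend on your environment.

    import gensim.downloader as api

    # Downloads on first use, then loads a small set of pretrained word vectors.
    vectors = api.load("glove-wiki-gigaword-50")

    # The two pairwise similarities should be roughly comparable.
    print(vectors.similarity("male", "king"))
    print(vectors.similarity("female", "queen"))

    # The classic analogy: king - man + woman should land near queen.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))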

Real Language is Always More Complicated

Ultimately, however, our concern with language is a concern with meaning, not words. In English, many words have multiple meanings. In Chinese, the meaningful interpretation of a word without its context is virtually impossible.

The idea of word embeddings can be extended to sense embeddings, that is, embeddings that capture the multiple senses of individual words. Indeed, the popular skip-gram algorithm has been modified and extended into the multi-sense skip-gram (MSSG).

Word embeddings more sophisticated than those described here must be enlisted to meet the needs of machine translation. Examples include ELMo and XLNet, as well as BERT and ERNIE from Google and Baidu, respectively.

Turning Theory Into Practice

In the next blog we will look at some actual examples that apply word2vec techniques using the popular library Gensim.
