VW(1) User Commands VW(1)

NAME

vw - Vowpal Wabbit -- fast online learning tool

DESCRIPTION

VW options:

size of example ring
Disable parse thread

Update options:

Set learning rate
t power value
Set Decay factor for learning_rate between passes
initial t value
Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.
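
As an illustration (file name is hypothetical), a run that tunes the update schedule might be:

  vw -d train.dat -l 0.5 --power_t 0.5 --initial_t 1 --decay_learning_rate 0.95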

Weight options:

Initial regressor(s)
Set all weights to an initial value of arg.
make initial weights random
make initial weights normal
make initial weights truncated normal
Use a sparse data structure for weights
Per feature regularization input file
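
For example, training can continue from an existing model and write an updated one (file names are illustrative):

  vw -d more.dat -i model.vw -f model_updated.vw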

Parallelization options:

Location of server for setting up spanning tree
Enable multi-threading
unique id used for cluster parallel jobs
total number of nodes used in cluster parallel job
node number in cluster parallel job
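
A sketch of a two-node cluster-parallel run (host name, job id, and file names are illustrative):

  vw -d part0.dat --span_server allreduce.example.com --total 2 --node 0 --unique_id 1234
  vw -d part1.dat --span_server allreduce.example.com --total 2 --node 1 --unique_id 1234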

Diagnostic options:

Version information
print weights of features
Progress update frequency. int: additive, float: multiplicative
Don't output diagnostics and progress updates
Look here: http://hunch.net/~vw/ and click on Tutorial.
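
For example, to print the version, or to train with a progress report every 1000 examples (file name is illustrative):

  vw --version
  vw -d train.dat --progress 1000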

Random Seed option:

seed random number generator

Feature options:

how to hash the features. Available options: strings, all
seed for hash function
ignore namespaces beginning with character <arg>
ignore namespaces beginning with character <arg> for linear terms only
keep namespaces beginning with character <arg>
redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in the form 'N:=S' where ':=' is the operator. An empty N or S is treated as the default namespace. Use ':' as a wildcard in S.
number of bits in the feature table
Don't add a constant feature
Set initial value of constant
Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.
Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace 'foo', arg should be fN.
limit to N features. To apply to a single namespace 'foo', arg should be fN
generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for namespace a, 3-char suffixes for b, and 1-char prefixes for the default namespace
compute spelling features for a given namespace (use '_' for default namespace)
read a dictionary for additional features (arg either 'x:file' or just 'file')
look in this directory for dictionaries; defaults to current directory or env{PATH}
Create feature interactions of any level between namespaces.
Use permutations instead of combinations for feature interactions of same namespace.
Don't remove interactions with duplicate combinations of namespaces. For example, '-q ab -q ba' contains a duplicate, and '-q ::' contains many more.
Create and use quadratic features
: corresponds to a wildcard for all printable characters
Create and use cubic features
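
For example, a run combining several feature options (file name and namespace letters are illustrative):

  vw -d train.dat -b 24 --ngram 2 --skips 1 -q ab --cubic abc --ignore x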

Example options:

Ignore label information and just test
no holdout data in multiple passes
holdout period for test only
holdout after n training examples, default off (disables holdout_period)
Specify the number of passes tolerated when holdout loss doesn't decrease before early termination
Number of Training Passes
initial number of examples per pass
number of examples to parse
Smallest prediction to output
Largest prediction to output
turn this on to disregard the order in which features have been defined. This will lead to smaller cache sizes
Specify the loss function to be used, uses squared by default. Currently available ones are squared, classic, hinge, logistic, quantile and poisson.
Parameter \tau associated with Quantile loss. Defaults to 0.5
l_1 lambda
l_2 lambda
no bias in regularization
use names for labels (multiclass, etc.) rather than integers, argument specified all possible labels, comma-sep, eg "--named_labels Noun,Verb,Adj,Punc"
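
For example, multiple passes with logistic loss and l2 regularization (file name is illustrative; multiple passes require a cache):

  vw -d train.dat -c --passes 10 --loss_function logistic --l2 1e-6 --holdout_off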

Output model:

Final regressor
Output human-readable final regressor with numeric features
Output human-readable final regressor with feature names. Computationally expensive.
save extra state so learning can be resumed later with new data
reset performance counters when warmstarting
Save the model after every pass over data
Per feature regularization output file
Per feature regularization output file, in text
User supplied ID embedded into the final regressor
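
For example, to save a binary model along with human-readable versions of it (file names are illustrative):

  vw -d train.dat -f model.vw --readable_model model.txt --invert_hash model.inv --save_resume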

Output options:

File to output predictions to
File to output unnormalized predictions to
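
For example, to score a test set with a previously trained model (file names are illustrative):

  vw -d test.dat -i model.vw -t -p predictions.txt -r raw_predictions.txt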

Audit Regressor:

stores feature names and their regressor values. Same dataset must be used for both regressor training and this mode.

Search options:

Use learning to search, argument=maximum action id or 0 for LDF
the search task (use "--search_task list" to get a list of available tasks)
the search metatask (use "--search_metatask list" to get a list of available metatasks)
at what level should interpolation happen? [*data|policy]
how should rollouts be executed? [policy|oracle|*mix_per_state|mix_per_roll|none]
how should past trajectories be generated? [policy|oracle|*mix_per_state|mix_per_roll]
number of passes per policy (only valid for search_interpolation=policy)
interpolation rate for policies (only valid for search_interpolation=policy)
annealed beta = 1-(1-alpha)^t (only valid for search_interpolation=data)
if we are going to train the policies through multiple separate calls to vw, we need to specify this parameter and tell vw how many policies are eventually going to be trained
the number of trained policies in a file
read file of allowed transitions [def: all transitions are allowed]
instead of training at all timesteps, use a subset. If value is in (0,1), train on a random v%; if v>=1, train on precisely v steps per example; if v<=-1, use active learning
copy features from neighboring lines. argument looks like: '-1:a,+2' meaning copy previous line namespace a and next next line from namespace _unnamed_, where ',' separates them
how many calls of "loss" before we stop really predicting on rollouts and switch to oracle (default means "infinite")
some tasks allow you to specify how much history they depend on; specify that here
turn off the built-in caching ability (makes things slower, but technically safer)
train two separate policies, alternating prediction/learning
perturb the oracle on rollin with this probability
insist on generating examples in linear order (def: hoopla permutation)
verify that active learning is doing the right thing (arg = multiplier, should be = cost_range * range_c)
save model every k runs
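
For example, a sequence-labeling run with learning to search (file name and label count are illustrative):

  vw -d pos.dat -c --passes 5 --search 45 --search_task sequence -b 26 -f pos.model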

Experience Replay:

use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
how many times (in expectation) should each example be played (default: 1 = permuting)

Explore evaluation:

Evaluate explore_eval adf policies
Multiplier used to make all rejection sample probabilities <= 1

Make Multiclass into Contextual Bandit:

Convert multiclass on <k> classes into a contextual bandit problem
consume cost-sensitive classification examples instead of multiclass
loss for correct label
loss for incorrect label

Contextual Bandit Exploration with Action Dependent Features:

Online explore-exploit for a contextual bandit problem with multiline action dependent features
tau-first exploration
epsilon-greedy exploration
bagging-based exploration
Online cover based exploration
disagreement parameter for cover
do not explore uniformly on zero-probability actions in cover
softmax exploration
RegCB-elim exploration
RegCB optimistic exploration
RegCB mellowness parameter c_0. Default 0.1
always update first policy once in bagging
lower bound on cost
upper bound on cost
Only explore the first action in a tie-breaking event
parameter for softmax
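
For example, epsilon-greedy exploration with action-dependent features (file name and namespace letters are illustrative):

  vw -d cb_adf.dat --cb_explore_adf --epsilon 0.1 -q UA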

Contextual Bandit Exploration:

Online explore-exploit for a <k> action contextual bandit problem
tau-first exploration
epsilon-greedy exploration
bagging-based exploration
Online cover based exploration
disagreement parameter for cover

Multiworld Testing Options:

Evaluate features as policies
Do Contextual Bandit learning on <n> classes.
Discard mwt policy features before learning

Contextual Bandit with Action Dependent Features:

Do Contextual Bandit learning with multiline action dependent features.
Return actions sorted by score order
Do not do a prediction when training
contextual bandit method to use in {ips, dm, dr, mtr}

Contextual Bandit Options:

Use contextual bandit learning with <k> costs
contextual bandit method to use in {ips,dm,dr}
Evaluate a policy rather than optimizing.
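
For example, contextual bandit learning over 4 actions with the doubly-robust estimator (file name is illustrative):

  vw -d cb.dat --cb 4 --cb_type dr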

Cost Sensitive One Against All with Label Dependent Features:

Use one-against-all multiclass learning with label dependent features.
Override singleline or multiline from csoaa_ldf or wap_ldf, e.g. if stored in a file
Return actions sorted by score order
predict probabilities of all classes
Use weighted all-pairs multiclass learning with label dependent features.
Specify singleline or multiline.

Interact via elementwise multiplication:

Put weights on feature products from namespaces <n1> and <n2>

Cost Sensitive One Against All:

One-against-all multiclass with <k> costs
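
For example, cost-sensitive one-against-all over 4 classes (file names are illustrative):

  vw -d cs.dat --csoaa 4 -f cs.model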

Cost-sensitive Active Learning:

Cost-sensitive active learning with <k> costs
cost-sensitive active learning simulation mode
cost-sensitive active learning baseline
cost-sensitive active learning: use domination. Default 1
mellowness parameter c_0. Default 0.1.
parameter controlling the threshold for per-label cost uncertainty. Default 0.5.
maximum number of label queries.
minimum number of label queries.
cost upper bound. Default 1.
cost lower bound. Default 0.
print debug stuff for cs_active

Multilabel One Against All:

One-against-all multilabel with <k> labels

importance weight classes:

importance weight multiplier for class

Recall Tree:

Use online tree for multiclass
maximum number of labels per leaf in the tree
recall tree depth penalty
maximum depth of the tree, default log_2 (#classes)
only use node features, not full path features
randomized routing

Logarithmic Time Multiclass Tree:

Use online tree for multiclass
disable progressive validation
higher = more resistance to swap, default=4

Error Correcting Tournament Options:

Error correcting tournament with <k> labels
errors allowed by ECT

Boosting:

Online boosting with <N> weak learners
weak learner's edge (=0.1), used only by online BBM
specify the boosting algorithm: BBM (default), logistic (AdaBoost.OL.W), adaptive (AdaBoost.OL)

One Against All Options:

One-against-all multiclass with <k> labels
subsample this number of negative examples when learning
predict probabilities of all classes
output raw scores per class
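
For example, one-against-all over 10 classes with per-class probabilities (file name is illustrative):

  vw -d multiclass.dat --oaa 10 --loss_function logistic --probabilities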

Top K:

top k recommendation

Experience Replay:

use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
how many times (in expectation) should each example be played (default: 1 = permuting)

Binary loss:

report loss as binary classification on -1,1

Bootstrap:

k-way bootstrap by online importance resampling
prediction type {mean,vote}

scorer options:

Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options:

use stagewise polynomial feature learning
exponent controlling quantity of included features
multiplier on batch size before including more features
batch_sz does not double

Low Rank Quadratics FA:

use low rank quadratic features with field aware weights

Low Rank Quadratics:

use low rank quadratic features
use dropout training for low rank quadratic features

Autolink:

create link function with polynomial d

Marginal:

substitute marginal label estimates for ids
initial denominator
initial numerator
enable competition with marginal features
update marginal values before learning
ignore importance weights when computing marginals
decay multiplier per event (1e-3 for example)

Matrix Factorization Reduction:

rank for reduction-based matrix factorization

Neural Network:

Sigmoidal feedforward network with <k> hidden units
Train or test sigmoidal feedforward network with input passthrough.
Share hidden layer across all reduced tasks.
Train or test sigmoidal feedforward network using dropout.
Train or test sigmoidal feedforward network using mean field.
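
For example, a 10-hidden-unit network trained with dropout and input passthrough (file name is illustrative):

  vw -d train.dat --nn 10 --dropout --inpass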

Confidence:

Get confidence for binary predictions
Confidence after training

Active Learning with Cover:

enable active learning with cover
active learning mellowness parameter c_0. Default 8.
active learning variance upper bound parameter alpha. Default 1.
active learning variance upper bound parameter beta_scale. Default sqrt(10).
cover size. Default 12.
Use Oracular-CAL style query or not. Default false.

Active Learning:

enable active learning
active learning simulation mode
active learning mellowness parameter c_0. Default 8

Experience Replay:

use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
how many times (in expectation) should each example be played (default: 1 = permuting)

Baseline options:

Learn an additive baseline (from constant features) and a residual separately in regression.
learning rate multiplier for baseline model
use separate example with only global constant for baseline predictions
only use baseline when the example contains enabled flag

OjaNewton options:

Online Newton with Oja's Sketch
size of sketch
size of epoch
multiplicative constant for identity
one over alpha, similar to learning rate
constant for the learning rate 1/t
normalize the features or not
randomize initialization of Oja or not

LBFGS and Conjugate Gradient options:

use conjugate gradient based optimization
use bfgs optimization
use second derivative in line search
memory in bfgs
Termination threshold
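
For example, batch optimization with L-BFGS over 20 passes (file name is illustrative; multiple passes require a cache):

  vw -d train.dat -c --passes 20 --bfgs --mem 15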

Latent Dirichlet Allocation:

Run lda with <int> topics
Prior on sparsity of per-document topic weights
Prior on sparsity of topic distributions
Number of documents
Loop convergence threshold
Minibatch size, for LDA
Math mode: simd, accuracy, fast-approx
Compute metrics
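
For example, fitting 20 topics to an illustrative corpus of roughly 10000 documents:

  vw -d docs.dat --lda 20 --lda_D 10000 --lda_alpha 0.1 --lda_rho 0.1 --minibatch 256 -b 16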

Noop Learner:

do no learning
print examples

Gradient Descent Matrix Factorization:

rank for matrix factorization.

Network sending:

send examples to <host>

Stochastic Variance Reduced Gradient:

Streaming Stochastic Variance Reduced Gradient
Number of passes per SVRG stage

Follow the Regularized Leader:

FTRL: Follow the Proximal Regularized Leader
Learning rate for FTRL optimization
FTRL beta parameter
FTRL: Parameter-free Stochastic Learning
Learning rate for FTRL optimization
FTRL beta parameter
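
For example, training with the FTRL-Proximal optimizer (file name is illustrative):

  vw -d train.dat --ftrl --ftrl_alpha 0.005 --ftrl_beta 0.1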

Kernel SVM:

kernel svm
number of reprocess steps for LASVM
use greedy selection on mini pools
do parallel active learning
size of pools for active learning
number of items to subsample from the pool
type of kernel (rbf or linear (default))
bandwidth of rbf kernel
degree of poly kernel
saving regularization for test time

Gradient Descent options:

use regular stochastic gradient descent update.
use adaptive, individual learning rates.
use adaptive learning rates with x^2 instead of g^2x^2
use safe/importance aware updates.
use per feature normalized updates
use per feature normalized updates
use per feature normalized updates
use per feature normalized updates

Input options:

Example Set
persistent daemon mode on port 26542
in persistent daemon mode, do not run in the background
port to listen on; use 0 to pick unused port
number of children for persistent daemon mode
Write pid file in persistent daemon mode
Write port used in persistent daemon mode
Use a cache. The default is <data>.cache
The location(s) of cache_file.
Enable JSON parsing.
Enable Decision Service JSON parsing.
do not reuse existing cache: create a new one always
use gzip format whenever possible. If a cache file is being created, this option creates a compressed cache file. A mixture of raw-text & compressed inputs is supported with autodetection.
do not default to reading from stdin
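
For example, training with a compressed cache, or serving a trained model as a persistent daemon (file names and port are illustrative):

  vw -d train.dat -c --compressed --passes 3 -f model.vw
  vw --daemon --port 26542 -i model.vw -t --num_children 4
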
December 2020 vw 8.6.1