VW(1) User Commands VW(1)

NAME

vw - Vowpal Wabbit -- fast online learning tool

DESCRIPTION

VW options:

size of example ring
Disable parse thread

Update options:

Set learning rate
t power value
Set Decay factor for learning_rate between passes
initial t value
Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.
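
As an illustration (file name is hypothetical), a run that tunes the update schedule might be:

  vw -d train.dat -l 0.5 --power_t 0.5 --initial_t 1 --decay_learning_rate 0.95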

Weight options:

Initial regressor(s)
Set all weights to an initial value of arg.
make initial weights random
make initial weights normal
make initial weights truncated normal
Use a sparse data structure for weights
Per feature regularization input file
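
For example, training can continue from an existing model and write an updated one (file names are illustrative):

  vw -d more.dat -i model.vw -f model_updated.vw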

Parallelization options:

Location of server for setting up spanning tree
Enable multi-threading
unique id used for cluster parallel jobs
total number of nodes used in cluster parallel job
node number in cluster parallel job
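
A sketch of a two-node cluster-parallel run (host name, job id, and file names are illustrative):

  vw -d part0.dat --span_server allreduce.example.com --total 2 --node 0 --unique_id 1234
  vw -d part1.dat --span_server allreduce.example.com --total 2 --node 1 --unique_id 1234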

Diagnostic options:

Version information
print weights of features
Progress update frequency. int: additive, float: multiplicative
Don't output diagnostics and progress updates
Look here: http://hunch.net/~vw/ and click on Tutorial.
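
For example, to print the version, or to train with a progress report every 1000 examples (file name is illustrative):

  vw --version
  vw -d train.dat --progress 1000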

Random Seed option:

seed random number generator

Feature options:

how to hash the features. Available options: strings, all
seed for hash function
ignore namespaces beginning with character <arg>
ignore namespaces beginning with character <arg> for linear terms only
keep namespaces beginning with character <arg>
redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in the form 'N:=S' where ':=' is the operator. An empty N or S is treated as the default namespace. Use ':' as a wildcard in S.
number of bits in the feature table
Don't add a constant feature
Set initial value of constant
Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.
Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace 'foo', arg should be fN.
limit to N features. To apply to a single namespace 'foo', arg should be fN
generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for namespace a, 3-char suffixes for b, and 1-char prefixes for the default namespace
compute spelling features for a given namespace (use '_' for default namespace)
read a dictionary for additional features (arg either 'x:file' or just 'file')
look in this directory for dictionaries; defaults to current directory or env{PATH}
Create feature interactions of any level between namespaces.
Use permutations instead of combinations for feature interactions of same namespace.
Don't remove interactions with duplicate combinations of namespaces. For example, '-q ab -q ba' contains a duplicate, and '-q ::' contains many more.
Create and use quadratic features
: corresponds to a wildcard for all printable characters
Create and use cubic features
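
For example, a run combining several feature options (file name and namespace letters are illustrative):

  vw -d train.dat -b 24 --ngram 2 --skips 1 -q ab --cubic abc --ignore x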

Example options:

Ignore label information and just test
no holdout data in multiple passes
holdout period for test only
holdout after n training examples, default off (disables holdout_period)
Specify the number of passes tolerated when holdout loss doesn't decrease before early termination
Number of Training Passes
initial number of examples per pass
number of examples to parse
Smallest prediction to output
Largest prediction to output
turn this on to disregard the order in which features have been defined. This will lead to smaller cache sizes
Specify the loss function to be used, uses squared by default. Currently available ones are squared, classic, hinge, logistic, quantile and poisson.
Parameter \tau associated with Quantile loss. Defaults to 0.5
l_1 lambda
l_2 lambda
no bias in regularization
use names for labels (multiclass, etc.) rather than integers, argument specified all possible labels, comma-sep, eg "--named_labels Noun,Verb,Adj,Punc"
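
For example, multiple passes with logistic loss and l2 regularization (file name is illustrative; multiple passes require a cache):

  vw -d train.dat -c --passes 10 --loss_function logistic --l2 1e-6 --holdout_off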

Output model:

Final regressor
Output human-readable final regressor with numeric features
Output human-readable final regressor with feature names. Computationally expensive.
save extra state so learning can be resumed later with new data
reset performance counters when warmstarting
Save the model after every pass over data
Per feature regularization output file
Per feature regularization output file, in text
User supplied ID embedded into the final regressor
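
For example, to save a binary model along with human-readable versions of it (file names are illustrative):

  vw -d train.dat -f model.vw --readable_model model.txt --invert_hash model.inv --save_resume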

Output options:

File to output predictions to
File to output unnormalized predictions to
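
For example, to score a test set with a previously trained model (file names are illustrative):

  vw -d test.dat -i model.vw -t -p predictions.txt -r raw_predictions.txt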

Audit Regressor:

stores feature names and their regressor values. Same dataset must be used for both regressor training and this mode.

Search options:

Use learning to search, argument=maximum action id or 0 for LDF
the search task (use "--search_task list" to get a list of available tasks)
the search metatask (use "--search_metatask list" to get a list of available metatasks)
at what level should interpolation happen? [*data|policy]
how should rollouts be executed? [policy|oracle|*mix_per_state|mix_per_roll|none]
how should past trajectories be generated? [policy|oracle|*mix_per_state|mix_per_roll]
number of passes per policy (only valid for search_interpolation=policy)
interpolation rate for policies (only valid for search_interpolation=policy)
annealed beta = 1-(1-alpha)^t (only valid for search_interpolation=data)
if we are going to train the policies through multiple separate calls to vw, we need to specify this parameter and tell vw how many policies are eventually going to be trained
the number of trained policies in a file
read file of allowed transitions [def: all transitions are allowed]
instead of training at all timesteps, use a subset. If value is in (0,1), train on a random v%; if v>=1, train on precisely v steps per example; if v<=-1, use active learning
copy features from neighboring lines. argument looks like: '-1:a,+2' meaning copy previous line namespace a and next next line from namespace _unnamed_, where ',' separates them
how many calls of "loss" before we stop really predicting on rollouts and switch to oracle (default means "infinite")
some tasks allow you to specify how much history they depend on; specify that here
turn off the built-in caching ability (makes things slower, but technically safer)
train two separate policies, alternating prediction/learning
perturb the oracle on rollin with this probability
insist on generating examples in linear order (def: hoopla permutation)
verify that active learning is doing the right thing (arg = multiplier, should be = cost_range * range_c)
save model every k runs
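
For example, a sequence-labeling run with learning to search (file name and label count are illustrative):

  vw -d pos.dat -c --passes 5 --search 45 --search_task sequence -b 26 -f pos.model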

Experience Replay:

use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
how many times (in expectation) should each example be played (default: 1 = permuting)

Explore evaluation:

Evaluate explore_eval adf policies
Multiplier used to make all rejection sample probabilities <= 1

Make Multiclass into Contextual Bandit:

Convert multiclass on <k> classes into a contextual bandit problem
consume cost-sensitive classification examples instead of multiclass
loss for correct label
loss for incorrect label

Contextual Bandit Exploration with Action Dependent Features:

Online explore-exploit for a contextual bandit problem with multiline action dependent features
tau-first exploration
epsilon-greedy exploration
bagging-based exploration
Online cover based exploration
disagreement parameter for cover
do not explore uniformly on zero-probability actions in cover
softmax exploration
RegCB-elim exploration
RegCB optimistic exploration
RegCB mellowness parameter c_0. Default 0.1
always update first policy once in bagging
lower bound on cost
upper bound on cost
Only explore the first action in a tie-breaking event
parameter for softmax
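
For example, epsilon-greedy exploration with action-dependent features (file name and namespace letters are illustrative):

  vw -d cb_adf.dat --cb_explore_adf --epsilon 0.1 -q UA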

Contextual Bandit Exploration:

Online explore-exploit for a <k> action contextual bandit problem
tau-first exploration
epsilon-greedy exploration
bagging-based exploration
Online cover based exploration
disagreement parameter for cover

Multiworld Testing Options:

Evaluate features as policies
Do Contextual Bandit learning on <n> classes.
Discard mwt policy features before learning

Contextual Bandit with Action Dependent Features:

Do Contextual Bandit learning with multiline action dependent features.
Return actions sorted by score order
Do not do a prediction when training
contextual bandit method to use in {ips, dm, dr, mtr}

Contextual Bandit Options:

Use contextual bandit learning with <k> costs
contextual bandit method to use in {ips,dm,dr}
Evaluate a policy rather than optimizing.
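
For example, contextual bandit learning over 4 actions with the doubly-robust estimator (file name is illustrative):

  vw -d cb.dat --cb 4 --cb_type dr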

Cost Sensitive One Against All with Label Dependent Features:

Use one-against-all multiclass learning with label dependent features.
Override singleline or multiline from csoaa_ldf or wap_ldf, e.g. if stored in a file
Return actions sorted by score order
predict probabilities of all classes
Use weighted all-pairs multiclass learning with label dependent features.
Specify singleline or multiline.

Interact via elementwise multiplication:

Put weights on feature products from namespaces <n1> and <n2>

Cost Sensitive One Against All:

One-against-all multiclass with <k> costs
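
For example, cost-sensitive one-against-all over 4 classes (file names are illustrative):

  vw -d cs.dat --csoaa 4 -f cs.model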

Cost-sensitive Active Learning:

Cost-sensitive active learning with <k> costs
cost-sensitive active learning simulation mode
cost-sensitive active learning baseline
cost-sensitive active learning: use domination. Default 1
mellowness parameter c_0. Default 0.1.
parameter controlling the threshold for per-label cost uncertainty. Default 0.5.
maximum number of label queries.
minimum number of label queries.
cost upper bound. Default 1.
cost lower bound. Default 0.
print debug stuff for cs_active

Multilabel One Against All:

One-against-all multilabel with <k> labels

importance weight classes:

importance weight multiplier for class

Recall Tree:

Use online tree for multiclass
maximum number of labels per leaf in the tree
recall tree depth penalty
maximum depth of the tree, default log_2 (#classes)
only use node features, not full path features
randomized routing

Logarithmic Time Multiclass Tree:

Use online tree for multiclass
disable progressive validation
higher = more resistance to swap, default=4

Error Correcting Tournament Options:

Error correcting tournament with <k> labels
errors allowed by ECT

Boosting:

Online boosting with <N> weak learners
weak learner's edge (=0.1), used only by online BBM
specify the boosting algorithm: BBM (default), logistic (AdaBoost.OL.W), adaptive (AdaBoost.OL)

One Against All Options:

One-against-all multiclass with <k> labels
subsample this number of negative examples when learning
predict probabilities of all classes
output raw scores per class
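
For example, one-against-all over 10 classes with per-class probabilities (file name is illustrative):

  vw -d multiclass.dat --oaa 10 --loss_function logistic --probabilities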

Top K:

top k recommendation

Experience Replay:

use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
how many times (in expectation) should each example be played (default: 1 = permuting)

Binary loss:

report loss as binary classification on -1,1

Bootstrap:

k-way bootstrap by online importance resampling
prediction type {mean,vote}

scorer options:

Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options:

use stagewise polynomial feature learning
exponent controlling quantity of included features
multiplier on batch size before including more features
batch_sz does not double

Low Rank Quadratics FA:

use low rank quadratic features with field aware weights

Low Rank Quadratics:

use low rank quadratic features
use dropout training for low rank quadratic features

Autolink:

create link function with polynomial d

Marginal:

substitute marginal label estimates for ids
initial denominator
initial numerator
enable competition with marginal features
update marginal values before learning
ignore importance weights when computing marginals
decay multiplier per event (1e-3 for example)

Matrix Factorization Reduction:

rank for reduction-based matrix factorization

Neural Network:

Sigmoidal feedforward network with <k> hidden units
Train or test sigmoidal feedforward network with input passthrough.
Share hidden layer across all reduced tasks.
Train or test sigmoidal feedforward network using dropout.
Train or test sigmoidal feedforward network using mean field.
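
For example, a 10-hidden-unit network trained with dropout and input passthrough (file name is illustrative):

  vw -d train.dat --nn 10 --dropout --inpass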

Confidence:

Get confidence for binary predictions
Confidence after training

Active Learning with Cover:

enable active learning with cover
active learning mellowness parameter c_0. Default 8.
active learning variance upper bound parameter alpha. Default 1.
active learning variance upper bound parameter beta_scale. Default sqrt(10).
cover size. Default 12.
Use Oracular-CAL style query or not. Default false.

Active Learning:

enable active learning
active learning simulation mode
active learning mellowness parameter c_0. Default 8

Experience Replay:

use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
how many times (in expectation) should each example be played (default: 1 = permuting)

Baseline options:

Learn an additive baseline (from constant features) and a residual separately in regression.
learning rate multiplier for baseline model
use separate example with only global constant for baseline predictions
only use baseline when the example contains enabled flag

OjaNewton options:

Online Newton with Oja's Sketch
size of sketch
size of epoch
multiplicative constant for identity
one over alpha, similar to learning rate
constant for the learning rate 1/t
normalize the features or not
randomize initialization of Oja or not

LBFGS and Conjugate Gradient options:

use conjugate gradient based optimization
use bfgs optimization
use second derivative in line search
memory in bfgs
Termination threshold
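
For example, batch optimization with L-BFGS over 20 passes (file name is illustrative; multiple passes require a cache):

  vw -d train.dat -c --passes 20 --bfgs --mem 15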

Latent Dirichlet Allocation:

Run lda with <int> topics
Prior on sparsity of per-document topic weights
Prior on sparsity of topic distributions
Number of documents
Loop convergence threshold
Minibatch size, for LDA
Math mode: simd, accuracy, fast-approx
Compute metrics
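
For example, fitting 20 topics to an illustrative corpus of roughly 10000 documents:

  vw -d docs.dat --lda 20 --lda_D 10000 --lda_alpha 0.1 --lda_rho 0.1 --minibatch 256 -b 16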

Noop Learner:

do no learning
print examples

Gradient Descent Matrix Factorization:

rank for matrix factorization.

Network sending:

send examples to <host>

Stochastic Variance Reduced Gradient:

Streaming Stochastic Variance Reduced Gradient
Number of passes per SVRG stage

Follow the Regularized Leader:

FTRL: Follow the Proximal Regularized Leader
Learning rate for FTRL optimization
FTRL beta parameter
FTRL: Parameter-free Stochastic Learning
Learning rate for FTRL optimization
FTRL beta parameter
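
For example, training with the FTRL-Proximal optimizer (file name is illustrative):

  vw -d train.dat --ftrl --ftrl_alpha 0.005 --ftrl_beta 0.1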

Kernel SVM:

kernel svm
number of reprocess steps for LASVM
use greedy selection on mini pools
do parallel active learning
size of pools for active learning
number of items to subsample from the pool
type of kernel (rbf or linear (default))
bandwidth of rbf kernel
degree of poly kernel
saving regularization for test time

Gradient Descent options:

use regular stochastic gradient descent update.
use adaptive, individual learning rates.
use adaptive learning rates with x^2 instead of g^2x^2
use safe/importance aware updates.
use per feature normalized updates
use per feature normalized updates
use per feature normalized updates
use per feature normalized updates

Input options:

Example Set
persistent daemon mode on port 26542
in persistent daemon mode, do not run in the background
port to listen on; use 0 to pick unused port
number of children for persistent daemon mode
Write pid file in persistent daemon mode
Write port used in persistent daemon mode
Use a cache. The default is <data>.cache
The location(s) of cache_file.
Enable JSON parsing.
Enable Decision Service JSON parsing.
do not reuse existing cache: create a new one always
use gzip format whenever possible. If a cache file is being created, this option creates a compressed cache file. A mixture of raw-text & compressed inputs is supported with autodetection.
do not default to reading from stdin
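
For example, training with a compressed cache, or serving a trained model as a persistent daemon (file names and port are illustrative):

  vw -d train.dat -c --compressed --passes 3 -f model.vw
  vw --daemon --port 26542 -i model.vw -t --num_children 4
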
December 2020 vw 8.6.1