VW(1) | User Commands | VW(1) |
NAME¶
vw - Vowpal Wabbit -- fast online learning tool
DESCRIPTION¶
VW options:¶
- --ring_size arg
- size of example ring
- --onethread
- Disable parse thread
Update options:¶
- -l [ --learning_rate ] arg
- Set learning rate
- --power_t arg
- t power value
- --decay_learning_rate arg
- Set Decay factor for learning_rate between passes
- --initial_t arg
- initial t value
- --feature_mask arg
- Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.
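A minimal sketch of these update options in use (the data and model file names are hypothetical); note that the decay factor only takes effect across multiple passes, which require a cache:

    vw -d train.dat -l 0.5 --decay_learning_rate 0.95 --passes 3 -c -f model.vw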
Weight options:¶
- -i [ --initial_regressor ] arg
- Initial regressor(s)
- --initial_weight arg
- Set all weights to an initial value of arg.
- --random_weights arg
- make initial weights random
- --normal_weights arg
- make initial weights normal
- --truncated_normal_weights arg
- make initial weights truncated normal
- --sparse_weights
- Use a sparse datastructure for weights
- --input_feature_regularizer arg
- Per feature regularization input file
Parallelization options:¶
- --span_server arg
- Location of server for setting up spanning tree
- --threads
- Enable multi-threading
- --unique_id arg (=0)
- unique id used for cluster parallel jobs
- --total arg (=1)
- total number of nodes used in cluster parallel job
- --node arg (=0)
- node number in cluster parallel job
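A hedged sketch of a two-node cluster run, assuming a spanning-tree server is already listening on a host named span.example.com and the data has been split into two hypothetical files:

    vw -d part0.dat --span_server span.example.com --total 2 --node 0 --unique_id 1234
    vw -d part1.dat --span_server span.example.com --total 2 --node 1 --unique_id 1234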
Diagnostic options:¶
- --version
- Version information
- -a [ --audit ]
- print weights of features
- -P [ --progress ] arg
- Progress update frequency. int: additive, float: multiplicative
- --quiet
- Don't output diagnostics and progress updates
- -h [ --help ]
- Look here: http://hunch.net/~vw/ and click on Tutorial.
Random Seed option:¶
- --random_seed arg
- seed random number generator
Feature options:¶
- --hash arg
- how to hash the features. Available options: strings, all
- --hash_seed arg (=0)
- seed for hash function
- --ignore arg
- ignore namespaces beginning with character <arg>
- --ignore_linear arg
- ignore namespaces beginning with character <arg> for linear terms only
- --keep arg
- keep namespaces beginning with character <arg>
- --redefine arg
- redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in the form 'N:=S' where ':=' is the operator. Empty N or S are treated as the default namespace. Use ':' as a wildcard in S.
- -b [ --bit_precision ] arg
- number of bits in the feature table
- --noconstant
- Don't add a constant feature
- -C [ --constant ] arg
- Set initial value of constant
- --ngram arg
- Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.
- --skips arg
- Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace 'foo', arg should be fN.
- --feature_limit arg
- limit to N features. To apply to a single namespace 'foo', arg should be fN
- --affix arg
- generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace
- --spelling arg
- compute spelling features for a given namespace (use '_' for default namespace)
- --dictionary arg
- read a dictionary for additional features (arg either 'x:file' or just 'file')
- --dictionary_path arg
- look in this directory for dictionaries; defaults to current directory or env{PATH}
- --interactions arg
- Create feature interactions of any level between namespaces.
- --permutations
- Use permutations instead of combinations for feature interactions of same namespace.
- --leave_duplicate_interactions
- Don't remove interactions with duplicate combinations of namespaces. For example, '-q ab -q ba' contains a duplicate pair, and '-q ::' contains many more.
- -q [ --quadratic ] arg
- Create and use quadratic features
- --q: arg
- : corresponds to a wildcard for all printable characters
- --cubic arg
- Create and use cubic features
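For illustration only (hypothetical data file, with namespaces a and b assumed to exist in it), several feature options can be combined in one invocation: 24-bit hashing, bigrams with one skip, and quadratic interactions between a and b:

    vw -d train.dat -b 24 --ngram 2 --skips 1 -q ab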
Example options:¶
- -t [ --testonly ]
- Ignore label information and just test
- --holdout_off
- no holdout data in multiple passes
- --holdout_period arg (=10)
- holdout period for test only
- --holdout_after arg
- holdout after n training examples, default off (disables holdout_period)
- --early_terminate arg (=3)
- Specify the number of passes tolerated when holdout loss doesn't decrease before early termination
- --passes arg
- Number of Training Passes
- --initial_pass_length arg
- initial number of examples per pass
- --examples arg
- number of examples to parse
- --min_prediction arg
- Smallest prediction to output
- --max_prediction arg
- Largest prediction to output
- --sort_features
- turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes
- --loss_function arg (=squared)
- Specify the loss function to be used, uses squared by default. Currently available ones are squared, classic, hinge, logistic, quantile and poisson.
- --quantile_tau arg (=0.5)
- Parameter \tau associated with Quantile loss. Defaults to 0.5
- --l1 arg
- l_1 lambda
- --l2 arg
- l_2 lambda
- --no_bias_regularization arg
- no bias in regularization
- --named_labels arg
- use names for labels (multiclass, etc.) rather than integers; the argument specifies all possible labels, comma-separated, e.g. "--named_labels Noun,Verb,Adj,Punc"
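A sketch combining the loss and regularization options above, assuming a hypothetical data file with -1/+1 labels so that logistic loss applies:

    vw -d train.dat --loss_function logistic --l1 1e-6 --l2 1e-6 --passes 5 -c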
Output model:¶
- -f [ --final_regressor ] arg
- Final regressor
- --readable_model arg
- Output human-readable final regressor with numeric features
- --invert_hash arg
- Output human-readable final regressor with feature names. Computationally expensive.
- --save_resume
- save extra state so learning can be resumed later with new data
- --preserve_performance_counters
- do not reset performance counters when warm-starting from a saved model
- --save_per_pass
- Save the model after every pass over data
- --output_feature_regularizer_binary arg
- Per feature regularization output file
- --output_feature_regularizer_text arg
- Per feature regularization output file, in text
- --id arg
- User supplied ID embedded into the final regressor
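A possible save-and-resume workflow (file names are hypothetical):

    # train, keep resumable state, and write a human-readable dump
    vw -d day1.dat -f model.vw --save_resume --readable_model model.txt
    # continue training from the saved model on new data
    vw -d day2.dat -i model.vw --save_resume -f model2.vw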
Output options:¶
- -p [ --predictions ] arg
- File to output predictions to
- -r [ --raw_predictions ] arg
- File to output unnormalized predictions to
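For example, predictions for a held-out file can be written in test-only mode (hypothetical file names):

    vw -d test.dat -i model.vw -t -p predictions.txt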
Audit Regressor:¶
- --audit_regressor arg
- stores feature names and their regressor values. Same dataset must be used for both regressor training and this mode.
Search options:¶
- --search arg
- Use learning to search, argument=maximum action id or 0 for LDF
- --search_task arg
- the search task (use "--search_task list" to get a list of available tasks)
- --search_metatask arg
- the search metatask (use "--search_metatask list" to get a list of available metatasks)
- --search_interpolation arg
- at what level should interpolation happen? [*data|policy]
- --search_rollout arg
- how should rollouts be executed? [policy|oracle|*mix_per_state|mix_per_roll|none]
- --search_rollin arg
- how should past trajectories be generated? [policy|oracle|*mix_per_state|mix_per_roll]
- --search_passes_per_policy arg (=1)
- number of passes per policy (only valid for search_interpolation=policy)
- --search_beta arg (=0.5)
- interpolation rate for policies (only valid for search_interpolation=policy)
- --search_alpha arg (=1.00000001e-10)
- annealed beta = 1-(1-alpha)^t (only valid for search_interpolation=data)
- --search_total_nb_policies arg
- if we are going to train the policies through multiple separate calls to vw, we need to specify this parameter and tell vw how many policies are eventually going to be trained
- --search_trained_nb_policies arg
- the number of trained policies in a file
- --search_allowed_transitions arg
- read file of allowed transitions [def: all transitions are allowed]
- --search_subsample_time arg
- instead of training at all timesteps, use a subset. if value in (0,1), train on a random v%. if v>=1, train on precisely v steps per example, if v<=-1, use active learning
- --search_neighbor_features arg
- copy features from neighboring lines. The argument looks like '-1:a,+2', meaning copy namespace a from the previous line and the unnamed namespace from the line two ahead; ',' separates the specifications
- --search_rollout_num_steps arg
- how many calls of "loss" before we stop really predicting on rollouts and switch to oracle (default means "infinite")
- --search_history_length arg (=1)
- some tasks allow you to specify how much history they depend on; specify that here
- --search_no_caching
- turn off the built-in caching ability (makes things slower, but technically safer)
- --search_xv
- train two separate policies, alternating prediction/learning
- --search_perturb_oracle arg (=0)
- perturb the oracle on rollin with this probability
- --search_linear_ordering
- insist on generating examples in linear order (def: hoopla permutation)
- --search_active_verify arg
- verify that active learning is doing the right thing (arg = multiplier, should be = cost_range * range_c)
- --search_save_every_k_runs arg
- save model every k runs
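As a rough sketch, a sequence-labeling run over 45 possible tags might look like the following; 'sequence' is assumed to be one of the tasks reported by "--search_task list", and the file names are hypothetical:

    vw -d pos.dat --search 45 --search_task sequence --passes 5 -c -f seq.model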
Experience Replay:¶
- --replay_c arg
- use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
- --replay_c_count arg (=1)
- how many times (in expectation) should each example be played (default: 1 = permuting)
Explore evaluation:¶
- --explore_eval
- Evaluate explore_eval adf policies
- --multiplier arg
- Multiplier used to make all rejection sample probabilities <= 1
Make Multiclass into Contextual Bandit:¶
- --cbify arg
- Convert multiclass on <k> classes into a contextual bandit problem
- --cbify_cs
- consume cost-sensitive classification examples instead of multiclass
- --loss0 arg (=0)
- loss for correct label
- --loss1 arg (=1)
- loss for incorrect label
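For instance, a 10-class multiclass dataset can be turned into a simulated bandit problem (hypothetical file name; --epsilon is taken from the exploration options below):

    vw -d multiclass.dat --cbify 10 --epsilon 0.05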
Contextual Bandit Exploration with Action Dependent Features:¶
- --cb_explore_adf
- Online explore-exploit for a contextual bandit problem with multiline action dependent features
- --first arg
- tau-first exploration
- --epsilon arg
- epsilon-greedy exploration
- --bag arg
- bagging-based exploration
- --cover arg
- Online cover based exploration
- --psi arg (=1)
- disagreement parameter for cover
- --nounif
- do not explore uniformly on zero-probability actions in cover
- --softmax
- softmax exploration
- --regcb
- RegCB-elim exploration
- --regcbopt
- RegCB optimistic exploration
- --mellowness arg (=0.100000001)
- RegCB mellowness parameter c_0. Default 0.1
- --greedify
- always update first policy once in bagging
- --cb_min_cost arg (=0)
- lower bound on cost
- --cb_max_cost arg (=1)
- upper bound on cost
- --first_only
- Only explore the first action in a tie-breaking event
- --lambda arg (=-1)
- parameter for softmax
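A minimal sketch of epsilon-greedy exploration over action dependent features, assuming the data is already in the multiline ADF format (file names hypothetical):

    vw -d cb_adf.dat --cb_explore_adf --epsilon 0.1 -f cb.model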
Contextual Bandit Exploration:¶
- --cb_explore arg
- Online explore-exploit for a <k> action contextual bandit problem
- --first arg
- tau-first exploration
- --epsilon arg (=0.0500000007)
- epsilon-greedy exploration
- --bag arg
- bagging-based exploration
- --cover arg
- Online cover based exploration
- --psi arg (=1)
- disagreement parameter for cover
Multiworld Testing Options:¶
- --multiworld_test arg
- Evaluate features as policies
- --learn arg
- Do Contextual Bandit learning on <n> classes.
- --exclude_eval
- Discard mwt policy features before learning
Contextual Bandit with Action Dependent Features:¶
- --cb_adf
- Do Contextual Bandit learning with multiline action dependent features.
- --rank_all
- Return actions sorted by score order
- --no_predict
- Do not do a prediction when training
- --cb_type arg (=ips)
- contextual bandit method to use in {ips, dm, dr, mtr}
Contextual Bandit Options:¶
- --cb arg
- Use contextual bandit learning with <k> costs
- --cb_type arg (=dr)
- contextual bandit method to use in {ips,dm,dr}
- --eval
- Evaluate a policy rather than optimizing.
Cost Sensitive One Against All with Label Dependent Features:¶
- --csoaa_ldf arg
- Use one-against-all multiclass learning with label dependent features.
- --ldf_override arg
- Override singleline or multiline from csoaa_ldf or wap_ldf, eg if stored in file
- --csoaa_rank
- Return actions sorted by score order
- --probabilities
- predict probabilities of all classes
- --wap_ldf arg
- Use weighted all-pairs multiclass learning with label dependent features. Specify singleline or multiline.
Interact via elementwise multiplication:¶
- --interact arg
- Put weights on feature products from namespaces <n1> and <n2>
Cost Sensitive One Against All:¶
- --csoaa arg
- One-against-all multiclass with <k> costs
Cost-sensitive Active Learning:¶
- --cs_active arg
- Cost-sensitive active learning with <k> costs
- --simulation
- cost-sensitive active learning simulation mode
- --baseline
- cost-sensitive active learning baseline
- --domination
- use domination in cost-sensitive active learning. Default 1
- --mellowness arg (=0.100000001)
- mellowness parameter c_0. Default 0.1.
- --range_c arg (=0.5)
- parameter controlling the threshold for per-label cost uncertainty. Default 0.5.
- --max_labels arg (=18446744073709551615)
- maximum number of label queries.
- --min_labels arg (=18446744073709551615)
- minimum number of label queries.
- --cost_max arg (=1)
- cost upper bound. Default 1.
- --cost_min arg (=0)
- cost lower bound. Default 0.
- --csa_debug
- print debug stuff for cs_active
Multilabel One Against All:¶
- --multilabel_oaa arg
- One-against-all multilabel with <k> labels
importance weight classes:¶
- --classweight arg
- importance weight multiplier for class
Recall Tree:¶
- --recall_tree arg
- Use online tree for multiclass
- --max_candidates arg
- maximum number of labels per leaf in the tree
- --bern_hyper arg (=1)
- recall tree depth penalty
- --max_depth arg
- maximum depth of the tree, default log_2 (#classes)
- --node_only arg (=0)
- only use node features, not full path features
- --randomized_routing arg (=0)
- randomized routing
Logarithmic Time Multiclass Tree:¶
- --log_multi arg
- Use online tree for multiclass
- --no_progress
- disable progressive validation
- --swap_resistance arg (=4)
- higher = more resistance to swap, default=4
Error Correcting Tournament Options:¶
- --ect arg
- Error correcting tournament with <k> labels
- --error arg (=0)
- errors allowed by ECT
Boosting:¶
- --boosting arg
- Online boosting with <N> weak learners
- --gamma arg (=0.100000001)
- weak learner's edge (=0.1), used only by online BBM
- --alg arg (=BBM)
- specify the boosting algorithm: BBM (default), logistic (AdaBoost.OL.W), adaptive (AdaBoost.OL)
One Against All Options:¶
- --oaa arg
- One-against-all multiclass with <k> labels
- --oaa_subsample arg
- subsample this number of negative examples when learning
- --probabilities
- predict probabilities of all classes
- --scores
- output raw scores per class
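A minimal one-against-all sketch for a 10-class problem (hypothetical file names; labels are expected to be 1 through 10):

    vw -d multiclass.dat --oaa 10 -f oaa.model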
Top K:¶
- --top arg
- top k recommendation
Experience Replay:¶
- --replay_m arg
- use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
- --replay_m_count arg (=1)
- how many times (in expectation) should each example be played (default: 1 = permuting)
Binary loss:¶
- --binary
- report loss as binary classification on -1,1
Bootstrap:¶
- --bootstrap arg
- k-way bootstrap by online importance resampling
- --bs_type arg
- prediction type {mean,vote}
scorer options:¶
- --link arg (=identity)
- Specify the link function: identity, logistic, glf1 or poisson
Stagewise polynomial options:¶
- --stage_poly
- use stagewise polynomial feature learning
- --sched_exponent arg (=1)
- exponent controlling quantity of included features
- --batch_sz arg (=1000)
- multiplier on batch size before including more features
- --batch_sz_no_doubling
- batch_sz does not double
Low Rank Quadratics FA:¶
- --lrqfa arg
- use low rank quadratic features with field aware weights
Low Rank Quadratics:¶
- --lrq arg
- use low rank quadratic features
- --lrqdropout
- use dropout training for low rank quadratic features
Autolink:¶
- --autolink arg
- create link function with polynomial d
Marginal:¶
- --marginal arg
- substitute marginal label estimates for ids
- --initial_denominator arg (=1)
- initial denominator
- --initial_numerator arg (=0.5)
- initial numerator
- --compete
- enable competition with marginal features
- --update_before_learn arg (=0)
- update marginal values before learning
- --unweighted_marginals arg (=0)
- ignore importance weights when computing marginals
- --decay arg (=0)
- decay multiplier per event (1e-3 for example)
Matrix Factorization Reduction:¶
- --new_mf arg
- rank for reduction-based matrix factorization
Neural Network:¶
- --nn arg
- Sigmoidal feedforward network with <k> hidden units
- --inpass
- Train or test sigmoidal feedforward network with input passthrough.
- --multitask
- Share hidden layer across all reduced tasks.
- --dropout
- Train or test sigmoidal feedforward network using dropout.
- --meanfield
- Train or test sigmoidal feedforward network using mean field.
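For example, a 10-hidden-unit network trained with dropout over several passes might be invoked as (hypothetical file names):

    vw -d train.dat --nn 10 --dropout --passes 5 -c -f nn.model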
Confidence:¶
- --confidence
- Get confidence for binary predictions
- --confidence_after_training
- Confidence after training
Active Learning with Cover:¶
- --active_cover
- enable active learning with cover
- --mellowness arg (=8)
- active learning mellowness parameter c_0. Default 8.
- --alpha arg (=1)
- active learning variance upper bound parameter alpha. Default 1.
- --beta_scale arg (=3.1622777)
- active learning variance upper bound parameter beta_scale. Default sqrt(10).
- --cover arg (=12)
- cover size. Default 12.
- --oracular
- Use Oracular-CAL style query or not. Default false.
Active Learning:¶
- --active
- enable active learning
- --simulation
- active learning simulation mode
- --mellowness arg (=8)
- active learning mellowness parameter c_0. Default 8
Experience Replay:¶
- --replay_b arg
- use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
- --replay_b_count arg (=1)
- how many times (in expectation) should each example be played (default: 1 = permuting)
Baseline options:¶
- --baseline
- Learn an additive baseline (from constant features) and a residual separately in regression.
- --lr_multiplier arg
- learning rate multiplier for baseline model
- --global_only
- use separate example with only global constant for baseline predictions
- --check_enabled
- only use baseline when the example contains enabled flag
OjaNewton options:¶
- --OjaNewton
- Online Newton with Oja's Sketch
- --sketch_size arg (=10)
- size of sketch
- --epoch_size arg (=1)
- size of epoch
- --alpha arg (=1)
- multiplicative constant for identity
- --alpha_inverse arg
- one over alpha, similar to learning rate
- --learning_rate_cnt arg (=2)
- constant for the learning rate 1/t
- --normalize arg (=1)
- normalize the features or not
- --random_init arg (=1)
- randomize initialization of Oja or not
LBFGS and Conjugate Gradient options:¶
- --conjugate_gradient
- use conjugate gradient based optimization
- --bfgs
- use bfgs optimization
- --hessian_on
- use second derivative in line search
- --mem arg (=15)
- memory in bfgs
- --termination arg (=0.00100000005)
- Termination threshold
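BFGS is typically run for many passes over a cached dataset, e.g. (hypothetical file names):

    vw -d train.dat --bfgs --mem 15 --passes 20 -c -f bfgs.model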
Latent Dirichlet Allocation:¶
- --lda arg
- Run lda with <int> topics
- --lda_alpha arg (=0.100000001)
- Prior on sparsity of per-document topic weights
- --lda_rho arg (=0.100000001)
- Prior on sparsity of topic distributions
- --lda_D arg (=10000)
- Number of documents
- --lda_epsilon arg (=0.00100000005)
- Loop convergence threshold
- --minibatch arg (=1)
- Minibatch size, for LDA
- --math-mode arg (=0)
- Math mode: simd, accuracy, fast-approx
- --metrics arg (=0)
- Compute metrics
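A sketch of a 20-topic LDA run (hypothetical file names); the readable-model dump can then be inspected for per-topic weights:

    vw -d docs.dat --lda 20 --lda_D 10000 --minibatch 256 --passes 2 -c --readable_model topics.txt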
Noop Learner:¶
- --noop
- do no learning
Print pseudolearner:¶
- --print
- print examples
Gradient Descent Matrix Factorization:¶
- --rank arg
- rank for matrix factorization.
Network sending:¶
- --sendto arg
- send examples to <host>
Stochastic Variance Reduced Gradient:¶
- --svrg
- Streaming Stochastic Variance Reduced Gradient
- --stage_size arg (=1)
- Number of passes per SVRG stage
Follow the Regularized Leader:¶
- --ftrl
- FTRL: Follow the Proximal Regularized Leader
- --ftrl_alpha arg (=0.00499999989)
- Learning rate for FTRL optimization
- --ftrl_beta arg (=0.100000001)
- FTRL beta parameter
- --pistol
- FTRL: Parameter-free Stochastic Learning
- --ftrl_alpha arg (=1)
- Learning rate for FTRL optimization
- --ftrl_beta arg (=0.5)
- FTRL beta parameter
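A minimal FTRL sketch using the default proximal variant (hypothetical file names):

    vw -d train.dat --ftrl --ftrl_alpha 0.005 --ftrl_beta 0.1 -f ftrl.model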
Kernel SVM:¶
- --ksvm
- kernel svm
- --reprocess arg (=1)
- number of reprocess steps for LASVM
- --pool_greedy
- use greedy selection on mini pools
- --para_active
- do parallel active learning
- --pool_size arg (=1)
- size of pools for active learning
- --subsample arg (=1)
- number of items to subsample from the pool
- --kernel arg (=linear)
- type of kernel (rbf or linear (default))
- --bandwidth arg (=1)
- bandwidth of rbf kernel
- --degree arg (=2)
- degree of poly kernel
- --lambda arg
- saving regularization for test time
Gradient Descent options:¶
- --sgd
- use regular stochastic gradient descent update.
- --adaptive
- use adaptive, individual learning rates.
- --adax
- use adaptive learning rates with x^2 instead of g^2x^2
- --invariant
- use safe/importance aware updates.
- --normalized
- use per feature normalized updates
- --sparse_l2 arg (=0)
- degree of l2 regularization applied to activated sparse parameters
- --l1_state arg (=0)
- amount of accumulated implicit l1 regularization
- --l2_state arg (=1)
- amount of accumulated implicit l2 regularization
Input options:¶
- -d [ --data ] arg
- Example Set
- --daemon
- persistent daemon mode on port 26542
- --foreground
- in persistent daemon mode, do not run in the background
- --port arg
- port to listen on; use 0 to pick unused port
- --num_children arg
- number of children for persistent daemon mode
- --pid_file arg
- Write pid file in persistent daemon mode
- --port_file arg
- Write port used in persistent daemon mode
- -c [ --cache ]
- Use a cache. The default is <data>.cache
- --cache_file arg
- The location(s) of cache_file.
- --json
- Enable JSON parsing.
- --dsjson
- Enable Decision Service JSON parsing.
- -k [ --kill_cache ]
- do not reuse existing cache: create a new one always
- --compressed
- use gzip format whenever possible. If a cache file is being created, this option creates a compressed cache file. A mixture of raw-text and compressed inputs is supported with autodetection.
- --no_stdin
- do not default to reading from stdin
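For example, a prediction-only daemon serving a previously trained model might be started as follows (hypothetical model file; the port is the documented default):

    vw --daemon --port 26542 -i model.vw -t --quiet --num_children 8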
December 2020 | vw 8.6.1 |