Sun Grid Engine

$ qconf -sc

Implementations

Tutorials

Management

$ qstat -g c

Queue state/status

View queue state/status:

- a – Load threshold alarm
- o – Orphaned
- A – Suspend threshold alarm
- C – Suspended by calendar
- D – Disabled by calendar
- c – Configuration ambiguous
- d – Disabled
- s – Suspended
- u – Unknown
- E – Error

$ qstat -f -explain acAE
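
qstat prints these letters concatenated in the states column (for example `au`), so it can help to expand them one by one. The following is a minimal sketch in plain POSIX shell; `describe_queue_state` is a hypothetical helper name, not part of Grid Engine, and it only knows the ten letters listed above.

```shell
# describe_queue_state: hypothetical helper, not shipped with Grid Engine.
# Prints one description line per letter of a queue state string.
describe_queue_state() {
  rest=$1
  while [ -n "$rest" ]; do
    letter=${rest%"${rest#?}"}   # first character of the remaining string
    rest=${rest#?}               # drop that character
    case "$letter" in
      a) echo "a: load threshold alarm" ;;
      o) echo "o: orphaned" ;;
      A) echo "A: suspend threshold alarm" ;;
      C) echo "C: suspended by calendar" ;;
      D) echo "D: disabled by calendar" ;;
      c) echo "c: configuration ambiguous" ;;
      d) echo "d: disabled" ;;
      s) echo "s: suspended" ;;
      u) echo "u: unknown" ;;
      E) echo "E: error" ;;
      *) echo "$letter: not in the list above" ;;
    esac
  done
}

# A queue instance reported as "au" is in alarm and its host is unknown.
describe_queue_state "au"
```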

Reset queue errors:

sudo qmod -c '*'

Host

$ qhost -j -u jobsubmitter

Job state/status

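View job state/status:

- d(eletion)
- E(rror)
- h(old)
- r(unning)
- R(estarted)
- s(uspended)
- S(uspended): the queue holding the job is suspended
- t(ransferring)
- T(hreshold): the job is suspended because a suspend threshold of its queue was exceeded
- w(aiting)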

$ qstat -u '*'
$ qstat -j 4897730
$ qconf -shgrpl

Admin commands

Several of these commands open an editor. Prefix them with EDITOR=vim to edit configurations in vim instead of emacs (bonus points if you can change the default to vim).
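
As a small sketch of both variants: the one-off prefix is shown as a comment because it opens an interactive editor, and the export line changes the editor for the rest of the current shell session. Putting the same export line into a shell startup file such as `~/.bashrc` is one way to make vim the per-user default; which startup file applies on your cluster is an assumption to check locally.

```shell
# One-off: set EDITOR for a single command only.
#   EDITOR=vim qconf -mq all.q

# Per session: export once, then every later qconf call in this shell uses vim.
export EDITOR=vim
echo "EDITOR is now: $EDITOR"
```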

Command                          Explanation
qmod -c '*'                      Reset all queue errors.
qconf -suserl                    Lists all users.
qconf -auser {username}          Add new user.
qconf -au {username} arusers     Add user to arusers access control list.

Configurations for a new user should look like the following:

name {username}
oticket 0
fshare 100
delete_time 0
default_project NONE
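
If you prefer not to type these fields into the editor, the same configuration can be written to a file first. This is a minimal sketch: the user name `alice` and the file location are made up for illustration, and the final `qconf -Auser` call (listed in the qconf usage as "add user from file") is left as a comment because it needs manager rights on a real cluster.

```shell
# Hypothetical user name and file location, for illustration only.
username="alice"
user_file="${TMPDIR:-/tmp}/sge_user_${username}.conf"

# Write the five fields shown above, with the name filled in.
cat > "$user_file" <<EOF
name $username
oticket 0
fshare 100
delete_time 0
default_project NONE
EOF

cat "$user_file"

# To register the user from this file, one would then run:
#   qconf -Auser "$user_file"
```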

User commands

Command          Explanation
qstat -f         Show full queue status.
qstat -u '*'     List the jobs of all users; jobs in error show E in the state column.
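
To pick out only the jobs in error, the state column can be filtered. The sketch below assumes the default qstat column layout (job-ID, prior, name, user, state, ...) with two header lines; `list_error_jobs` is a hypothetical helper, and the sample text is made up so that the snippet runs without a cluster.

```shell
# list_error_jobs: hypothetical helper; reads qstat output on stdin and prints
# job-ID, user and state for every job whose state contains E.
list_error_jobs() {
  awk 'NR > 2 && $5 ~ /E/ { print $1, $4, $5 }'
}

# Made-up sample in the default qstat column layout.
sample_output='job-ID  prior   name       user         state submit/start at     queue          slots
--------------------------------------------------------------------------------
 4897730 0.55500 align.sh   alice        r     01/15/2024 10:12:01 all.q@node01   1
 4897731 0.50000 sort.sh    bob          Eqw   01/15/2024 10:13:40                1
 4897732 0.50000 merge.sh   bob          qw    01/15/2024 10:14:02                1'

printf '%s\n' "$sample_output" | list_error_jobs

# On a live cluster:
#   qstat -u '*' | list_error_jobs
```

With the sample above, only the `Eqw` job of `bob` is printed.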

Examples

Remove a node

# First, disable the queue instance on this host so that no new jobs are scheduled on it
$ qmod -d all.q@thishost.com
# Wait for the jobs running on this host to finish, then shut down its execution daemon
$ qconf -ke thishost.com
# Remove the host from the queue; this opens an editor, delete the entries that refer to this host
$ qconf -mq all.q
# Remove the host from the @allhosts host group; this also opens an editor, delete the entries that refer to this host
$ qconf -mhgrp @allhosts
# Remove the host from the execution host list
$ qconf -de thishost.com
# I normally also log in to the host and delete the SGE scripts
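
Because the order of these steps matters, it can be useful to print the full sequence for a given host and review it before running anything. The following is only a sketch: `print_remove_node_steps` is a hypothetical helper that echoes the commands above and never calls qmod or qconf itself, and the defaults `all.q` and `@allhosts` are taken from this example rather than from any particular cluster.

```shell
# print_remove_node_steps: hypothetical helper that only prints commands.
# Arguments: host name, optional queue (default all.q), optional host group (default @allhosts).
print_remove_node_steps() {
  host=$1
  queue=${2:-all.q}
  hostgroup=${3:-@allhosts}
  echo "qmod -d ${queue}@${host}"     # disable the queue instance
  echo "qconf -ke ${host}"            # shut down the execution daemon
  echo "qconf -mq ${queue}"           # edit the queue, remove the host
  echo "qconf -mhgrp ${hostgroup}"    # edit the host group, remove the host
  echo "qconf -de ${host}"            # delete the execution host
}

print_remove_node_steps thishost.com
```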

Debugging a job

$ qstat -explain c -j <list of job-IDs>

qconf

# show all hosts in a group
$ qconf -shgrp @groupname

Adding an execution host

# Make the new host an administrative host

$ qconf -ah <hostname>

# As root on this new host, run the following script from $SGE_ROOT

$ cd $SGE_ROOT
$ ./install_execd

$ qconf --help
usage: qconf [options]
   [-aattr obj_nm attr_nm val obj_id_list]  add to a list attribute of an object
   [-Aattr obj_nm fname obj_id_list]        add to a list attribute of an object
   [-acal calendar_name]                    add a new calendar
   [-Acal fname]                            add a new calendar from file
   [-ackpt ckpt_name]                       add a ckpt interface definition
   [-Ackpt fname]                           add a ckpt interface definition from file
   [-aconf host_list]                       add configurations
   [-Aconf file_list]                       add configurations from file_list
   [-ae [exec_server_template]]             add an exec host using a template
   [-Ae fname]                              add an exec host from file
   [-ah hostname_list]                      add an administrative host
   [-ahgrp group]                           add new host group entry
   [-Ahgrp file]                            add new host group entry from file
   [-arqs [rqs_list]]                       add resource quota set(s)
   [-Arqs fname]                            add resource quota set(s) from file
   [-am user_list]                          add user to manager list
   [-ao user_list]                          add user to operator list
   [-ap pe-name]                            add a new parallel environment
   [-Ap fname]                              add a new parallel environment from file
   [-aprj]                                  add project
   [-Aprj fname]                            add project from file
   [-aq [queue_name]]                       add a new cluster queue
   [-Aq fname]                              add a queue from file
   [-as hostname_list]                      add a submit host
   [-astnode node_shares_list]              add sharetree node(s)
   [-astree]                                create/modify the sharetree
   [-Astree fname]                          create/modify the sharetree from file
   [-at thread_name]                        add/start qmaster thread
   [-au user_list listname_list]            add user(s) to userset list(s)
   [-Au fname]                              add userset from file
   [-auser]                                 add user
   [-Auser fname]                           add user from file
   [-clearusage]                            clear all user/project sharetree usage
   [-cq destin_id_list]                     clean queue
   [-dattr obj_nm attr_nm val obj_id_list]  delete from a list attribute of an object
   [-Dattr obj_nm fname obj_id_list]        delete from a list attribute of an object
   [-dcal calendar_name]                    delete calendar
   [-dckpt ckpt_name]                       delete ckpt interface definition
   [-dconf host_list]                       delete local configurations
   [-de host_list]                          delete exec host
   [-dh host_list]                          delete administrative host
   [-dhgrp group]                           delete host group entry
   [-drqs rqs_list]                         delete resource quota set(s)
   [-dm user_list]                          delete user from manager list
   [-do user_list]                          delete user from operator list
   [-dp pe-name]                            delete parallel environment
   [-dprj project_list]                     delete project
   [-dq destin_id_list]                     delete queue
   [-ds host_list]                          delete submit host
   [-dstnode node_list]                     delete sharetree node(s)
   [-dstree]                                delete the sharetree
   [-du user_list listname_list]            delete user(s) from userset list(s)
   [-dul listname_list]                     delete userset list(s) completely
   [-duser user_list]                       delete user(s)
   [-help]                                  print this help
   [-ke[j] host_list                        shutdown execution daemon(s)
   [-k{m|s}]                                shutdown master|scheduling daemon
   [-kec evid_list]                         kill event client
   [-kt thread_name]                        kill qmaster thread
   [-mattr obj_nm attr_nm val obj_id_list]  modify an attribute (or element in a sublist) of an object
   [-Mattr obj_nm fname obj_id_list]        modify an attribute (or element in a sublist) of an object
   [-mc ]                                   modify complex attributes
   [-mckpt ckpt_name]                       modify a ckpt interface definition
   [-Mc fname]                              modify complex attributes from file
   [-mcal calendar_name]                    modify calendar
   [-Mcal fname]                            modify calendar from file
   [-Mckpt fname]                           modify a ckpt interface definition from file
   [-mconf [host_list|global]]              modify configurations
   [-Mconf file_list]                       modify configurations from file_list
   [-me server]                             modify exec server
   [-Me fname]                              modify exec server from file
   [-mhgrp group]                           modify host group entry
   [-Mhgrp file]                            modify host group entry from file
   [-mrqs [rqs_list]]                       modify resource quota set(s)
   [-Mrqs fname [rqs_list]]                 modify resource quota set(s) from file
   [-mp pe-name]                            modify a parallel environment
   [-Mp fname]                              modify a parallel environment from file
   [-mprj project]                          modify a project
   [-Mprj fname]                            modify project from file
   [-mq queue]                              modify a queue
   [-Mq fname]                              modify a queue from file
   [-msconf]                                modify scheduler configuration
   [-Msconf fname]                          modify scheduler configuration from file
   [-mstnode node_shares_list]              modify sharetree node(s)
   [-Mstree fname]                          modify/create the sharetree from file
   [-mstree]                                modify/create the sharetree
   [-mu listname_list]                      modify the given userset list
   [-Mu fname]                              modify userset from file
   [-muser user]                            modify a user
   [-Muser fname]                           modify a user from file
   [-purge obj_nm3 attr_nm objectname]      deletes attribute from object_instance
   [-rattr obj_nm attr_nm val obj_id_list]  replace a list attribute of an object
   [-Rattr obj_nm fname obj_id_list]        replace a list attribute of an object
   [-sc]                                    show complex attributes
   [-scal calendar_name]                    show given calendar
   [-scall]                                 show a list of all calendar names
   [-sckpt ckpt_name]                       show ckpt interface definition
   [-sckptl]                                show all ckpt interface definitions
   [-sconf [host_list|global]]              show configurations
   [-sconfl]                                show a list of all local configurations
   [-se server]                             show given exec server
   [-secl]                                  show event client list
   [-sel]                                   show a list of all exec servers
   [-sep]                                   show a list of all licensed processors
   [-sh]                                    show a list of all administrative hosts
   [-shgrp group]                           show host group
   [-shgrp_tree group]                      show host group and used hostgroups as tree
   [-shgrp_resolved group]                  show host group with resolved hostlist
   [-shgrpl]                                show host group list
   [-sds]                                   show detached settings
   [-srqs [rqs_list]]                       show resource quota set(s)
   [-srqsl]                                 show resource quota set list
   [-sm]                                    show a list of all managers
   [-so]                                    show a list of all operators
   [-sobjl obj_nm2 attr_nm val]             show objects which match the given value
   [-sp pe-name]                            show a parallel environment
   [-spl]                                   show all parallel environments
   [-sprj project]                          show a project
   [-sprjl]                                 show a list of all projects
   [-sq [destin_id_list]]                   show the given queue
   [-sql]                                   show a list of all queues
   [-ss]                                    show a list of all submit hosts
   [-sss]                                   show scheduler state
   [-ssconf]                                show scheduler configuration
   [-sstnode node_list]                     show sharetree node(s)
   [-rsstnode node_list]                    show sharetree node(s) and its children
   [-sst]                                   show a formated sharetree
   [-sstree]                                show the sharetree
   [-su listname_list]                      show the given userset list
   [-suser user_list]                       show user(s)
   [-sul]                                   show a list of all userset lists
   [-suserl]                                show a list of all users
   [-tsm]                                   trigger scheduler monitoring
$ qconf -ssconf
algorithm                         default
schedule_interval                 00:00:05
maxujobs                          170
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.90
load_adjustment_decay_time        00:01:00
load_formula                      np_load_avg
schedd_job_info                   false
flush_submit_sec                  0
flush_finish_sec                  0
params                            none
reprioritize_interval             00:00:30
halftime                          12
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               1.000000
weight_user                       0.803000
weight_project                    0.020000
weight_department                 0.020000
weight_job                        0.157000
weight_tickets_functional         10000
weight_tickets_share              10000
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   80
report_pjob_tickets               FALSE
max_pending_tasks_per_job         80
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     1.000000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.300000
max_reservation                   0
default_duration                  INFINITY
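
The four functional-policy weights in this listing (weight_user, weight_project, weight_department, weight_job) are easier to read as fractions once you see that they add up to 1.000 here. The sketch below sums them with awk; `sum_functional_weights` is a hypothetical helper, and the input is copied from the listing above so the snippet runs without a cluster.

```shell
# Functional-policy weights copied from the qconf -ssconf listing above.
sched_conf='weight_user                       0.803000
weight_project                    0.020000
weight_department                 0.020000
weight_job                        0.157000'

# sum_functional_weights: hypothetical helper; reads qconf -ssconf output on stdin
# and prints the sum of the four functional weights with three decimals.
sum_functional_weights() {
  awk '$1 ~ /^weight_(user|project|department|job)$/ { sum += $2 }
       END { printf "%.3f\n", sum }'
}

printf '%s\n' "$sched_conf" | sum_functional_weights

# On a live cluster:
#   qconf -ssconf | sum_functional_weights
```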