Data Formats: ITEC 1011 Introduction To Information Technologies
Data Formats
Introduction
Examples
Real-world data -> input device -> computer data
Example: Dear Mom: -> 10110010
pp. 59-61
Rules/Conventions
Proprietary formats
Unique to a product or company
E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
Standards
Evolve two ways:
Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
A committee is struck to solve a problem (e.g., Moving Picture Experts Group, MPEG)
pp. 61-62
Standards Organizations
ISO: International Organization for Standardization
CSA: Canadian Standards Association
ANSI: American National Standards Institute
IEEE: Institute of Electrical and Electronics Engineers
Etc.
Examples of Standards
Type of Data
Alphanumeric
Image
Motion picture
Why Standards?
Standards are arbitrary
They exist because they are:
Convenient
Efficient
Flexible
Appropriate
Etc.
Alphanumeric Data
Problem: Distinguishing between the number 123 (one hundred and twenty-three) and the characters 123 (one, two, three)
Four standards for representing letters (alpha) and numbers:
BCD: Binary-coded decimal
ASCII: American Standard Code for Information Interchange
EBCDIC: Extended Binary Coded Decimal Interchange Code
Unicode
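The difference between the number 123 and the characters 123 can be seen by printing the bits each one occupies. This is a minimal sketch that assumes 7-bit ASCII for the characters; the variable names are chosen here for illustration only.

```python
# The number 123 stored as a single binary value.
number = 123
number_bits = format(number, "b")

# The characters "1", "2", "3" stored as three separate 7-bit ASCII codes.
text = "123"
char_bits = [format(ord(ch), "07b") for ch in text]

print(number_bits)  # 1111011
print(char_bits)    # ['0110001', '0110010', '0110011']
```

The number needs one 7-bit pattern, while the text needs one code per character.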
pp. 63-69
Digit  BCD
0      0000
1      0001
2      0010
3      0011
4      0100
5      0101
6      0110
7      0111
8      1000
9      1001
Example
7093 (decimal) = ? (in BCD)
7    0    9    3
0111 0000 1001 0011
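The digit-by-digit conversion in this example can be sketched in a few lines. The function name `to_bcd` is an illustrative choice, and the sketch assumes non-negative integers with one 4-bit group per decimal digit.

```python
def to_bcd(n: int) -> str:
    """Return the BCD encoding of n, one 4-bit group per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is converted independently to a 4-bit pattern.
    return " ".join(format(int(digit), "04b") for digit in str(n))

print(to_bcd(7093))  # 0111 0000 1001 0011
```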
The Problem
Representing text strings, such as Hello, world, in a computer
ASCII Features
7-bit code
8th bit is unused (or used for a parity bit)
2^7 = 128 codes
Two general types of codes:
95 are graphic codes (displayable on a console)
33 are control codes (control features of the console or communications channel)
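The counts above can be checked directly, and the 8th bit can be used for parity. The sketch below assumes even parity, which is one common convention (the slides do not say which is used); the function name `with_even_parity` is hypothetical.

```python
codes = range(2 ** 7)  # 7 bits give 128 codes, 0 through 127

# Control codes are 0-31 plus DEL (127); the rest are graphic codes.
control = [c for c in codes if c < 32 or c == 127]
graphic = [c for c in codes if 32 <= c < 127]

def with_even_parity(code: int) -> int:
    """Set the 8th bit (bit 7) so the byte has an even number of 1 bits."""
    parity = bin(code).count("1") % 2
    return code | (parity << 7)

print(len(control), len(graphic))                 # 33 95
print(format(with_even_parity(ord("a")), "08b"))  # 11100001
```

The letter a (1100001) has three 1 bits, so the parity bit is set to make the total even.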
ASCII Chart
Row  0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
000  NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
001  DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
010  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
011  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
100  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
101  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
110  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
111  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
Reading the chart: the row label gives the 3 most significant bits and the column heading gives the 4 least significant bits
e.g., a = 110 0001 = 1100001 (row 110, column 0001)
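Looking up a character in the chart amounts to joining the row bits and the column bits into one 7-bit code. A minimal sketch, with a hypothetical helper name `ascii_from_chart`:

```python
def ascii_from_chart(row_bits: str, column_bits: str) -> str:
    """Join 3 row bits (most significant) and 4 column bits (least significant)."""
    code = int(row_bits + column_bits, 2)  # e.g. "110" + "0001" -> 0b1100001
    return chr(code)

print(ascii_from_chart("110", "0001"))  # a
print(ascii_from_chart("100", "0001"))  # A
```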
95 Graphic codes: rows 010 through 111, except DEL
33 Control codes: rows 000 and 001, plus DEL
Alphabetic codes: A-Z in rows 100 and 101, a-z in rows 110 and 111
Numeric codes: 0-9 in row 011
Punctuation, etc.: the remaining graphic codes
Character   H        e        l        l        o        ,        space    w        o        r        l        d
Binary      01001000 01100101 01101100 01101100 01101111 00101100 00100000 01110111 01101111 01110010 01101100 01100100
Hexadecimal 48       65       6C       6C       6F       2C       20       77       6F       72       6C       64
Decimal     72       101      108      108      111      44       32       119      111      114      108      100
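The binary, hexadecimal, and decimal rows for Hello, world can be generated rather than looked up by hand. This sketch uses Python's built-in ASCII codec; the variable names are illustrative.

```python
text = "Hello, world"
codes = text.encode("ascii")  # iterating over bytes yields integers

binary = [format(c, "08b") for c in codes]
hexadecimal = [format(c, "02X") for c in codes]
decimal = list(codes)

print(" ".join(binary))
print(" ".join(hexadecimal))
print(" ".join(str(d) for d in decimal))
```

Each character produces one 8-bit pattern, with the leading bit 0 because ASCII uses only 7 bits.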
Terminology
Learn the names of the special symbols
[ ]  brackets
{ }  braces
( )  parentheses
@    commercial at sign
&    ampersand
~    tilde
Escape Sequences
Extend the capability of the ASCII code set
For controlling terminals and formatting output
Defined by ANSI in documents X3.41-1974 and X3.64-1977
The escape code is ESC = 1B (hex)
An escape sequence begins with two codes: ESC (1B hex) and [ (5B hex)
Examples
Erase display: ESC [ 2 J
Erase line: ESC [ K
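The two example sequences can be built as ordinary strings and inspected code by code. This is a sketch only: the names `CSI`, `ERASE_DISPLAY`, `ERASE_LINE`, and `show_codes` are chosen here, and whether a given terminal honours the sequences depends on the terminal.

```python
ESC = "\x1b"     # the escape code, 1B hexadecimal
CSI = ESC + "["  # the two codes that begin a sequence: 1B 5B

ERASE_DISPLAY = CSI + "2J"
ERASE_LINE = CSI + "K"

def show_codes(sequence: str) -> str:
    """List the hexadecimal ASCII code of every character in the sequence."""
    return " ".join(format(ord(ch), "02X") for ch in sequence)

print(show_codes(ERASE_DISPLAY))  # 1B 5B 32 4A
print(show_codes(ERASE_LINE))     # 1B 5B 4B
```

Writing `ERASE_DISPLAY` to a terminal that supports these sequences would clear the screen instead of printing visible characters.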
EBCDIC
Extended BCD Interchange Code (pronounced ebb-se-dick)
8-bit code
Developed by IBM
Rarely used today (IBM mainframes only)
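The same text has different codes in ASCII and EBCDIC. The sketch below uses Python's `cp037` codec, which is one EBCDIC code page among several; choosing it here is an assumption made for illustration.

```python
text = "IBM"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # an EBCDIC code page shipped with Python

def as_hex(data: bytes) -> str:
    """Format each byte as two hexadecimal digits."""
    return " ".join(format(b, "02X") for b in data)

print(as_hex(ascii_bytes))   # 49 42 4D
print(as_hex(ebcdic_bytes))  # C9 C2 D4
```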
Unicode
16-bit standard
Developed by a consortium
Intended to supersede older 7- and 8-bit codes
Contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica. http://www.unicode.org
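A 16-bit code leaves room for characters from many scripts. The sketch below prints the code point of a few sample characters and their 16-bit encoding, using Python's `utf-16-be` codec as the 16-bit form; the sample characters are arbitrary choices that each fit in one 16-bit unit.

```python
# A, e-acute, Greek capital omega, and one CJK ideograph.
samples = ["A", "\u00e9", "\u03a9", "\u4e2d"]

for ch in samples:
    code_point = ord(ch)
    two_bytes = ch.encode("utf-16-be")  # 16-bit, most significant byte first
    print(f"U+{code_point:04X}", " ".join(format(b, "02X") for b in two_bytes))
```

The first 128 code points match ASCII, so A keeps the value 41 (hex) with a leading zero byte.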
Keyboard Input
Key (scan) codes are converted to ASCII
ASCII code sent to host computer
Received by the host as a stream of data
Stored in buffer
Processed
Etc.
Shift Key
Inhibits (clears) bit 5 in the ASCII code
Key(s)      ASCII code
a           1100001
Shift + a   1000001
Control Key
Inhibits (clears) bits 5 and 6 in the ASCII code
Key(s)      ASCII code
c           1100011
Ctrl + c    0000011
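The effect of the Shift and Control keys on a letter can be sketched with bit masks. The sketch assumes bits are numbered from 0 at the least significant end and applies only to lowercase letters; the function names `shift` and `ctrl` are hypothetical.

```python
BIT_5 = 0b0100000  # value 32
BIT_6 = 0b1000000  # value 64

def shift(ch: str) -> str:
    """Clear bit 5: a lowercase letter becomes uppercase."""
    return chr(ord(ch) & ~BIT_5)

def ctrl(ch: str) -> str:
    """Clear bits 5 and 6: a letter becomes a control code."""
    return chr(ord(ch) & ~(BIT_5 | BIT_6))

print(format(ord("a"), "07b"), "->", format(ord(shift("a")), "07b"))  # 1100001 -> 1000001
print(format(ord("c"), "07b"), "->", format(ord(ctrl("c")), "07b"))   # 1100011 -> 0000011
```

Ctrl + c yields code 0000011, which is ETX in the ASCII chart.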
Other Input
OCR (optical character recognition)
Bar code readers
Voice/audio input
Punched cards
Images / objects
Pointing devices
OCR
Page of text (Hello, world) -> optical scan -> computer file (10110110)
Bar Codes
An automatic identification (Auto ID) technology that streamlines identification and data collection
See http://www.digital.net/barcoder/barcode.html
Voice/audio Input
Input device: microphone
Audio input is digitized and stored
Processed in two ways:
As is (no recognition)
Recognized and converted to alphanumeric data (ASCII)
Digitize: audio -> 10110010
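Digitizing means measuring the signal at regular intervals and rounding each measurement to a fixed number of bits. The sketch below stands in for a microphone with a sine wave; the 440 Hz tone, the 8000 samples-per-second rate, and the 8-bit resolution are all illustrative assumptions.

```python
import math

def digitize(signal, sample_rate: int, count: int, bits: int = 8):
    """Sample signal(t), expected in -1.0..1.0, and quantize each sample."""
    levels = 2 ** bits
    samples = []
    for i in range(count):
        value = signal(i / sample_rate)     # measure at time t = i / rate
        value = max(-1.0, min(1.0, value))  # clip to the expected range
        samples.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return samples

def tone(t: float) -> float:
    """A 440 Hz sine wave standing in for microphone input."""
    return math.sin(2 * math.pi * 440 * t)

samples = digitize(tone, sample_rate=8000, count=8)
print([format(s, "08b") for s in samples])
```

The stored result is a list of 8-bit patterns, which can be kept as is or passed to a recognizer.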
Punched Cards
Invented by Herman Hollerith (founder of the Tabulating Machine Company, a forerunner of IBM)
Each card holds 80 characters
Images
Typically, images are pictures that are optically scanned and saved as a bitmap or in some other format
Many formats (e.g., GIF, JPEG)
Objects
Images made of geometrically definable shapes
Offer efficiency, flexibility, small size, etc.
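The size advantage of objects over bitmaps can be illustrated by storing the same square both ways. This is a sketch under stated assumptions: the 16 x 16 grid, the dictionary layout of `square`, and the `render` helper are all hypothetical.

```python
# Bitmap: every pixel of a 16 x 16 image is stored, one bit per pixel.
WIDTH = HEIGHT = 16
bitmap = [[1 if 4 <= x < 12 and 4 <= y < 12 else 0 for x in range(WIDTH)]
          for y in range(HEIGHT)]
bitmap_bits = WIDTH * HEIGHT

# Object: the same square described by its geometry (five values).
square = {"shape": "rectangle", "x": 4, "y": 4, "width": 8, "height": 8}

def render(obj, width, height):
    """Turn the object description back into a bitmap of the given size."""
    return [[1 if obj["x"] <= x < obj["x"] + obj["width"]
                  and obj["y"] <= y < obj["y"] + obj["height"] else 0
             for x in range(width)]
            for y in range(height)]

print(bitmap_bits)                               # 256
print(render(square, WIDTH, HEIGHT) == bitmap)   # True
```

The object form stays the same size however large the canvas, while the bitmap grows with the number of pixels.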
Pointing Devices
Originally used for specifying coordinates (x, y) for graphical input
Today used as a general-purpose device for graphical user interfaces (GUIs)
Thank you