Sunday, February 8, 2009

FOSDEM 2009, Day 2

Notes for FOSDEM Day 2

Sunday, February 8
------------------

Solar Control with 1-wire open hardware
---------------------------------------
Speaker: Wookey
- Install solar thermal panels on the roof of the house to replace hot
water heater
- Solar panels get hot and do heat exchange, not electricity generation
- Roof panels cost around 1000 euros
- Use standard hot water tank
- Use thermosiphon
- Hot water is lighter than the cold water in the tank, so it rises up
a tube (without a pump), and replaces cold water in the tank
- Can manage the system with a commercial controller, but costs 388 euros
- Instead, use a home-brew solution
- 500 MHz processor, 384MB memory
- I/O expansion bus with I2C, etc
- 1-wire sensors to do temperature detection
- Good to about 100 meters
- 1-wire bus is actually 3 wires - 1 data, 1 ground, 1 power
- 14 kb/s
- Multiplex 8 1-wire buses
- scan 13 devices/second
- measurement 94-750ms
- Hardware
- 5 wires - I2C data, I2C CLK, GND, +5V, IO0
- Sensors wrapped around pipes
- Sensors around tank
- Software
- Debian
- modprobe i2c-pxa, i2c-dev
- /dev/i2c-0, /dev/i2c-1
- 0 is general, 1 is for power
- I2C addresses are fixed when the board is soldered
- 8-bit addresses, but bottom bit specifies read/write (so 7 address
lines)
- i2cdump from lm-sensors or i2c-tools
- modprobe pcf8574a
- OWFS
- One Wire FS
- FUSE filesystem
- OWFS daemon
- Perl binding, tcl binding, shell
- Does temperature
- # owdir -s 4304
- # owread -s 4304 /28/temperature
- Logging data
- First cut - manual rrdtool
- rrdtool create
- rrdupdate to fill in the data
- rrdtool graph
- OK, but pretty manual. Also does all processing on embedded
board, which is very slow (floating point on ARM)
- Munin
- Hides rrdtool details
- Remote graphing; just fetches data from embedded board, does
processing offline
- Plugin system
- Munin ends up working well, nice graphs
- Control
- Interesting optimization problem; when to turn pumps on and off.
Better to run the pump continuously, or turn on and off rapidly?
- Current algorithm is simple; turn on pumps when heat exchange panel is
hotter than tank
- Reliability
- Original system uptime 87 days
- owserver crashed once
- Survived disk full condition
- Some 1-wire problems
- Reading data not always reliable
- When the sensor has problems, goes to 85C and stays there
- Unfortunate because 85C is actually a valid reading
- Nominal sensor accuracy is 0.5 C
- In practice 2C difference
- Can reach 0.5C in ideal conditions, though
- Lesson: need to clamp sensors very well to what you are measuring
- Bigger Project
- 2 tanks, 2 overflow
- RJ45 sensors
- Open hardware design for digitemp sensors
- Future?
- Reliability during power failure situations
- Local User Interface (LED, LCD)
- More information about solar tank temp, bath status
- House sensors to control heating, cooling
- Inputs - buttons to say "bath mode", "leaving house"
- Other Related Software
- DIYzoning
- Misterhouse
- temploggerd
- WT app
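
The control rule and the 85C fault described in the Control and Reliability notes above can be sketched roughly as follows. This is only an illustration, not the speaker's code. It assumes the pump should run when the panel is hotter than the tank, it assumes DS18B20-style sensors (whose power-on default reading is 85C), and the function names, the on/off thresholds, and the 20C jump limit are all hypothetical. The hysteresis band is one possible way to avoid switching the pump on and off rapidly.

```python
from typing import Optional

# Assumption: DS18B20-style sensors, which report 85.0 C after a power-on reset.
SUSPECT_READING_C = 85.0


def clean_reading(raw_c: float, previous_c: Optional[float],
                  max_jump_c: float = 20.0) -> Optional[float]:
    """Return raw_c, or None if it looks like the stuck-at-85C fault.

    85C is also a valid temperature, so a reading of exactly 85.0 is only
    trusted when the last good reading was already close to it.
    """
    if raw_c != SUSPECT_READING_C:
        return raw_c
    if previous_c is not None and abs(raw_c - previous_c) <= max_jump_c:
        return raw_c
    return None


def pump_should_run(panel_c: float, tank_c: float, running: bool,
                    on_delta_c: float = 6.0, off_delta_c: float = 2.0) -> bool:
    """Differential control with hysteresis (thresholds are hypothetical).

    Start the pump when the panel is at least on_delta_c hotter than the
    tank, and keep it running until the difference falls to off_delta_c.
    """
    delta = panel_c - tank_c
    if running:
        return delta > off_delta_c
    return delta >= on_delta_c
```

A polling loop would call clean_reading() on each sensor value, skip the cycle when it returns None, and pass the last good panel and tank temperatures to pump_should_run().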

FreeIPA
-------
Speaker: Simo Sorce
- Free Identity, Policy, Audit
- Goal is to make it simpler to manage all of the above
- Using standard protocols
- Target - system administrators
- Identity management is the realm of proprietary vendors
- Not fully free
- Need an open source solution for security and freedom
- Identity management problem
- Need single source for identity
- Single sign-on/single password
- Single data store for auditing and reporting
- Single point of management
- Implementation problems
- Synchronization and integration
- Windows and Unix
- Distribution of data and credentials
- Distributing change
- Single point of failure
- Integrating interfaces
- FreeIPA components
- Directory is LDAP
- Storage mechanism to perform fine-grained access control
- Organize identity and allow group relationships
- Distribute information across clients
- Replicate information on multiple servers
- Avoid single point of failure
- Chose LDAP since it is standard, extensible, flexible
- Authentication is Kerberos
- Single Sign On with delegation - carry on your credentials from
machine to machine, service to service
- Tested standard, validated secure
- Somewhat extensible - can add other authentication devices
(smartcards), and new encryption algorithms
- Once you introduce Kerberos, then you need DNS and NTP
- Client and server clocks must be within 5 minutes of each other to prevent
spoofed/expired tickets
- Host tickets based on DNS name, so client must have one
- Implementation
- Fedora Directory Server
- MIT Kerberos
- Apache + mod_nss, mod_auth_kerb, mod_proxy
- Python and Turbogears
- Custom FDS plugins and CLI tools
- Clients can use nss_ldap, pam_krb5
- Avoiding synchronization problems?
- Kerberos information is in LDAP as well as KDC
- Directory structure
- Very flat structure
- Splits accounts from kerberos information in the directory
- Allows users to change some of their own information (based on ACLs)
- Management interfaces all revolve around the directory
- Web UI goes through mod_nss, mod_auth_kerb, mod_proxy, GUI, xmlrpc,
then finally to LDAP
- CLI goes through mod_nss, mod_auth_kerb, xmlrpc, and then LDAP
- But, installation was still complex
- ipa-server-install makes it much simpler
- Multiple servers
- Replication on directory server
- Because Kerberos data is in the directory, don't need to replicate
Kerberos separately (kprop)
- 2 simple commands
- ipa-replica-prepare
- ipa-replica-install
- Future
- Add Audit Server - can use AMQP
- Add Certificate Authority
- Add Policy
- Makes interaction much more complex
- Luckily, most of the complexity is hidden from clients (and
administrators) in the FreeIPA core
- However, still a bunch of complexity, so new client agent
- System Security Services Daemon (SSSD) + IPA plugin
- caching, offline operations, etc.
- Host Based Access Control in LDAP
- Roles in LDAP
- New UI with plugin system
- DNS integration
- Dynamic updates
- Integration with Certificate Authorities
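
The reason Kerberos drags in NTP (the 5 minute rule in the notes above) comes down to a clock comparison like the one below. This is a minimal sketch of the idea only, not how MIT Kerberos implements it; the function name and the constant are made up for illustration, and the 5 minute value is the one quoted in the talk.

```python
from datetime import datetime, timedelta, timezone

# Tolerance quoted in the talk; the name is hypothetical.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def within_skew(client_now: datetime, server_now: datetime,
                max_skew: timedelta = MAX_CLOCK_SKEW) -> bool:
    """True if the two clocks are close enough for a ticket to be accepted."""
    return abs(client_now - server_now) <= max_skew


server = datetime(2009, 2, 8, 12, 0, 0, tzinfo=timezone.utc)
print(within_skew(server + timedelta(minutes=3), server))  # inside the window
print(within_skew(server - timedelta(minutes=7), server))  # outside the window
```

A host whose clock drifts outside the window fails this check in either direction, which is why the clients need NTP as well as the servers.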

Syslinux
--------
Speaker: H. Peter Anvin
- Syslinux is a suite of bootloaders
- SYSLINUX - FAT
- PXELINUX - network PXE booting
- ISOLINUX - CD-ROMs - ISO9660
- EXTLINUX - ext2/ext3 - ext4 coming
- Only for x86/BIOS platform
- Core is in assembly
- Originally written for floppies, needed to be small
- Work underway to fix this
- Sophisticated menu systems
- Extensible via module API
- MEMDISK - disk emulator
- Emulates disk in memory
- Allows booting of legacy INT 13h OSes, mainly DOS
- Used for diagnostics
- gPXELINUX
- Collaborate with etherboot
- Enhanced capabilities for network booting
- http, ftp, nfs, AoE, iSCSI (in addition to TFTP)
- isohybrid
- Allows ISOLINUX .iso to boot from USB stick
- x86 is ancient
- IBM PC released in 1981; IBM AT 1984; PS/2 1987
- Most BIOS interfaces date from this era
- Boot from floppy and hard disk only - 510 bytes
- 1993 El Torito booting (CD)
- 2 modes
- Disk image on disk, boot disk image
- Native mode, access whole CD - not widely supported until late 90's
- 1997 PXE
- Original PXE implementations were problematic
- Nowadays, works pretty well/standard
- USB drives
- Lots of bugs still
- Tricks can help
- Syslinux history
- 1994 original syslinux implementation
- Designed for boot floppies
- Small to fit, so assembly
- Take DOS OS, make Linux boot floppy
- Add online help support
- 1999 PXELINUX support
- PXE only allows 32K for Network Boot Program (NBP)
- Re-use SYSLINUX
- 2001 ISOLINUX
- 2004 EXTLINUX - ext2/ext3 general purpose loader
- 2006 graphical menu system
- 2008 gPXELINUX, ISOLINUX "hybrid" support (iso that also works on
USB key)
- What's good?
- Designed for dynamic systems
- System discovery at boot time
- Keep to PC established principles
- Sophisticated user interface
- Problems
- Large core of assembly
- x86/BIOS only (because of assembly)
- Dynamic discovery comes at a price
- Can't read a kernel from another disk
- gPXELINUX
- gPXE plus PXELINUX in one image
- gPXE contains an extended PXE interface for PXELINUX
- Can http, ftp, nfs, AoE, iSCSI, SFP (in addition to TFTP)
- Drop-in replacement for pxelinux.0
- Needs TFTP server for initial bootstrap
- But can be skipped if NIC ROM can be reflashed
- Syslinux module API (COM32)
- Small C library (klibc)
- Similar to normal userspace C code
- Main limitation is only sequential, readonly file access
- Common modules
- UI
- Complex menu system - does everything
- Simple menu system - what most people use
- Graphics library makes the same code work for graphics, text, or
serial
- File format modules
- Loading new types of loadable binary objects
- Module describes where various bits go in memory, then syslinux
"shuffle" library puts everything in place
- Shuffle library computes a set of move operations to put things in
the right place in memory
- Example: Microsoft SDI format
- Boot WinPE with syslinux - Windows kernel + ramdisk
- 199 lines total; 139 non-blank/comment lines
- Policy modules
- Example policy:
- Boot kernel X on 64-bit machine
- Boot kernel Y on 32-bit with PAE
- Boot kernel Z otherwise
- Module is 129 lines long; 70 non-blank/comment lines
- Diagnostic modules
- Modules available to show hardware/BIOS state
- e.g. pcidump
- Move dumping data into libraries, and make available to modules
- Already code in Syslinux
- Probe PCI bus
- Map devices to modules
- Build an initramfs with the modules you need
- But...boot devices on USB, firewire? Harder to discover, because you
need a driver for the bus itself
- Ongoing work
- Lua interpreter to write policies
- readdir support - right now, no directories
- Move filesystem code out of assembly - needed to support advanced
filesystems like btrfs
- Switch core from assembly to C to make it portable; required for EFI
- Core components
- First stage loader, disk/network I/O, BIOS extender (protected mode),
shuffle system (rearranges RAM) need to remain in assembly
- Everything else (command-line, config parser, kernel loader/parser,
filesystem drivers) should be written in C
- Syslinux needs help
- Too large for a one-person side project
- Significant number of regular contributors
- Documentation help
- http://syslinux.zytor.com
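
The example policy from the "Policy modules" notes above (kernel X on 64-bit, kernel Y on 32-bit with PAE, kernel Z otherwise) boils down to a small decision function. Real Syslinux policy modules are COM32 modules written in C; the Python below only illustrates the logic. The function name and kernel names are hypothetical, and the flag names "lm" and "pae" are borrowed from what Linux shows in /proc/cpuinfo rather than from any Syslinux API.

```python
from typing import Iterable


def choose_kernel(cpu_flags: Iterable[str]) -> str:
    """Pick a kernel label from CPU feature flags (labels are placeholders)."""
    flags = set(cpu_flags)
    if "lm" in flags:       # long mode: 64-bit capable CPU
        return "kernel-x"
    if "pae" in flags:      # 32-bit CPU with PAE
        return "kernel-y"
    return "kernel-z"       # plain 32-bit fallback


print(choose_kernel(["fpu", "pae", "lm"]))
print(choose_kernel(["fpu", "pae"]))
print(choose_kernel(["fpu"]))
```

The order of the checks matters: 64-bit CPUs also report PAE, so the long mode test has to come first.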

Ext4
----
Speaker: Theodore Ts'o
- What's good about ext3?
- Most widely used Linux filesystem
- People trust it
- Diverse developer community
- important because distros need to understand it well enough to
be comfortable supporting it
- What's not good about ext3
- 16TB filesystem size limitation (32-bit block numbers)
- 32000 limit on subdirectories
- Second resolution timestamps
- Performance limitation
- Traditionally cared more about data integrity than performance
- Is ext4 a new filesystem?
- ext in ext2/3/4 stands for "extended"
- Collection of new features that you can individually enable
- ext4 driver supports all features
- But can have ext3 fs and mount as ext4, which works just fine
- 2.6.29 can mount an ext2 fs with ext4 driver (code from Google)
- Google doesn't care about journals, so can mount ext4 without journal
- ext4 fork was to make sure that ext3 remained stable
- Started with ext3, added new code
- e2fsprogs supports ext2, ext3, ext4
- New features
- Extents instead of indirect blocks - most important
- Delayed allocation
- Multiblock allocation
- Persistent allocation
- Subsecond timestamps
- Greater than 32000 subdirs
- NFSv4 version id's for caching (reliable caching)
- Store file sizes in units of the FS block size, rather than 512-byte sectors
- Allows huge files; 16TB files on 4k block filesystems
- ATA TRIM support - when deleting a file, can tell block device to use
for something else
- Journal and group descriptor checksums - reliably record which part of
the inode table is in use (speeds up fsck)
- Ext2/3 indirect block map
- In the inode, room for 15 block pointers
- First 12 map direct blocks
- For a file up to 12 blocks long (48k on 4k fs), the location of every
block is stored in the inode itself
- Bigger files allocate indirect block
- Slot 12 holds address of indirect block
- 256 blocks (with 1k blocks; a 4k block holds 1024 pointers)
- Slot 13 is double indirect block
- 256 indirect blocks
- 256 direct blocks
- slot 14 is triple indirect block
- 256 double indirect
- 256 indirect block
- 256 direct blocks
- Inefficient for large files
- Long time to delete because has to read all indirect blocks and
free block pointers in all blocks
- Ext4
- Extents
- Extents are efficient way to represent large file
- Extent is a single descriptor for a range of contiguous blocks
- Logical 0 block, length 1000 blocks, physical 200
- 12 bytes ext4_extent structure (from ClusterFS)
- Address 1EB fs (48 bit physical block number)
- Address 16TB file size (32-bit logical block number)
- Max extent 128MB (16 bit extent length)
- Up to 3 extents stored in inode directly
- Vast majority of files (99%) live in the inode
- For greater than 3 extents, convert to B-tree
- Block allocator changes
- Extents works best if files are contiguous
- Delayed allocation and allocating multiple blocks at a time makes this
much more likely
- Block allocator looks at disk to try to find free space to fit number
of blocks we want to allocate
- Makes fs more resistant to fragmentation
- Responsible for most of ext4's performance improvements
- Problem: ext3's ordered journal mode semantics mean that data is written
before the inode
- Avoids security concerns
- Application programmers came to depend on this
- ext4 does this differently: because of delayed allocation, we don't
push data out to disk until page writeback runs (30 seconds)
- And staged, so it may take a while
- In laptop mode, 2-5 minutes to write to disk
- POSIX allows this
- Open question: fsync? fdatasync?
- Persistent pre-allocation
- Useful for databases and video
- Pre-allocate 1GB on disk ahead of time, contiguously
- Useful for package updates (rpm, deb), reduce fragmentation
- Useful for files grown by append (like logfiles); can pre-allocate
space, and then the logfile is contiguous
- Available via posix_fallocate, but...
- On older FS's, will just write lots of blocks of 0 (very slow)
- Changes i_size field, meaning that its size (as reported by stat)
looks much bigger than the space the file is actually using
- Need glibc direct access to Linux system call
- Avoid i_size change
- Fail on old FS's
- e2fsck Performance
- Not explicitly engineered
- But huge improvements
- Fewer extent tree blocks to read instead of indirect blocks
- Uninitialized block groups means don't have to read portions of the
inode table
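
The size limits quoted in the indirect block and extent notes above can be checked with a little arithmetic. This is a sketch under stated assumptions: block pointers are 4 bytes, the "256 blocks" figures correspond to a 1k block size, and only 15 of the 16 extent length bits count toward the length (which is what makes the quoted 128MB work out on a 4k filesystem). The names Extent, lookup, and indirect_map_max_blocks are made up for illustration and are not kernel structures.

```python
from collections import namedtuple

# One extent: "logical block 0, length 1000 blocks, physical block 200".
Extent = namedtuple("Extent", ["logical", "length", "physical"])


def lookup(extent, logical_block):
    """Map a logical block to its physical block, or None if not covered."""
    offset = logical_block - extent.logical
    if 0 <= offset < extent.length:
        return extent.physical + offset
    return None


def indirect_map_max_blocks(block_size, pointer_size=4, direct_slots=12):
    """Blocks addressable by 12 direct + single/double/triple indirect slots."""
    per_block = block_size // pointer_size   # 256 for 1k blocks, 1024 for 4k
    return direct_slots + per_block + per_block ** 2 + per_block ** 3


BLOCK = 4096                               # 4k filesystem block
direct_only_bytes = 12 * BLOCK             # 48k reachable without indirection
max_fs_bytes = 2 ** 48 * BLOCK             # 48-bit physical block number
max_file_bytes = 2 ** 32 * BLOCK           # 32-bit logical block number
max_extent_bytes = 2 ** 15 * BLOCK         # assumes 15 usable length bits

print(lookup(Extent(0, 1000, 200), 5))     # physical block for logical block 5
print(indirect_map_max_blocks(1024))       # pointer-map capacity with 1k blocks
print(max_fs_bytes == 2 ** 60, max_file_bytes == 16 * 2 ** 40,
      max_extent_bytes == 128 * 2 ** 20)
```

The contrast is the point of the talk: the single 12-byte extent in the example describes 1000 blocks, where the indirect map would need 1000 separate block pointers.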
